[clug] raid question

Eyal Lebedinsky eyal at eyal.emu.id.au
Wed Feb 19 19:08:58 MST 2014

In short: smartctl lists one pending sector, and a dd read provokes an I/O error as expected.
An mdadm 'check' neither finds a problem nor triggers an I/O error. Why?

My smart log is indicating a pending sector in a component of a 7x4TB raid6 device.
Looking at the component I see:

# smartctl -x /dev/sdi
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      5878         261696

I then test it:

# dd if=/dev/sdi of=/dev/null skip=261120 count=2048
dd: error reading '/dev/sdi': Input/output error
576+0 records in
576+0 records out
294912 bytes (295 kB) copied, 3.18338 s, 92.6 kB/s

and the log shows:

# dmesg|tail
[768141.382189]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[768141.461997]         00 03 fe 40
[768141.503122] sd 6:0:6:0: [sdi]
[768141.542668] Add. Sense: Unrecovered read error - auto reallocate failed
[768141.623913] sd 6:0:6:0: [sdi] CDB:
[768141.667622] Read(16): 88 00 00 00 00 00 00 03 fe 40 00 00 00 08 00 00
[768141.748586] end_request: I/O error, dev sdi, sector 261696
[768141.816217] Buffer I/O error on device sdi, logical block 32712
[768141.889061] ata13: EH complete
[768141.927696] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
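
As a cross-check, the numbers in the three outputs above can be compared with a little shell arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes the "logical block" in the Buffer I/O error line counts 4096-byte blocks (8 sectors of 512 bytes).

```shell
#!/bin/sh
# Values copied from the smartctl, dd and dmesg output in this post.
smart_lba=261696           # LBA_of_first_error reported by smartctl
dd_skip=261120             # dd skip= value, in 512-byte blocks
dd_records=576             # records dd read before the error
cdb_lba=$((0x0003fe40))    # LBA field of the Read(16) CDB (03 fe 40)
buffer_block=32712         # "logical block" from the Buffer I/O error line

# First sector dd failed to read.
dd_fail=$((dd_skip + dd_records))
# Assumption: buffer blocks are 4096 bytes, i.e. 8 sectors each.
buffer_sector=$((buffer_block * 8))

echo "smartctl LBA:        $smart_lba"
echo "dd failing sector:   $dd_fail"
echo "CDB LBA:             $cdb_lba"
echo "buffer block sector: $buffer_sector"
```

All four lines print 261696, so smartctl, dd and the kernel log agree on the failing sector.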

I decided to run a raid check on this part (the first 1GB is enough to cover this
bad sector) and it did not find any problem and did not trigger an i/o error.

This last fact I find unexpected, as I thought the mdadm 'check' operation would read
all 7 components in each stripe and verify the parity.
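
For reference, one way a range-limited check can be driven is through the md sysfs files sync_min, sync_max and sync_action. The sketch below only prints the commands rather than running them. The array name md0 and the helper name partial_check_cmds are hypothetical (the post does not name the array), and the limits are given in sectors and need to be a multiple of the chunk size.

```shell
#!/bin/sh
# Print (do not run) the sysfs writes for a range-limited md 'check'.
# $1 = md device name (hypothetical here), $2/$3 = start/end in sectors.
partial_check_cmds() {
    base=/sys/block/$1/md
    echo "echo $2 > $base/sync_min"
    echo "echo $3 > $base/sync_max"
    echo "echo check > $base/sync_action"
}

# The first 1 GiB expressed in 512-byte sectors.
one_gib_sectors=$((1024 * 1024 * 1024 / 512))

# md0 is an assumed name; substitute the real array.
cmds=$(partial_check_cmds md0 0 "$one_gib_sectors")
echo "$cmds"
```

The check pauses when it reaches sync_max; writing idle to sync_action stops it, and mismatch_cnt in the same directory holds the mismatch count afterwards.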

Q1) Why do I not see an i/o error from the raid check?

I want to use debugfs to see where the problem is and fix it. I need to know which
fs blocks include this sector (actually the whole stripe needs to be attended to).
If any are in use I will try to recover the files. I will then run a raid 'repair'
on the area.

sdi sector 261696 is sdi1 sector 259648 (the partition starts at sector 2048), and in a
7 part raid6 (5 data parts per stripe) it is 259648*5=1298240 sectors into the fs, or
162280 4k blocks. The blocks in this area are my focus.
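
The arithmetic above can be written out as a short script. It is a sketch of this calculation only: the variable names are illustrative, the partition start of 2048 is inferred from 261696-259648, and it treats the md data offset as zero and ignores chunk boundaries and parity rotation, which is part of what Q2 asks about.

```shell
#!/bin/sh
disk_sector=261696      # failing sector on sdi
part_start=2048         # inferred: 261696 - 259648
raid_parts=7            # components in the raid6
parity_parts=2          # raid6 uses two parity blocks per stripe

# Sector relative to the start of sdi1.
part_sector=$((disk_sector - part_start))
# Data components per stripe.
data_parts=$((raid_parts - parity_parts))
# Simplification: scale by the number of data components (no data offset,
# no chunk layout taken into account).
fs_sector=$((part_sector * data_parts))
# 4k fs blocks are 8 sectors of 512 bytes.
fs_block=$((fs_sector / 8))

echo "sdi1 sector: $part_sector"
echo "fs sector:   $fs_sector"
echo "fs block:    $fs_block"
```

This prints 259648, 1298240 and 162280. With a block number in hand, the debugfs requests icheck (block to inode) and ncheck (inode to pathname) can show which files, if any, use it.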

Q2) Is this logic correct?


Eyal Lebedinsky (eyal at eyal.emu.id.au)
