[clug] raid question

Eyal Lebedinsky eyal at eyal.emu.id.au
Wed Feb 19 19:08:58 MST 2014


In short: smartctl lists one pending sector. A dd read of that sector provokes an
I/O error, as expected. An mdadm 'check' over the same area finds no problem and
triggers no I/O error. Why?


My SMART log indicates a pending sector on one component of a 7x4TB raid6 array.
Looking at that component I see:

# smartctl -x /dev/sdi
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      5878         261696

I then test it:

# dd if=/dev/sdi of=/dev/null skip=261120 count=2048
dd: error reading '/dev/sdi': Input/output error
576+0 records in
576+0 records out
294912 bytes (295 kB) copied, 3.18338 s, 92.6 kB/s

and the log shows:

# dmesg|tail
[768141.382189]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[768141.461997]         00 03 fe 40
[768141.503122] sd 6:0:6:0: [sdi]
[768141.542668] Add. Sense: Unrecovered read error - auto reallocate failed
[768141.623913] sd 6:0:6:0: [sdi] CDB:
[768141.667622] Read(16): 88 00 00 00 00 00 00 03 fe 40 00 00 00 08 00 00
[768141.748586] end_request: I/O error, dev sdi, sector 261696
[768141.816217] Buffer I/O error on device sdi, logical block 32712
[768141.889061] ata13: EH complete
[768141.927696] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
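
As a cross-check, the smartctl, dd and dmesg figures above all point at the same
sector. The sketch below only redoes that arithmetic. It assumes dd used its
default 512-byte block size (no bs= was given) and the standard Read(16) layout,
in which bytes 2 to 9 of the CDB carry the LBA. The variable names are
illustrative and do not come from the post.

```shell
# LBA reported by the SMART self-test log.
smart_lba=261696

# dd skipped 261120 blocks and read 576 complete ones before failing,
# so the first unreadable 512-byte sector is:
dd_fail_sector=$((261120 + 576))

# Bytes 2..9 of the Read(16) CDB (00 00 00 00 00 03 fe 40) hold the LBA.
cdb_lba=$((0x0003fe40))

# The "Buffer I/O error" line is consistent with 4 KiB blocks (8 sectors each).
buffer_block=$((smart_lba / 8))

echo "smart=$smart_lba dd=$dd_fail_sector cdb=$cdb_lba block=$buffer_block"
```

All three sources agree on sector 261696, and 261696/8 matches the logical block
32712 in the kernel log.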

I then ran a raid 'check' limited to this area (the first 1GB is enough to cover
the bad sector). It found no problem and did not trigger an I/O error.

I find this unexpected, as I thought the mdadm 'check' operation reads all 7 parts
of each stripe and validates the parity.
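
The post does not say how the check was limited to the first 1GB. One way such a
range-limited check might be requested is through the md sysfs files sync_min,
sync_max and sync_action. The sketch below is a dry run that only prints the
commands. The array name md0 is hypothetical, and the exact meaning of the sector
units (per component or per array) should be confirmed against the kernel md
documentation before running anything as root.

```shell
# Hypothetical array name; the post does not name the md device.
MD=md0
SYSFS=/sys/block/$MD/md

# 1 GiB expressed in 512-byte sectors.
LIMIT_SECTORS=$((1024 * 1024 * 1024 / 512))

# Print the commands rather than executing them (dry run).
limited_check_plan() {
    echo "echo 0 > $SYSFS/sync_min"
    echo "echo $LIMIT_SECTORS > $SYSFS/sync_max"
    echo "echo check > $SYSFS/sync_action"
    echo "cat $SYSFS/mismatch_cnt"
}

limited_check_plan
```

Printing the plan first makes it easy to review the range before touching a live
array.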

Q1) Why do I not see an i/o error from the raid check?

I want to use debugfs to find out what the bad sector affects and then fix it.
For that I need to know which fs blocks include this sector (actually the whole
stripe needs attention). If any of them is in use I will try to recover the
files. I will then run a raid 'repair' on the area.

sdi sector 261696 is sdi1 sector 259648 (the partition starts at sector 2048), and
in a 7 part raid6 with 5 data parts per stripe that is 259648*5=1298240 sectors
into the fs, or 162280 4k blocks. The blocks in this area are my focus.
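
The arithmetic above can be replayed as a small sketch. It reproduces the
calculation exactly as written and nothing more: it does not account for any md
data offset or for the chunk layout within a stripe, neither of which is given in
the post, so it says nothing about whether the mapping itself is right (that is
Q2). The variable names are illustrative.

```shell
disk_sector=261696      # failing sector on sdi
part_sector=259648      # same sector relative to sdi1, as given in the post
part_start=$((disk_sector - part_sector))   # implied start of sdi1

data_parts=$((7 - 2))   # raid6 uses two parity parts per stripe
fs_sector=$((part_sector * data_parts))
fs_block=$((fs_sector * 512 / 4096))        # 512-byte sectors to 4k blocks

echo "part_start=$part_start fs_sector=$fs_sector fs_block=$fs_block"
```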

Q2) Is this logic correct?

TIA

-- 
Eyal Lebedinsky (eyal at eyal.emu.id.au)

