[Samba] Possible Filesystem Corruption with Samba 3.0.25a (with XFS and LVM)

Andrew Morgan morgan at orst.edu
Tue Jun 26 19:57:05 GMT 2007


On Tue, 26 Jun 2007, Andri wrote:

> Adam Tauno Williams wrote:
>> On Tue, 2007-06-26 at 12:00 -0400, Charles Marcus wrote:
>>> On 6/26/2007, Andri (aoeuid at gmail.com) wrote:
>>>> I've done occasional memtests for a few days straight, and all have
>>>> ended successfully. If it wasn't one of those one-in-a-quintillion
>>>> chances that the sun flipped the necessary bits in memory, I'm
>>>> betting on software bugs.
>>> Memtest is hardly a reliable test for memory. I have had bad memory pass
>>> test for days on end.
>>> The best way I've ever found to reliably find bad memory is compile
>>> something big, like X. If your memory is bad, you'll find out pretty
>>> quick...
>>
>> The real solution is to use ECC memory. :)
>>
>
> It's a headless server without X, but I've compiled plenty of other applications
> on it without issues. That includes Linux. The chance that a bit flipping on the
> exact location that directs Samba's (or the filesystem's or what-not's) output,
> and it ending up on another (and raw) device is something I really can't believe
> happening.
>
> Like the XFS guys said, memory corruption errors might not necessarily be
> because of faulty hardware.
>
> Even if this issue is related to the SATA controller's driver, I wish to find
> out the origin of the data structures I've pasted twice now, because I believe
> tracing them might hold the key to this mystery. Of course, I lack the expertise
> to scan a driver's source code for such possible mistakes, but at least I can
> let the author know and ask for their assistance.
>
> Blaming hardware for uncommon and unexpected behavior is not always the
> reasonable thing to do.

Samba uses standard system calls to create, modify, and delete files.  It 
does not write to random bits of /dev/hda.  If you have filesystem 
corruption, then the problem lies elsewhere.

Maybe the data you found came from Samba (indirectly through files your 
Bittorrent client was saving to a Samba share), but that does not imply 
that Samba was the cause of the problem.  When Samba used the system call 
write() - or whatever optimized system call it uses - some other piece of 
software (XFS, LVM, Linux kernel IDE driver) placed that data in the wrong 
place on the disk.
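
To make that concrete, here is a minimal sketch in Python (Samba itself is 
written in C, and this is not its code) of what "standard system calls" 
means.  The helper name save_via_syscalls is made up for illustration.  The 
program only names a path inside a mounted filesystem; which disk blocks 
receive the bytes is decided further down, by the filesystem, the volume 
manager, and the disk driver.

```python
import os
import tempfile


def save_via_syscalls(path, data):
    """Write data to path with plain open/write/fsync/close calls.

    Hypothetical helper for illustration only. The caller never names a
    block device such as /dev/hda; it only names a file path.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # write() may accept fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's data down to the storage stack.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "example.bin")
        save_via_syscalls(target, b"payload from a network client")
        print(os.path.getsize(target), "bytes written to", target)
```

Nothing in that code can choose a sector.  If the bytes end up in the wrong 
place on disk, the misplacement happened below the write() call.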

In my experience (which only counts as anecdotal evidence anyway), disk 
hardware failures are usually easily detected as ever-increasing bad block 
counts reported by the disk's S.M.A.R.T. firmware.  If the disk still 
works normally and is not reporting any SMART errors, then you can 
probably rule out hardware.
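
As a rough sketch of that check, assuming smartmontools is installed and the 
attribute table comes from "smartctl -A" on the disk in question, the 
following Python pulls out the raw counters that usually climb when a disk 
is going bad.  The function name, the list of watched attributes, and the 
SAMPLE text are my own illustrative choices; the sample is made-up input, 
not output from any real disk, and attribute names vary between vendors.

```python
# Attributes assumed to be interesting here; vendors differ, so adjust.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def raw_smart_counts(report):
    """Return {attribute_name: raw_value} for watched SMART attributes.

    Expects text in the column layout of the `smartctl -A` attribute
    table: ID, name, flag, value, worst, thresh, type, updated,
    when_failed, raw value.
    """
    counts = {}
    for line in report.splitlines():
        parts = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        name, raw = parts[1], parts[9]
        if name in WATCHED and raw.isdigit():
            counts[name] = int(raw)
    return counts


# Made-up sample input for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    for name, value in raw_smart_counts(SAMPLE).items():
        print(name, value)
```

Run it against saved reports from different days; counters that keep rising 
point toward the disk, while counters that stay at zero point back toward 
software.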

I'm not saying it is impossible for Samba to create this problem, but 
since Samba uses standard system calls and has no reason to write directly 
to the /dev/hda raw device, it seems far more likely that the software 
which does actually write to the raw device (XFS, LVM, Linux kernel) is 
the culprit.

 	Andy

