[Samba] Possible Filesystem Corruption with Samba 3.0.25a
(with XFS and LVM)
Jerome Haltom
wasabi at larvalstage.net
Tue Jun 26 19:45:48 GMT 2007
XFS eats files. Did you lose power or did your system crash? XFS is very
good at losing files.
Other than that, LVM is the next culprit.
Samba only opens and writes to files. It has no code to do anything else
in it.
On Tue, 2007-06-26 at 15:35 +0300, Andri wrote:
> Hello!
>
> A few days ago I received a filesystem memory corruption notification
> from Debian's Linux kernel (2.6.20), which automatically unmounted my
> root partition. Upon closer investigation, I found that something had
> overwritten most of my data, XFS's superblocks and other metadata
> structures, starting at partition offset 0x200. At the time of the
> error, the only services I was using were Samba with Unix Extensions
> enabled, LVM2 which managed the mountpoint where I was writing to
> through Samba, and XFS, which managed both my / and the LVM's
> partition's files. / was a single partition on one disk, /storage was
> the LVM managed partition made up of multiple disks.
>
> I noticed the corruption issue on the server around the time my
> Bittorrent client Deluge Torrent
> (http://download.deluge-torrent.org/stable/deluge-0.5.1.1.tar.gz) was
> allocating space for a download. The client machine was Gentoo
> suspend2-sources 2.6.21-r6.
>
> I'm not saying this is Samba's bug for sure, but I am trying to find
> out what's responsible. I've had long chats with people involved with
> XFS in the #xfs chatroom on Freenode, and they've stated that XFS has
> checks that prevent itself from writing to block 0, the same block
> that now holds some unknown structure of data and a file path of the
> file my torrent client seemed to be allocating. As I'm not a Linux
> developer, I lack the experience to go digging through the source code.
>
> I did not take note of the kernel messages that were displayed before
> I rebooted the machine, because I had no expectation of such a
> disaster, and hoped a reboot would fix everything.
> All I have now to help find the cause of this problem is the trashed filesystem.
> The memory and disk itself were tested and are healthy. Clearly a
> software error.
>
> Example output from offset 0x200 on the root disk:
> 00000260 00 00 00 00 00 00 00 00 d4 3e 00 00 00 00 01 00 |.........>......|
> 00000270 9f 01 12 00 07 00 00 00 40 00 00 00 99 41 7c 46 |........@....A|F|
> 00000280 71 7a 09 00 00 fd 00 00 00 00 00 00 24 08 00 00 |qz..........$...|
> 00000290 00 00 00 00 86 01 00 00 f1 03 00 00 00 00 00 00 |................|
> 000002a0 2f 73 74 6f 72 61 67 65 00 53 6f 66 74 77 61 72 |/storage.Softwar|
> 000002b0 65 2f 57 69 6e 64 6f 77 73 2f 47 61 6d 65 73 2f |e/Windows/Games/|
> 000002c0 54 69 74 61 6e 20 51 75 65 73 74 20 2d 2d 20 49 |Titan Quest -- I|
> 000002d0 6d 6d 6f 72 74 61 6c 20 54 68 72 6f 6e 65 2f 54 |mmortal Throne/T|
> 000002e0 69 74 61 6e 2e 51 75 65 73 74 2e 49 6d 6d 6f 72 |itan.Quest.Immor|
> 000002f0 74 61 6c 2e 54 68 72 6f 6e 65 2d 55 6e 6c 65 61 |tal.Throne-Unlea|
> 00000300 73 68 65 64 2f 75 6e 6c 2d 74 71 69 74 2e 70 61 |shed/unl-tqit.pa|
> 00000310 72 74 31 35 2e 72 61 72 00 42 42 42 18 01 00 00 |rt15.rar.BBB....|
> 00000320 00 00 00 00 00 01 00 00 10 00 00 00 e9 00 00 00 |................|
> 00000330 69 8a 82 e8 ad de e1 fe 00 fd 00 00 00 00 00 00 |i...............|
> 00000340 25 08 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |%...............|
> 00000350 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
>
> The rest of the data on this corrupted filesystem is filled with
> similar data blocks -- unknown metadata around a filepath referring to
> /storage, just as if something had done file listings of /storage and
> output its memory structures onto the raw device. If some Samba
> developers recognize this structure (perhaps it's something that's
> supposed to be in-memory before sending via SMB), please let me know.
> The filesystem on /storage (LVM managed), did not seem to be
> corrupted, and at least showed its contents when I did a quick check
> with a LiveCD.
>
> I understand that Samba is supposed to drop its privileges after a
> connection, but I assume it has to run some parts as root, especially
> because I also enabled the DOS-style file permission option (to allow
> groups to change perms, not only owners). The feature didn't work,
> but the option was set in smb.conf.
>
> This is a major issue, but due to the lack of helpful info, I'm forced
> to ask in various places.
> Perhaps Deluge Torrent's allocation routines got Samba confused?
>
> There aren't many suspects -- either Samba, XFS (which probably is
> more common than Samba, so less likely) or the rest of the kernel
> (which, again, is unlikely). LVM is so low level and less complex than
> all others, so chances of it messing up like this are microscopic.
> Syslog-and-friends don't even care about files, and Exim does not run
> as root after starting up.
> The peculiar thing is that the info that was written on top of
> /dev/hdb3 contains the filepaths of /storage, so I'm betting it had
> something to do with Samba, which at the time was actively dealing
> with /storage. It was a conservative home machine, so I'm fairly
> confident I can rule out man-made timebombs.
>
>
> Thank you in advance for any helpful replies!
>
> Hopefully I/we can find the cause of this, because I'd take a dead
> actuator any day over overwritten data -- easier to restore :)
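
To look for recognizable structures in a dump like the one quoted above, it can help to pull out the embedded strings together with their offsets. The following is only a minimal Python sketch, assuming the rows use the `hexdump -C`-style layout shown in the quoted message (offset, 16 hex bytes, ASCII column between `|` characters). The function names `parse_hexdump` and `printable_runs` are illustrative and do not come from Samba, XFS or any other tool mentioned in this thread.

```python
import re

# Rows 0x2a0-0x310 copied from the dump quoted above.
DUMP = """\
000002a0 2f 73 74 6f 72 61 67 65 00 53 6f 66 74 77 61 72 |/storage.Softwar|
000002b0 65 2f 57 69 6e 64 6f 77 73 2f 47 61 6d 65 73 2f |e/Windows/Games/|
000002c0 54 69 74 61 6e 20 51 75 65 73 74 20 2d 2d 20 49 |Titan Quest -- I|
000002d0 6d 6d 6f 72 74 61 6c 20 54 68 72 6f 6e 65 2f 54 |mmortal Throne/T|
000002e0 69 74 61 6e 2e 51 75 65 73 74 2e 49 6d 6d 6f 72 |itan.Quest.Immor|
000002f0 74 61 6c 2e 54 68 72 6f 6e 65 2d 55 6e 6c 65 61 |tal.Throne-Unlea|
00000300 73 68 65 64 2f 75 6e 6c 2d 74 71 69 74 2e 70 61 |shed/unl-tqit.pa|
00000310 72 74 31 35 2e 72 61 72 00 42 42 42 18 01 00 00 |rt15.rar.BBB....|
"""


def parse_hexdump(text):
    """Return (start_offset, data) for contiguous hexdump rows.

    Assumes every row is: offset, hex bytes, then an ASCII column
    delimited by '|'. Rows are assumed to be contiguous and in order.
    """
    data = bytearray()
    start = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the ASCII column, keep the offset and the hex bytes.
        fields = line.split("|", 1)[0].split()
        offset = int(fields[0], 16)
        if start is None:
            start = offset
        data.extend(int(b, 16) for b in fields[1:])
    return start, bytes(data)


def printable_runs(data, base=0, min_len=4):
    """Yield (offset, text) for each run of printable ASCII bytes."""
    pattern = re.compile(rb"[ -~]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield base + match.start(), match.group().decode("ascii")


start, data = parse_hexdump(DUMP)
for offset, text in printable_runs(data, base=start):
    print(f"0x{offset:08x}  {text}")
```

On these rows the sketch reports two strings separated by a NUL byte: `/storage` at 0x2a0 and the path under it starting at 0x2a9. The three-byte `BBB` run is skipped because it is shorter than `min_len`.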