[Samba] Possible Filesystem Corruption with Samba 3.0.25a (with XFS and LVM)

Andri aoeuid at gmail.com
Tue Jun 26 12:35:09 GMT 2007


Hello!

A few days ago I received a filesystem memory corruption notification
from Debian's Linux kernel (2.6.20), which automatically unmounted my
root partition. Upon closer investigation, I found that something had
overwritten most of my data, including XFS's superblocks and other
metadata structures, starting at partition offset 0x200. At the time of
the error, the only services I was using were Samba with Unix Extensions
enabled, LVM2, which managed the mountpoint I was writing to through
Samba, and XFS, which was the filesystem on both / and the LVM volume.
/ was a single partition on one disk; /storage was the LVM-managed
volume made up of multiple disks.

I noticed the corruption issue on the server around the time my
BitTorrent client, Deluge Torrent
(http://download.deluge-torrent.org/stable/deluge-0.5.1.1.tar.gz), was
allocating space for a download. The client machine was running Gentoo
with the suspend2-sources 2.6.21-r6 kernel.

I'm not saying this is Samba's bug for sure, but I am trying to find
out what's responsible. I've had long chats with people involved with
XFS in the #xfs chatroom on Freenode, and they've stated that XFS has
checks that prevent it from writing to block 0, the same block
that now holds some unknown data structure and the path of the
file my torrent client seemed to be allocating. As I'm not a Linux
developer, I lack the experience to go digging through the source code.

I did not take note of the kernel messages that were displayed before
I rebooted the machine, because I did not expect such a disaster and
hoped a reboot would fix everything.
All I have now to help find the cause of this problem is the trashed
filesystem.
The memory and the disk itself were tested and are healthy. Clearly a
software error.
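
For anyone who wants to poke at an image of the trashed partition, here
is a minimal sketch of one quick check: whether the first four bytes
still carry the XFS superblock magic "XFSB". The function names and the
idea of working on an image file (rather than the live device) are my
own assumptions for illustration, not something taken from the XFS
tools.

```python
XFS_SB_MAGIC = b"XFSB"  # first four bytes of an XFS superblock


def has_xfs_magic(first_bytes: bytes) -> bool:
    """Return True if the buffer starts with the XFS superblock magic."""
    return first_bytes[:4] == XFS_SB_MAGIC


def check_image(image_path: str) -> bool:
    """Read the first 512 bytes of an image file and test them.

    image_path is a hypothetical path to a copy of the partition.
    """
    with open(image_path, "rb") as f:
        return has_xfs_magic(f.read(512))
```

This only looks at the primary superblock at offset 0; it says nothing
about the backup superblocks or the rest of the metadata.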

Example output from offset 0x200 on the root disk:
00000260  00 00 00 00 00 00 00 00  d4 3e 00 00 00 00 01 00  |.........>......|
00000270  9f 01 12 00 07 00 00 00  40 00 00 00 99 41 7c 46  |........@....A|F|
00000280  71 7a 09 00 00 fd 00 00  00 00 00 00 24 08 00 00  |qz..........$...|
00000290  00 00 00 00 86 01 00 00  f1 03 00 00 00 00 00 00  |................|
000002a0  2f 73 74 6f 72 61 67 65  00 53 6f 66 74 77 61 72  |/storage.Softwar|
000002b0  65 2f 57 69 6e 64 6f 77  73 2f 47 61 6d 65 73 2f  |e/Windows/Games/|
000002c0  54 69 74 61 6e 20 51 75  65 73 74 20 2d 2d 20 49  |Titan Quest -- I|
000002d0  6d 6d 6f 72 74 61 6c 20  54 68 72 6f 6e 65 2f 54  |mmortal Throne/T|
000002e0  69 74 61 6e 2e 51 75 65  73 74 2e 49 6d 6d 6f 72  |itan.Quest.Immor|
000002f0  74 61 6c 2e 54 68 72 6f  6e 65 2d 55 6e 6c 65 61  |tal.Throne-Unlea|
00000300  73 68 65 64 2f 75 6e 6c  2d 74 71 69 74 2e 70 61  |shed/unl-tqit.pa|
00000310  72 74 31 35 2e 72 61 72  00 42 42 42 18 01 00 00  |rt15.rar.BBB....|
00000320  00 00 00 00 00 01 00 00  10 00 00 00 e9 00 00 00  |................|
00000330  69 8a 82 e8 ad de e1 fe  00 fd 00 00 00 00 00 00  |i...............|
00000340  25 08 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |%...............|
00000350  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
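
A dump in the layout above can be reproduced from an image of the
partition with a few lines of Python. This is only a sketch: the
function names are made up for illustration, and it assumes you read
from a copy of the partition (or from the device opened read-only)
rather than anything mounted.

```python
def hexdump_lines(data: bytes, base_offset: int = 0):
    """Yield lines in the same layout as the dump above (16 bytes per line)."""
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Pad both halves so a short final line still lines up.
        hex_part = f"{left:<23}  {right:<23}"
        # Printable ASCII is shown as-is, everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        yield f"{base_offset + i:08x}  {hex_part}  |{text}|"


def dump_region(path: str, offset: int, length: int):
    """Read `length` bytes at `offset` from a file and return dump lines."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    return list(hexdump_lines(data, offset))
```

For example, dump_region("root.img", 0x260, 0x100) would cover the same
range as the excerpt, where "root.img" is a hypothetical image of the
root partition.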

The rest of the data on this corrupted filesystem is filled with
similar data blocks -- unknown metadata around a file path referring to
/storage. It is just as if something had done file listings of /storage
and written its memory structures out to the raw device. If any Samba
developers recognize this structure (perhaps it's something that's
supposed to be in memory before being sent via SMB), please let me know.
The filesystem on /storage (LVM-managed) did not seem to be
corrupted, and at least showed its contents when I did a quick check
with a LiveCD.
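
To get an idea of how many of these records there are and where they
sit, one could scan a chunk of the image for the "/storage" string
followed by a NUL byte and pull out the NUL-terminated path after it,
which is the layout visible in the dump. This is a rough sketch under
that assumption only; I don't know the real record format, and the
function name is made up.

```python
def find_path_records(data: bytes, marker: bytes = b"/storage\x00"):
    """Return (offset, path) pairs for each marker followed by a
    NUL-terminated path, as seen in the dump. The record layout beyond
    that is unknown and not interpreted here."""
    records = []
    pos = data.find(marker)
    while pos != -1:
        start = pos + len(marker)
        end = data.find(b"\x00", start)
        if end == -1:
            break  # path runs past the end of this chunk
        path = data[start:end].decode("utf-8", errors="replace")
        records.append((pos, path))
        pos = data.find(marker, end)
    return records
```

It works on a bytes buffer, so a large image would have to be read and
scanned in chunks; records that straddle a chunk boundary would be
missed by this simple version.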

I understand that Samba is supposed to drop its privileges after a
connection, but I assume it has to run some parts as root, especially
because I had also turned on the DOS-style file permission changing
option (which lets group members, not only owners, change permissions).
The feature didn't work, though the option was set in smb.conf.

This is a major issue, but due to the lack of helpful info, I'm forced
to ask in various places.
Perhaps Deluge Torrent's allocation routines got Samba confused?

There aren't many suspects -- either Samba, XFS (which is probably
more widely used than Samba, so less likely) or the rest of the kernel
(which, again, is unlikely). LVM is low level and less complex than
the others, so the chances of it messing up like this are microscopic.
Syslog and friends don't even care about files, and Exim does not run
as root after starting up.
The peculiar thing is that the data written on top of
/dev/hdb3 contains file paths from /storage, so I'm betting it had
something to do with Samba, which at the time was actively dealing
with /storage. It was a conservative home machine, so I'm fairly
confident I can rule out man-made timebombs.


Thank you in advance for any helpful replies!

Hopefully I/we can find the cause of this, because I'd take a dead
actuator any day over overwritten data -- easier to restore :)

