[Samba] Weird behaviour when using "kernel oplocks = yes" leading to "corrupt" files - bug in samba?

Matthias Merz samba-list at j01.merz-ka.de
Fri Jun 1 21:44:29 GMT 2007


Hi folks,

Today I noticed some strange behaviour when accessing a Samba server
(Samba 3.0.25a) from Windows. On our Debian fileserver I prepared a
file testfile.txt owned by user usera and group dpt-a. Then I ran
"setfacl -m g:admins:rwx testfile.txt". User userb, who is only in
group admins but not in dpt-a, is thus permitted to access and change
this file by its POSIX ACL, which works flawlessly from Linux.

$ getfacl testfile.txt
# file: testfile.txt
# owner: usera
# group: dpt-a
user::rwx
group::r--
group:admins:rwx
mask::rwx
other::r--
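
To spell out why this ACL should let userb write, here is a minimal
Python sketch of the usual POSIX ACL check order (owner, named user,
groups limited by the mask, then other). It is a simplification for
illustration only, not what the kernel or Samba actually runs; the
function names and the extra user "userc" are made up for the example.

```python
GETFACL_OUTPUT = """\
# file: testfile.txt
# owner: usera
# group: dpt-a
user::rwx
group::r--
group:admins:rwx
mask::rwx
other::r--
"""


def parse_getfacl(text):
    """Return (owner, owning_group, entries); entries are (tag, qualifier, perms)."""
    owner = group = None
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# owner:"):
            owner = line.split(":", 1)[1].strip()
        elif line.startswith("# group:"):
            group = line.split(":", 1)[1].strip()
        elif line and not line.startswith("#"):
            tag, qualifier, perms = line.split(":")
            entries.append((tag, qualifier, set(perms) - {"-"}))
    return owner, group, entries


def may_access(text, user, groups, wanted):
    """Simplified POSIX ACL check: is `user` (member of `groups`) granted `wanted`?"""
    owner, owning_group, entries = parse_getfacl(text)
    wanted = set(wanted)

    def find(tag, qualifier):
        return [p for t, q, p in entries if t == tag and q == qualifier]

    mask_entries = find("mask", "")
    mask = mask_entries[0] if mask_entries else set("rwx")

    if user == owner:                      # owner entry, not limited by the mask
        return wanted <= find("user", "")[0]
    named_user = find("user", user)        # named user entry, limited by the mask
    if named_user:
        return wanted <= (named_user[0] & mask)
    group_perms = []                       # owning group and named group entries
    if owning_group in groups:
        group_perms += find("group", "")
    for g in groups:
        group_perms += find("group", g)
    if group_perms:
        return any(wanted <= (p & mask) for p in group_perms)
    return wanted <= find("other", "")[0]  # everybody else


if __name__ == "__main__":
    print("userb (admins) rw:", may_access(GETFACL_OUTPUT, "userb", ["admins"], "rw"))
    print("userc (dpt-a)  w :", may_access(GETFACL_OUTPUT, "userc", ["dpt-a"], "w"))
```

With the ACL shown above, userb matches only the group:admins entry,
and the mask does not remove anything from it.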


Then I made some changes to that file from a Windows machine via
notepad.exe and noticed that notepad seemed to "succeed" in saving,
but the changes were *not* written to the file! Very strange IMHO.


So I did some more digging with strace, since I didn't find a clue in
the logs.

"strace -e open,close,write -f smbd -D" yielded:
[pid 17704] open("foo/testfile.txt", O_RDWR|O_CREAT|O_NOFOLLOW, 0744) = 29
  [some write()s to FD 24]
[pid 17704] open("foo/testfile.txt", O_WRONLY|O_NOFOLLOW) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17704] --- SIGIO (I/O possible) @ 0 (0) ---
[pid 17704] +++ killed by SIGIO +++
[pid 17478] --- SIGCHLD (Child exited) @ 0 (0) ---

This seemed to "explain" why notepad thought the file had been saved
successfully, assuming the SMB protocol does not do "hard checks"
for successful writes: since the child serving my Windows access was
killed, probably no error message was ever sent out.
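
For reference, a "killed by SIGIO" line is what one would expect
whenever SIGIO reaches a process that has no handler installed for it:
on Linux the default action for SIGIO is to terminate the process. The
minimal Python sketch below shows only that default action in
isolation. It is not Samba code, it says nothing about why smbd would
be without a handler at that moment, and the helper name is made up
for illustration.

```python
import signal
import subprocess
import sys

# Child program: make sure SIGIO has its default disposition and is not
# blocked, then send SIGIO to itself. The print should never be reached.
CHILD = """
import os, signal
signal.signal(signal.SIGIO, signal.SIG_DFL)
signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGIO})
os.kill(os.getpid(), signal.SIGIO)
print("still alive")
"""


def child_killed_by_sigio():
    """Run the child and report whether it was terminated by SIGIO."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True
    )
    # subprocess reports "killed by signal N" as a return code of -N.
    return proc.returncode == -signal.SIGIO and "still alive" not in proc.stdout


if __name__ == "__main__":
    print("child killed by SIGIO:", child_killed_by_sigio())
```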

When googling for SIGIO and samba, I noticed some hits talking about
oplocks, so I just tried disabling kernel oplocks in smb.conf:
"kernel oplocks = no". This did the trick: after restarting samba, the
writes were successful again.


Since the manpage states I would want oplocks (and I do *g*), I
enabled them again and tried debugging with gdb (to provide the
samba-team with a more detailed report). As I don't really know gdb, my
first attempt failed because samba forks multiple processes which were
not "caught" by my gdb call (but the error still occurred). As the
weekend was approaching, I didn't dig further into gdb, but read the
manpage for smbd and started "gdb /usr/sbin/smbd -F -i". This time I
failed to reproduce the error. The same difference shows up even
without gdb: "smbd -F -i -d 5" started from the shell did the writes,
whereas "normal" smbd (smbd -F) failed to write the changes.


One wild guess: maybe oplocks can only be done by the file owner /
group owner and the samba-process crashes because of such a thing? Is
there a difference in privilege-handling between "smbd -F" and "smbd
-F -i" that could explain this?
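
On the ownership guess: Linux file leases, requested via
fcntl(F_SETLEASE), do carry such a restriction. According to fcntl(2),
an unprivileged process may only take out a lease on a file whose owner
matches its filesystem UID, unless it has the CAP_LEASE capability.
Samba's Linux kernel oplock support appears to be built on this lease
mechanism, but whether this restriction is what breaks smbd here is an
assumption, not something the trace above proves. The sketch below
(Linux only, helper name made up for illustration) just probes whether
the calling user may take a read lease on a given file.

```python
import errno
import fcntl
import os
import tempfile


def try_read_lease(path):
    """Try to take a read lease on `path`; return 'granted' or the errno name."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.fcntl(fd, fcntl.F_SETLEASE, fcntl.F_RDLCK)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    else:
        # Give the lease back before closing the descriptor.
        fcntl.fcntl(fd, fcntl.F_SETLEASE, fcntl.F_UNLCK)
        return "granted"
    finally:
        os.close(fd)


if __name__ == "__main__":
    # A file we own: create it, close it, then ask for a lease.
    fd, own_path = tempfile.mkstemp()
    os.close(fd)
    try:
        print("own file:   ", try_read_lease(own_path))
    finally:
        os.unlink(own_path)
    # A file usually owned by root; as a non-root user without CAP_LEASE
    # the request is expected to be refused.
    if os.path.exists("/etc/passwd"):
        print("/etc/passwd:", try_read_lease("/etc/passwd"))
```

Running this once as the file owner and once as a user who only has
access through an ACL group entry would show whether the lease request
itself is refused for the second user.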

I'd assume this to be a samba bug, because I could reproduce this both
with a not-so-recent linux-2.6 i386 and with a more recent linux-2.6
amd64.

I can provide more debugging output etc. on Monday at the earliest;
sorry, I forgot to take a log of a "full" strace call and to write
down the exact kernel versions, which would of course have been very
useful for you.


Thanks for your replies and any help in solving this issue,
Yours
Matthias Merz

-- 
Beware of bugs in the above code; I have only proved it
correct, not tried it.                (Donald E. Knuth)

