[Samba] tdb locking issue - Solaris 10 and Samba 3.0.33
Ray Van Dolson
rvandolson at esri.com
Fri Oct 23 15:18:19 MDT 2009
(Yes, I should upgrade Samba to 3.0.35).
We're running the Sun provided Samba daemon (SUNWsmbau and friends) on
Solaris 10 Generic_138888-08 (sparc).
Lots of Windows clients (mixed XP, 2003, 2008) hit this server and
periodically we'll start seeing smbd processes begin piling up. These
processes can't be killed with a normal kill -- only kill -9 will do.
In the past I've been working with the owners of these Windows machines
to ensure scripts they use that hit our shares are written correctly.
However, I started peeking at a lot of these smbd processes and it seems
like something is amiss, perhaps on the Samba side.
Here's the pertinent info on a randomly selected "hung" process:
# truss -v all -aef -p 25067
25067: *** SUID: ruid/euid/suid = 0 / 122 / 122 ***
25067: *** SGID: rgid/egid/sgid = 0 / 9 / 9 ***
25067: psargs: /usr/sfw/sbin/smbd -D
25067: fcntl(10, F_SETLKW64, 0xFFBFF6F8) (sleeping...)
25067: typ=F_WRLCK whence=SEEK_SET start=32412 len=1 sys=4245464 pid=0
What's FD 10 you ask?
# pfiles -F 25067
10: S_IFREG mode:0644 dev:85,60 ino:4630 uid:0 gid:0 size:327680
advisory read lock set by process 21130
At this point, cued by another post on this list, I tried a tdbdump on
/var/samba/locks/brlock.tdb. It completed without issue, however.
# pstack -F 25067
25067: /usr/sfw/sbin/smbd -D
ff049c64 fcntl (a, 23, ffbff6f8)
ff0398c0 fcntl (a, 23, ffbff6f8, 7e9c, fee02a00, 18a564) + 18
002822e8 tdb_brlock (4c18e0, 7e9c, 2, 23, 0, 1) + 90
002825f0 tdb_lock (4c18e0, 1f7d, 2, 0, 20, 0) + 16c
0020982c ???????? (0, 6833f8, 1, 5cb1d0, 5cb1e0, 40c7d8)
00202d18 is_locked (6833f8, feff, 0, 40c7d8, 0, 0) + 280
00091820 reply_read_and_X (6ded80, 6be900, 3f, 6833f8, 20000, 7) + 2d4
000d35ec ???????? (6be900, 69e4b0, 6be900, 3f, 20000, 8e94)
000d3728 ???????? (9400, 6be900, 3f, 20000, 9400, 0)
000d399c ???????? (69e4b0, 6be900, 4134a0, 6cc8, 40c7d8, 6c00)
000d4b78 smbd_process (6800, 40c7d8, 93a80, 20441, d, 0) + 1ec
00338f38 main (0, 43e110, 0, 41566c, 4175d4, 1) + 9cc
0004e118 _start (0, 0, 0, 0, 0, 0) + 108
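As a cross-check, the hex arguments in the pstack frames can be converted
to decimal and compared with the truss output. The sketch below is only
that, a sketch: the numeric values used for F_SETLKW64 (35) and F_WRLCK (2)
are what I believe the Solaris headers define, and reading the second
argument of tdb_lock as a hash chain index with 4 bytes of lock space per
chain is my assumption about the tdb layout, not something the output
states.

```python
# Hex arguments copied from the pstack frames:
#   fcntl (a, 23, ffbff6f8)
#   tdb_brlock (4c18e0, 7e9c, 2, 23, 0, 1)
#   tdb_lock (4c18e0, 1f7d, 2, 0, 20, 0)
FCNTL_FD = int("a", 16)
FCNTL_CMD = int("23", 16)
BRLOCK_OFFSET = int("7e9c", 16)
BRLOCK_RW_TYPE = int("2", 16)
LOCK_CHAIN = int("1f7d", 16)

# Assumed Solaris constant values (not shown in the output).
ASSUMED_F_SETLKW64 = 35
ASSUMED_F_WRLCK = 2


def decode():
    """Return the pstack arguments in decimal for comparison with truss."""
    return {
        "fd": FCNTL_FD,
        "cmd_is_setlkw64": FCNTL_CMD == ASSUMED_F_SETLKW64,
        "type_is_wrlck": BRLOCK_RW_TYPE == ASSUMED_F_WRLCK,
        "offset": BRLOCK_OFFSET,
        "chain": LOCK_CHAIN,
        # If offset = header_size + 4 * chain (assumption), this is the
        # header size implied by the two frames.
        "implied_header_size": BRLOCK_OFFSET - 4 * LOCK_CHAIN,
    }


if __name__ == "__main__":
    for key, value in decode().items():
        print(f"{key}: {value}")
```

The decoded fd and offset agree with the truss line (FD 10, start=32412),
so the pstack and truss views appear to describe the same blocked call: a
blocking write lock on a single byte of brlock.tdb.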
The truss shows me that the signals are being received, but in all
cases, the process goes back to the SETLKW64 call.
/var/samba/locks is on a normal UFS filesystem.
Now, clearly there are some patches that could be applied to this
system, and I can upgrade Samba to 3.0.35, but I'm hoping someone out
there will have an idea of what might be going on here. Why would this
particular smbd process *not* be able to get a lock on the brlock.tdb
file at a certain point, but subsequent smbd processes apparently are
(new connections to the server appear to be working OK)? And why
wouldn't the SETLKW64 command eventually succeed?
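One reading that would fit both observations: the lock being requested
covers a single byte (len=1 at start=32412), so only processes that need
that same byte would contend with whatever process 21130 is holding, while
requests for other bytes of the file would go through. The sketch below
illustrates that general fcntl behaviour and is not a reproduction of the
smbd hang. It uses a temporary file instead of brlock.tdb, a forked child
standing in for the lock holder, and an arbitrary second offset (32416);
all of those are assumptions made for illustration.

```python
import fcntl
import os
import tempfile

HELD_OFFSET = 32412   # byte from the truss output
OTHER_OFFSET = 32416  # arbitrary different byte, chosen for illustration


def try_write_lock(fd, offset):
    """Try a non-blocking exclusive lock on one byte; True if granted."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
    except OSError:
        return False
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)
    return True


def demo():
    """Return (same_byte_granted, other_byte_granted) while a child holds
    a read lock on HELD_OFFSET."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.truncate(65536)
        tmp.flush()
        ready_r, ready_w = os.pipe()
        done_r, done_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: take a shared (read) lock on one byte and hold it
            # until the parent says it is finished.
            os.close(ready_r)
            os.close(done_w)
            child_fd = os.open(tmp.name, os.O_RDWR)
            fcntl.lockf(child_fd, fcntl.LOCK_SH, 1, HELD_OFFSET, os.SEEK_SET)
            os.write(ready_w, b"x")
            os.read(done_r, 1)
            os._exit(0)
        # Parent: wait until the child holds its lock, then probe.
        os.close(ready_w)
        os.close(done_r)
        os.read(ready_r, 1)
        fd = os.open(tmp.name, os.O_RDWR)
        same = try_write_lock(fd, HELD_OFFSET)
        other = try_write_lock(fd, OTHER_OFFSET)
        os.close(fd)
        os.write(done_w, b"x")
        os.waitpid(pid, 0)
        os.close(ready_r)
        os.close(done_w)
        return same, other


if __name__ == "__main__":
    print(demo())
```

The probe here is non-blocking so the script can finish. With the blocking
form (F_SETLKW, as in the truss output) the caller would simply sleep until
the holder releases the byte, and if the holder never does, the wait never
ends.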
Would like to get this one figured out instead of just manually killing
all the processes every couple weeks or so.
Thanks much :)