[Samba] hanging smbd(s) revisited

William Jojo jojowil at hvcc.edu
Fri Feb 24 18:18:12 GMT 2006


----- Original Message ----- 
From: "Matt Johnson" <mwj at doc.ic.ac.uk>
To: <samba at lists.samba.org>
Sent: Friday, February 24, 2006 11:21 AM
Subject: [Samba] hanging smbd(s) revisited


> Back in December, I was having problems with fcntl64 locks hanging.
>
> On Thu, 15 Dec 2005, Matt Johnson wrote:
>
> > On Thu, 15 Dec 2005, Gerald (Jerry) Carter wrote:
> >
> >>  Hmm....I'm not sure if the bug was strictly in the new oplock
> >>  implementation (post 3.0.20) but it does sound like what you are
> >>  describing.  If you want to test the SAMBA_3_0_RELEASE branch
> >>  and see, that would be appreciated.  The final 3.0.21 release
> >>  should happen soon (just 2 more bugs to finish)
> >
> > We'll give that a shot next week on a test server.
> >
> > It being related to oplocks seems to make lots of sense -- we killed
> > oplocks in our config and the problem has gone away (but we now
> > obviously have a *very* slow Samba service).
>
> Well... it has been fine since we upgraded to 3.0.21b and reenabled
> oplocks, but today we had exactly the same problem recur:
>
> [root at shrike fd]# strace -p 19827
> Process 19827 attached - interrupt to quit
> fcntl64(15, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=632, len=1} <unfinished ...>
> Process 19827 detached
>
> fd 15 = locking.tdb
>
> ...so perhaps the frequency of the problem has been reduced, but the
> problem itself has not been resolved.
>
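
For anyone reading the strace above: F_SETLKW64 is the blocking form of a POSIX byte-range lock request, so the call returns only once no other process holds a conflicting lock on that range. The Python sketch below is an illustration only, not Samba or tdb code. The helper names (probe_lock, demo) and the scratch file are made up for the example; only the offset 632 and length 1 come from the trace. It uses the non-blocking variant so that contention is reported instead of hanging.

```python
import fcntl
import os
import tempfile


def probe_lock(path, start=632, length=1):
    """Return True if another process holds a conflicting lock on the range."""
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # Non-blocking counterpart of the F_SETLKW write lock in the trace.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                        length, start, os.SEEK_SET)
        except OSError:
            return True  # EAGAIN/EACCES: someone else owns the range
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
        return False
    finally:
        os.close(fd)


def demo():
    """Hold the lock in a child process and probe it from the parent."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * 1024)
        tmp.flush()
        ready_r, ready_w = os.pipe()
        done_r, done_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: take a write lock on byte 632 and keep it until told to exit.
            fd = os.open(tmp.name, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, 632, os.SEEK_SET)
            os.write(ready_w, b"x")
            os.read(done_r, 1)
            os._exit(0)
        os.read(ready_r, 1)                  # wait until the child owns the lock
        held = probe_lock(tmp.name)          # conflicts with the child's lock
        os.write(done_w, b"x")
        os.waitpid(pid, 0)                   # the lock is dropped when the child exits
        free = not probe_lock(tmp.name)
        return held, free


if __name__ == "__main__":
    print(demo())
```

A blocking request in the parent (LOCK_EX without LOCK_NB) would sit in the kernel for as long as the child kept the lock, which is the state the hung smbd appears to be in.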


I have an open call with IBM about a similar problem on AIX. I've been
suspicious of locking.tdb, but we don't use oplocks on our server and I
can't definitively point the finger at Samba.

Unfortunately, capturing a log at log level 10 is prohibitive given how
heavily this server is used.

I'm not sure how yours manifests, but ours is definitely load related. Each
hour our students log off from one Windows XP SP2 machine and then log into
another.

If we restart smbd through SWAT, the problem goes away for about 24 hours.
If we set deadtime to, say, 15 or 30 minutes, then we seem to last beyond
the 24 hour period.

Truss looks like this:

kfcntl(15, F_SETLKW, 0x2FF215E0)                = 0
kfcntl(15, F_SETLKW, 0x2FF21540)                = 0
kwrite(5, "\0\0\0 tFF S M B 2\0\0\0".., 120)    = 120
_select(23, 0x2FF20A08, 0x00000000, 0x00000000, 0x2FF22A08) = 1
kread(5, "\0\0\0A6", 4)                         = 4
kread(5, "FF S M B 2\0\0\0\01807C8".., 166)     = 166
statx("profile/Cookies/01971116 at launch.yahoo[1].txt", 0x2FF20AA0, 128, 010) = 0
kfcntl(15, F_SETLKW, 0x2FF215E0)                = 0
kfcntl(15, F_SETLKW, 0x2FF21540)                = 0
kwrite(5, "\0\0\0 tFF S M B 2\0\0\0".., 120)    = 120
_select(23, 0x2FF20A08, 0x00000000, 0x00000000, 0x2FF22A08) = 1
kread(5, "\0\0\0A6", 4)                         = 4
kread(5, "FF S M B 2\0\0\0\01807C8".., 166)     = 166
statx("profile/Cookies/01971116 at killermovies[1].txt", 0x2FF20AA0, 128, 010) = 0
kfcntl(15, F_SETLKW, 0x2FF215E0)                = 0
kfcntl(15, F_SETLKW, 0x2FF21540)                = 0
kwrite(5, "\0\0\0 tFF S M B 2\0\0\0".., 120)    = 120
_select(23, 0x2FF20A08, 0x00000000, 0x00000000, 0x2FF22A08) = 1
kread(5, "\0\0\0A0", 4)                         = 4
kread(5, "FF S M B 2\0\0\0\01807C8".., 160)     = 160
statx("profile/Cookies/01971116 at kb.elidel[2].txt", 0x2FF20AA0, 128, 010) = 0
kfcntl(15, F_SETLKW, 0x2FF215E0)                = 0
kfcntl(15, F_SETLKW, 0x2FF21540)                = 0
kwrite(5, "\0\0\0 tFF S M B 2\0\0\0".., 120)    = 120
_select(23, 0x2FF20A08, 0x00000000, 0x00000000, 0x2FF22A08) = 1
kread(5, "\0\0\0B8", 4)                         = 4
kread(5, "FF S M B 2\0\0\0\01807C8".., 184)     = 184
statx("profile/Cookies/01971116 at jupiter.us.intellitxt[1].txt", 0x2FF20AA0, 128, 010) = 0
kfcntl(15, F_SETLKW, 0x2FF215E0)                = 0
kfcntl(15, F_SETLKW, 0x2FF21540)                = 0
kwrite(5, "\0\0\0 tFF S M B 2\0\0\0".., 120)    = 120
_select(23, 0x2FF20A08, 0x00000000, 0x00000000, 0x2FF22A08) = 1
kread(5, "\0\0\09C", 4)                         = 4
kread(5, "FF S M B 2\0\0\0\01807C8".., 156)     = 156
statx("profile/Cookies/01971116 at jumpusa[1].txt", 0x2FF20AA0, 128, 010) = 0
kfcntl(15, F_SETLKW, 0x2FF215E0)                = 0
kfcntl(15, F_SETLKW, 0x2FF21540)                = 0
kwrite(5, "\0\0\0 tFF S M B 2\0\0\0".., 120)    = 120
_select(23, 0x2FF20A08, 0x00000000, 0x00000000, 0x2FF22A08) = 1
kread(5, "\0\0\094", 4)                         = 4
kread(5, "FF S M B 2\0\0\0\01807C8".., 148)     = 148
statx("profile/Cookies/01971116 at jcp[2].txt", 0x2FF20AA0, 128, 010) = 0
kfcntl(15, F_SETLKW, 0x2FF215E0)                = 0
kfcntl(15, F_SETLKW, 0x2FF21540)                = 0
kwrite(5, "\0\0\0 tFF S M B 2\0\0\0".., 120)    = 120
_select(23, 0x2FF20A08, 0x00000000, 0x00000000, 0x2FF22A08) = 1
kread(5, "\0\0\09E", 4)                         = 4
kread(5, "FF S M B 2\0\0\0\01807C8".., 158)     = 158
statx("profile/Cookies", 0x2FF20AA0, 128, 010)  = 0
statx("profile/Cookies/01971116 at jcpenney[1].txt", 0x2FF20AA0, 128, 010) = 0
kfcntl(15, F_SETLKW, 0x2FF215E0)                = 0
kfcntl(15, F_SETLKW, 0x2FF21540)                = 0
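
A note on reading the kread/kwrite pairs in the trace above: SMB over TCP puts a 4-byte session header in front of each message, and the size of the next read matches the length carried in that header. The sketch below assumes truss prints non-printable bytes as two hex digits (so "\0\0\0A6" is 00 00 00 A6) and printable bytes as the character itself (so "\0\0\0 t" ends in 0x74). The function name is made up, and treating the low 24 bits as the length is a simplification that holds for the small values seen here.

```python
import struct


def session_payload_length(header: bytes) -> int:
    """Return the payload length announced by a 4-byte session header."""
    if len(header) != 4:
        raise ValueError("session header must be exactly 4 bytes")
    (word,) = struct.unpack(">I", header)   # big-endian 32-bit word
    # The top byte is the message type (0 = session message); the rest is
    # taken as the length. This is a simplification, see the note above.
    return word & 0x00FFFFFF


if __name__ == "__main__":
    # Headers as they appear in the logout trace, with the size of the
    # read that follows each one.
    samples = [
        (b"\x00\x00\x00\xa6", 166),
        (b"\x00\x00\x00\xa0", 160),
        (b"\x00\x00\x00\xb8", 184),
        (b"\x00\x00\x00\x9c", 156),
    ]
    for header, next_read in samples:
        print(header.hex(), session_payload_length(header), next_read)

    # The 120-byte kwrite starts with "\0\0\0 t": 0x74 = 116 payload bytes
    # plus the 4-byte header itself.
    print(session_payload_length(b"\x00\x00\x00t") + 4)
```

Read this way, each iteration of the logout trace is one request read (header, then body), a statx on a cookie file, two lock operations on fd 15, and a 120-byte reply.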


That is on a logout; the following is on a login:


setreuid(-1, 20644)                             = 0
getuidx(1)                                      = 20644
kfcntl(15, F_SETLK, 0x2FF22198)                 = 0
kfcntl(15, F_SETLK, 0x2FF22198)                 = 0
kfcntl(15, F_SETLKW, 0x2FF22118)                = 0
kfcntl(15, F_SETLKW, 0x2FF22118)                = 0
kfcntl(15, F_SETLKW, 0x2FF22258)                = 0
kfcntl(14, F_SETLKW, 0x2FF22230)                = 0
kfcntl(14, F_SETLKW, 0x2FF22230)                = 0
close(32)                                       = 0
kwrite(25, "\0\0\0 #FF S M B04\0\0\0".., 39)    = 39
_select(26, 0x2FF20A08, 0x00000000, 0x00000000, 0x2FF22A08) = 1
kread(25, "\0\0\0 H", 4)                        = 4
kread(25, "FF S M B 2\0\0\0\01807C8".., 72)     = 72
fstatx(29, 0x2FF217A0, 128, 010)                = 0
kfcntl(15, F_SETLKW, 0x2FF215E0)                = 0
kill(1900706, 0)                                = 0
kfcntl(15, F_SETLKW, 0x2FF215B0)                = 0
kwrite(25, "\0\0\0 dFF S M B 2\0\0\0".., 104)   = 104
_select(26, 0x2FF20A08, 0x00000000, 0x00000000, 0x2FF22A08) = 1
kread(25, "\0\0\0B8", 4)                        = 4
kread(25, "FF S M BA2\0\0\0\01807C8".., 184)    = 184
statx("profile/Cookies", 0x2FF21360, 128, 010)  = 0
statx("profile/Cookies/00720237 at www.creatrixads[43].txt", 0x2FF21360, 128, 010) = 0
kfcntl(15, F_SETLKW, 0x2FF21DE0)                = 0
open("profile/Cookies/00720237 at www.creatrixads[43].txt", O_RDONLY|O_LARGEFILE) = 30
kfcntl(15, F_SETLKW, 0x2FF21D00)                = 0
kfcntl(15, F_SETLKW, 0x2FF21D00)                = 0
kfcntl(15, F_SETLKW, 0x2FF21DB0)                = 0
kwrite(25, "\0\0\0 gFF S M BA2\0\0\0".., 107)   = 107


The filemon summary shows fd 3 at 250MB/s for a process that is no longer
running, and IBM cannot tell me why that happens (we're on AIX 5.2 TL-08-1
and Samba 3.0.21b). That's why I haven't been that suspicious of Samba.

If Jerry or Jeremy can suggest what to look for, I'll set it up on my end
as well. Samba doesn't actually *stop* for us; it's just *really* slow, and
context switches go from about 8-20k/sec (with 120k syscalls) to over
60k/sec (with only 30k syscalls).
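
To put those figures side by side, here is the arithmetic as a small Python sketch. It assumes the syscall counts are per-second rates like the context-switch counts, which the message does not state, and it treats "over 60k" as exactly 60k, so the degraded ratio is a lower bound.

```python
def switches_per_syscall(context_switches_per_sec, syscalls_per_sec):
    """Context switches incurred per system call at the given rates."""
    return context_switches_per_sec / syscalls_per_sec


# Normal load: 8k-20k context switches/sec against 120k syscalls.
normal_low = switches_per_syscall(8_000, 120_000)
normal_high = switches_per_syscall(20_000, 120_000)

# Degraded: at least 60k context switches/sec against only 30k syscalls.
degraded = switches_per_syscall(60_000, 30_000)

if __name__ == "__main__":
    print(f"normal:   {normal_low:.3f} to {normal_high:.3f} switches per syscall")
    print(f"degraded: {degraded:.3f} switches per syscall or more")
    print(f"increase: {degraded / normal_high:.0f}x to {degraded / normal_low:.0f}x")
```

Under those assumptions the server goes from well under one context switch per syscall to two or more, which would be consistent with processes spending their time blocking and waking rather than doing work.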

The other noticeable trait is the time it takes to stop smbd. If I didn't
know any better, I'd swear the processes are trying to unravel some
seriously nested stack of calls or complex pieces of data. It takes almost
two minutes for all the smbd processes to clear out after stopping them
from SWAT.


I'll collect anything else you need.


Cheers,

Bill



> What do you folks need to debug this?
>
> --M
>
> -- 
> ======================================================================
> Matt Johnson <mwj at doc.ic.ac.uk>               (020) 7594 8440 / x48440
> Systems Programmer, Computing Support Group         Office: Huxley 225
> Department of Computing, Imperial College London
> ======================================================================


