[Samba] Re: smbd blocking in receive_smb (how to trigger procedure - LONG)

john.debella at teradyne.com john.debella at teradyne.com
Tue Dec 28 22:24:00 GMT 2004


Hi,

We've been experiencing this same problem, however, with Samba 
3.0.7-1.3E.1 running on RedHat ES 3.0. We've also experienced this on 
2.2.7x on RedHat 9. The problem has been intermittently happening over the 
last several weeks, (I've posted list about it).

This message, along with the others in the thread, describes the symptoms 
that we're seeing, however, I haven't been able to determine whether this 
has been fixed or if there is a fix. There were responses in the thread 
that indicated using the "dead time" parameter to kill off dead processes. 


Does anyone know if this has been fixed or have any idea whether tunning 
the "dead time" parameter or some other thing has resolved this?

Any help would be appreciated.

-John


List:       samba-technical
Subject:    Re: smbd blocking in receive_smb (how to trigger procedure - 
LONG)
From:       Pascal <pascal () vmfacility ! fr>
Date:       2002-08-06 18:29:39
[Download message RAW]

I think I finally found a BUG (?) in samba 2.2.5 and now I found a means 
to 
reproduce that kind of problem without incriminating any network problem 
at 
all (I guess...). But I need a FIFO file on Unix in order to trigger it 
very 
easilly.

I'd like to know if other (especially SAMBA DEVELOPPERs) can reproduce it.

So this is the procedure :

1. take a samba server 2.2.5 (I have mandrake cooker 
samba-server-2.2.5-14mdk)
on a linux kernel version 2.4.18-23mdk

2. samba config :
# Global parameters
[global]
        workgroup = VMF
        netbios name = MYNAME
        server string = VMF Samba Server %v on %h
        interfaces = xx.xx.xx.xx/24 yy.yy.yy.yy/24
        bind interfaces only = Yes
        encrypt passwords = Yes
        map to guest = Bad User
        syslog = 10
        log file = /data/reports/xxxxxxx/smblog-%m.txt
        max log size = 500
        name resolve order = wins host bcast
        unix extensions = Yes
        socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192 
SO_KEEPALIVE
        os level = 65
        lm announce = False
        preferred master = True
        domain master = True
        wins support = Yes
        message command = /usr/bin/mail -s 'winpopup msg from %f on %m' 
root < 
%s; rm %s
        time offset = 60
        panic action = echo samba crash | mail -s 'samba crash' root
        ;admin users = pascal ivan
        printer admin = pascal ivan
        hosts allow = xx.xx.xx.xx/255.255.255.0 yy.yy.yy.yy/255.255.255.0
        printing = cups
        # added by PC 23/07/2002
        kernel oplocks = no
        oplock break wait time = 100
        # added by PC 05/08/2002
        deadtime = 15

[print$]
        path = /data/home/smbprinters
        write list = root pascal ivan

[printers]
        comment = All Printers
        path = /var/spool/samba
        create mask = 0700
        guest ok = Yes
        printable = Yes
        writable = No
        print command = lpr-cups -P %p -o raw %s -r   # using client side 
printer drivers.
        lpq command = lpstat -o %p
        lprm command = cancel %p-%j
        browseable = No

[Docs]
        comment = Documents VMF
        path = /data/docpasc
        valid users = pascal ivan
        force user = docs
        force group = docs
        read only = No
        create mask = 0664
        force create mode = 0664
        directory mask = 0750

3. I create a unix FIFO on /data/docpasc
mkfifo /data/docpasc/FIFO
prw-rw-r--    1 pascal   docs            0 Aug  6 19:26 
/data/docpasc/FIFO|

4. take a windows client (win2k, win98se tested)
go to the explorer and open the Docs share
now click on the fifo name and right-clic to get the menu
Your client is frozen

5. go to samba server and try to figure out what happens, here is the log 
:

# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
pascal       pascal   pascal    2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:45 2002
unsaved      pascal   vmf-staff  2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:46 2002
Docs         docs     docs      2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:45 2002

No locked files

# tdbtool is used to check I can read the entire lockfile
# ./tdbtool /var/cache/samba/locking.tdb
tdb> list
freelist:
tdb> quit
# # client will now open explorer and look at the properties of a FIFO 
(unix) 
File on the Docs share
# now client is frozen
# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
IPC$         nobody   65534     2699   pcpcwin  (212.208.17.83) Tue Aug  6 

18:21:51 2002
pascal       pascal   pascal    2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:45 2002
IPC$         nobody   65534     2699   pcpcwin  (212.208.17.83) Tue Aug  6 

18:21:51 2002
unsaved      pascal   vmf-staff  2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:46 2002
Docs         docs     docs      2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:45 2002

Locked files:
Pid    DenyMode   Access      R/W        Oplock           Name
--------------------------------------------------------------
2675   DENY_NONE  0x1         RDONLY     EXCLUSIVE+BATCH 
/data/docpasc/Folder 
Settings/Folder.htt   Tue Aug  6 18:21:31 2002
2675   DENY_WRITE 0x3         RDWR       EXCLUSIVE+BATCH 
/data/docpasc/desktop.ini   Tue Aug  6 18:21:31 2002

# # as you see there is already 2 process for the same machine !
# date
Tue Aug  6 19:22:37 CEST 2002
# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
IPC$         nobody   65534     2699   pcpcwin  (212.208.17.83) Tue Aug  6 

18:21:51 2002
pascal       pascal   pascal    2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:45 2002
IPC$         nobody   65534     2699   pcpcwin  (212.208.17.83) Tue Aug  6 

18:21:51 2002
unsaved      pascal   vmf-staff  2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:46 2002
Docs         docs     docs      2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:45 2002

Locked files:
Pid    DenyMode   Access      R/W        Oplock           Name
--------------------------------------------------------------
2675   DENY_NONE  0x1         RDONLY     EXCLUSIVE+BATCH 
/data/docpasc/Folder 
Settings/Folder.htt   Tue Aug  6 18:21:31 2002
2675   DENY_WRITE 0x3         RDWR       EXCLUSIVE+BATCH 
/data/docpasc/desktop.ini   Tue Aug  6 18:21:31 2002

# strace -p 2699
select(21, [5 13 20], NULL, NULL, {44, 880000} <unfinished ...>
# strace -p 2675
open("FIFO", O_RDONLY|O_LARGEFILE <unfinished ...>
# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
pascal       pascal   pascal    2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:45 2002
unsaved      pascal   vmf-staff  2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:46 2002
Docs         docs     docs      2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:45 2002
Docs         docs     docs      2709   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:22:56 2002

Locked files:
Pid    DenyMode   Access      R/W        Oplock           Name
--------------------------------------------------------------
2675   DENY_NONE  0x1         RDONLY     EXCLUSIVE+BATCH 
/data/docpasc/Folder 
Settings/Folder.htt   Tue Aug  6 18:21:31 2002
2675   DENY_WRITE 0x3         RDWR       EXCLUSIVE+BATCH 
/data/docpasc/desktop.ini   Tue Aug  6 18:21:31 2002

# strace -p 2709
fcntl64(10, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=344, len=1} 
<unfinished ...>
# # now all smbd that will start will be in a lock in the locking.tdb
# lsof -p 2709 | grep locking.tdb
smbd    2709 docs  mem-r  REG    3,5    8192  38698 
/var/cache/samba/locking.tdb
smbd    2709 docs   10ur  REG    3,5    8192  38698 
/var/cache/samba/locking.tdb
# # now we cannot access the locking.tdb with tdbtool
# ./tdbtool /var/cache/samba/locking.tdb
tdb> list
hash=11
 rec: offset=856 next=0 rec_len=152 key_len=16 data_len=129 
full_hash=0xcc182ae9 magic=0x26011999

# ./tdbtool /var/cache/samba/locking.tdb
tdb> dump

key 16 bytes
...............
data 129 bytes
[000] 01 00 00 00 E8 3D 29 40  A3 BF 00 00 00 00 00 00  .....=)@ .......
[010] 01 00 00 00 F0 38 26 08  66 00 00 00 00 00 00 00  .....8&. f......
[020] 00 00 00 00 00 00 00 00  00 00 00 00 73 0A 00 00  ....... ....s..
[030] 11 80 03 00 40 00 00 00  01 00 00 00 9B 05 50 3D  .... at .. ......P=
[040] 4F E8 01 00 00 09 00 00  00 00 00 00 A5 BF 00 00  O...... .......
[050] 00 00 00 00 02 00 00 00  2F 64 61 74 61 2F 64 6F  ....... /data/do
[060] 63 70 61 73 63 2F 46 6F  6C 64 65 72 20 53 65 74  cpasc/Fo lder Set
[070] 74 69 6E 67 73 2F 46 6F  6C 64 65 72 2E 68 74 74  tings/Fo lder.htt
[080] 00

# # i was to CTRL-C the tool to get out
# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
pascal       pascal   pascal    2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:45 2002
unsaved      pascal   vmf-staff  2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:46 2002
Docs         docs     docs      2720   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:24:00 2002
Docs         docs     docs      2675   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:19:45 2002
Docs         docs     docs      2709   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:22:56 2002

Locked files:
Pid    DenyMode   Access      R/W        Oplock           Name
--------------------------------------------------------------
2675   DENY_NONE  0x1         RDONLY     EXCLUSIVE+BATCH 
/data/docpasc/Folder 
Settings/Folder.htt   Tue Aug  6 18:21:31 2002
2675   DENY_WRITE 0x3         RDWR       EXCLUSIVE+BATCH 
/data/docpasc/desktop.ini   Tue Aug  6 18:21:31 2002

# # now i write something to the pipe
# echo coucou > /data/docpasc/FIFO
# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
Docs         docs     docs      2720   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:24:00 2002
Docs         docs     docs      2709   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:22:56 2002

No locked files

# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
Docs         docs     docs      2720   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:24:00 2002
Docs         docs     docs      2709   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:22:56 2002

No locked files

# ./tdbtool /var/cache/samba/locking.tdb
tdb> list

# # some cleanup has been done but the tdb is still locked
# strace -p 2720
select(21, [5 13 20], NULL, NULL, {39, 820000} <unfinished ...>
# strace -p 2709
open("FIFO", O_RDONLY|O_LARGEFILE <unfinished ...>
# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
Docs         docs     docs      2720   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:24:00 2002
Docs         docs     docs      2709   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:22:56 2002

No locked files

# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
Docs         docs     docs      2720   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:24:00 2002
Docs         docs     docs      2709   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:22:56 2002

No locked files

# # client is not blocked and the right-clic panel of the FIFO file is 
opened
# # I select propertied
# # I select properties
# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
Docs         docs     docs      2775   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:29:19 2002
Docs         docs     docs      2720   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:24:00 2002
Docs         docs     docs      2709   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:22:56 2002

No locked files

# # now I have another smbd !
# strace -p 2775
fcntl64(10, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=344, len=1} 
<unfinished ...>
# strace -p 2720
fcntl64(10, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=344, len=1} 
<unfinished ...>
# strace -p 2709
open("FIFO", O_RDONLY|O_LARGEFILE <unfinished ...>
# # on still trry to open the pipe
# # the others are in the locking.tdb lock, waiting for it to be released
# smbclient \\\\pumpkin\\Docs -I 192.168.0.3 -U ivan
added interface ip=192.168.0.3 bcast=192.168.0.255 nmask=255.255.255.0
Password:
Domain=[XXX] OS=[Unix] Server=[Samba 2.2.5]
smb: \> ls
  .                                   D        0  Tue Aug  6 15:34:07 2002
  ..                                  D        0  Mon Feb 25 19:54:32 2002
  FIFO                                         0  Tue Aug  6 18:26:28 2002

                55700 blocks of size 2097152. 37394 blocks available
smb: \> quit
# echo dump | ./tdbtool /var/cache/samba/locking.tdb

# still locked
# smbstatus

Samba version 2.2.5
Service      uid      gid      pid     machine
----------------------------------------------
Docs         docs     docs      2790   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:30:40 2002
Docs         docs     docs      2802   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:31:48 2002
Docs         docs     docs      2775   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:29:19 2002
Docs         docs     docs      2720   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:24:00 2002
Docs         docs     docs      2813   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:32:43 2002
Docs         docs     docs      2709   ivpcvmwin98se (192.168.0.126) Tue 
Aug 
6 18:22:56 2002

No locked files

# # now the smbd count will grow as windows retries to access the share...
#


This can be reproduced at any samba 2.2.5 site I guess. The problem is 
that 
the smbd blocked in the FIFO open maintain a lock on the locking.tdb that 
prevent other smbd to work properly (they block at start during 'lock 
table 
scan for cleanup' phase I presume). 

The fifo is only a trigger here, but I wonder if some network problems 
(socket 
reset, signals...) arriving at a specific moment can trigger the same kind 
of 
deadlocks... ?

So , it could be very important to investigate what happens here, maybe it 

will solve lots of similar bugs. I'd like to hear from the technical 
developper team on this part of the code.

Thank you,
Pascal
VMF




Le Mardi 6 Août 2002 17:54, Pascal a écrit :
> Investigating more last lite (in France :) ) I find in fact the 
following:
>
> 1. One SMBD was in a read on a FIFO
>     Since this read is restartable (I tested it with a kill and beeing
> under strace) this smbd will NEVER be terminated (the fifo was left on a
> share by a very old process, and this smbd tried to read it, as a normal
> processing of share backup...)
>
> 2. while in this state, I had lots of other smbd stucked in :
> fcntl64(14, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=256, 
len=1}
> <unfinished ...>
> fd 14 is the locking.tdb database here (lsof proof)
>
> 3. I compiled tdbtool and tried to list the locking database
>  ==> tdbtool list or dump blocks each time at the same place in the
> locking.tdb   same ofset as the other smbd !
>
> 4. I removed the FIFO (rm)
>
> 5. my first smbd then terminated properly
>
> 6. all the other blocked smbd terminated properly
>
> 7. smbstatus showed  ghost locks (pid no more present)
>
> 8. smbclient -L localhost ... cleaned up the database as expected.
>
> I am in samba 2.2.5   ( -14mdk  mandrake serial)
> several times a week I have locks problem like that in the server and
> nobody can access the share during this time. It's the first time I 
notice
> that locking.tdb is not scanable, I have to checkout next time too.
>
> I have started today a debug level 10 but to no avail - everything works
> fine today :(
>
> maybe a race condition or timing problem in the lock code on 
locking.tdb.
>
> I really wonder if it is possible for a smbd process to block other 
smbds
> only because it has a byte range lock on this database ? I know that at
> start, new smbd scan this table to remove dead locks, can this scan be
> blocked at some point in the samba code ?
>
> any clue would be apreciated.
>
>
> Pascal
>
> Le Mardi 6 Août 2002 17:02, David Collier-Brown a écrit :
> > Pascal wrote:
> > > I've read your problem report in the samba-technical list and it 
seems
> > > I am experiencing the same symptoms on some servers too.
> > >
> > > I noticed that the 'dead' smbd are in fact blocked because the 
network
> > > socket is still to be closed by the client side (connexion showed as
> > > CLOSE_WAIT in netstat -an).
> >
> >              Hmmn: I think this is another variant of a known
> >              Windows <censored> feature.  Try adding
> >                              dead time = 10
> >                              keepalive = 3600
> >              to the conf file, which should clean up broken
> >              clients after 10 minutes of apparent death.
> >
> >              If ten minutes is too long 9and it may well be),
> >              change just keepalive to 30 (seconds).  This will
> >              cause samba to notice if a client goes silent for 30
> >              seconds and send them an "are you alive" query.  If
> >              they aren't alive samba will tear them down and release
> >              any  datastructures and locks they're holding.
> >
> >              This is usually a symptom of a network or client
> >              problem: I have a note about an earlier symptom at
> > http://www.geocities.com/orville_torpid/papers/ethernet-note.txt
> >
> > Horst wrote:
> > > The visible symptom is that users find themselves locked out of 
files
> > > by their own processes.  On the server side we find that users have
> > > multiple smbd's running that are talking to the same PC.  One smbd 
will
> > > be the active one and others (older ones) will be blocked and will 
not
> > > respond to SIGTERM.  Killing with other signals works, but leaves 
old
> > > locks in the lock database.
> > >
> > > Below I've included a sample of four processes shown in smbstatus
> > > output and the gdb backtrace of each of the four.  Three of them are
> > > blocked in read_data and not being responsive to the client PC 
anymore.
> > >  Clearly, as each of these processes blocked the client PC happily
> > > started a new one.
> > >
> > > The smbd version is 2.2.5 on Linux, Redhat 7.2.
> > >
> > > I realize that everyone is working on more exciting aspects of Samba
> > > but wonder if anybody more familiar with the util_sock.c routines 
and
> > > signalling would have ideas or hints for me to debug further.  The
> > > receive code gets twisty and I'm not sure I totally understand how 
it's
> > > supposed to be working with keepalive packets and such.
> > >
> > > --Eric
> >
> > --dave


More information about the samba mailing list