smbd blocking in receive_smb (how to trigger procedure - LONG)
john.debella at teradyne.com
john.debella at teradyne.com
Tue Dec 28 22:24:00 GMT 2004
Hi,
We've been experiencing this same problem, however, with Samba
3.0.7-1.3E.1 running on RedHat ES 3.0. We've also experienced this on
2.2.7x on RedHat 9. The problem has been intermittently happening over the
last several weeks, (I've posted list about it).
This message, along with the others in the thread, describes the symptoms
that we're seeing, however, I haven't been able to determine whether this
has been fixed or if there is a fix. There were responses in the thread
that indicated using the "dead time" parameter to kill off dead processes.
Does anyone know if this has been fixed or have any idea whether tunning
the "dead time" parameter or some other thing has resolved this?
Any help would be appreciated.
-John
List: samba-technical
Subject: Re: smbd blocking in receive_smb (how to trigger procedure -
LONG)
From: Pascal <pascal () vmfacility ! fr>
Date: 2002-08-06 18:29:39
[Download message RAW]
I think I finally found a BUG (?) in samba 2.2.5 and now I found a means
to
reproduce that kind of problem without incriminating any network problem
at
all (I guess...). But I need a FIFO file on Unix in order to trigger it
very
easilly.
I'd like to know if other (especially SAMBA DEVELOPPERs) can reproduce it.
So this is the procedure :
1. take a samba server 2.2.5 (I have mandrake cooker
samba-server-2.2.5-14mdk)
on a linux kernel version 2.4.18-23mdk
2. samba config :
# Global parameters
[global]
workgroup = VMF
netbios name = MYNAME
server string = VMF Samba Server %v on %h
interfaces = xx.xx.xx.xx/24 yy.yy.yy.yy/24
bind interfaces only = Yes
encrypt passwords = Yes
map to guest = Bad User
syslog = 10
log file = /data/reports/xxxxxxx/smblog-%m.txt
max log size = 500
name resolve order = wins host bcast
unix extensions = Yes
socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
SO_KEEPALIVE
os level = 65
lm announce = False
preferred master = True
domain master = True
wins support = Yes
message command = /usr/bin/mail -s 'winpopup msg from %f on %m'
root <
%s; rm %s
time offset = 60
panic action = echo samba crash | mail -s 'samba crash' root
;admin users = pascal ivan
printer admin = pascal ivan
hosts allow = xx.xx.xx.xx/255.255.255.0 yy.yy.yy.yy/255.255.255.0
printing = cups
# added by PC 23/07/2002
kernel oplocks = no
oplock break wait time = 100
# added by PC 05/08/2002
deadtime = 15
[print$]
path = /data/home/smbprinters
write list = root pascal ivan
[printers]
comment = All Printers
path = /var/spool/samba
create mask = 0700
guest ok = Yes
printable = Yes
writable = No
print command = lpr-cups -P %p -o raw %s -r # using client side
printer drivers.
lpq command = lpstat -o %p
lprm command = cancel %p-%j
browseable = No
[Docs]
comment = Documents VMF
path = /data/docpasc
valid users = pascal ivan
force user = docs
force group = docs
read only = No
create mask = 0664
force create mode = 0664
directory mask = 0750
3. I create a unix FIFO on /data/docpasc
mkfifo /data/docpasc/FIFO
prw-rw-r-- 1 pascal docs 0 Aug 6 19:26
/data/docpasc/FIFO|
4. take a windows client (win2k, win98se tested)
go to the explorer and open the Docs share
now click on the fifo name and right-clic to get the menu
Your client is frozen
5. go to samba server and try to figure out what happens, here is the log
:
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
pascal pascal pascal 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:45 2002
unsaved pascal vmf-staff 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:46 2002
Docs docs docs 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:45 2002
No locked files
# tdbtool is used to check I can read the entire lockfile
# ./tdbtool /var/cache/samba/locking.tdb
tdb> list
freelist:
tdb> quit
# # client will now open explorer and look at the properties of a FIFO
(unix)
File on the Docs share
# now client is frozen
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
IPC$ nobody 65534 2699 pcpcwin (212.208.17.83) Tue Aug 6
18:21:51 2002
pascal pascal pascal 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:45 2002
IPC$ nobody 65534 2699 pcpcwin (212.208.17.83) Tue Aug 6
18:21:51 2002
unsaved pascal vmf-staff 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:46 2002
Docs docs docs 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:45 2002
Locked files:
Pid DenyMode Access R/W Oplock Name
--------------------------------------------------------------
2675 DENY_NONE 0x1 RDONLY EXCLUSIVE+BATCH
/data/docpasc/Folder
Settings/Folder.htt Tue Aug 6 18:21:31 2002
2675 DENY_WRITE 0x3 RDWR EXCLUSIVE+BATCH
/data/docpasc/desktop.ini Tue Aug 6 18:21:31 2002
# # as you see there is already 2 process for the same machine !
# date
Tue Aug 6 19:22:37 CEST 2002
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
IPC$ nobody 65534 2699 pcpcwin (212.208.17.83) Tue Aug 6
18:21:51 2002
pascal pascal pascal 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:45 2002
IPC$ nobody 65534 2699 pcpcwin (212.208.17.83) Tue Aug 6
18:21:51 2002
unsaved pascal vmf-staff 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:46 2002
Docs docs docs 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:45 2002
Locked files:
Pid DenyMode Access R/W Oplock Name
--------------------------------------------------------------
2675 DENY_NONE 0x1 RDONLY EXCLUSIVE+BATCH
/data/docpasc/Folder
Settings/Folder.htt Tue Aug 6 18:21:31 2002
2675 DENY_WRITE 0x3 RDWR EXCLUSIVE+BATCH
/data/docpasc/desktop.ini Tue Aug 6 18:21:31 2002
# strace -p 2699
select(21, [5 13 20], NULL, NULL, {44, 880000} <unfinished ...>
# strace -p 2675
open("FIFO", O_RDONLY|O_LARGEFILE <unfinished ...>
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
pascal pascal pascal 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:45 2002
unsaved pascal vmf-staff 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:46 2002
Docs docs docs 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:45 2002
Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:22:56 2002
Locked files:
Pid DenyMode Access R/W Oplock Name
--------------------------------------------------------------
2675 DENY_NONE 0x1 RDONLY EXCLUSIVE+BATCH
/data/docpasc/Folder
Settings/Folder.htt Tue Aug 6 18:21:31 2002
2675 DENY_WRITE 0x3 RDWR EXCLUSIVE+BATCH
/data/docpasc/desktop.ini Tue Aug 6 18:21:31 2002
# strace -p 2709
fcntl64(10, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=344, len=1}
<unfinished ...>
# # now all smbd that will start will be in a lock in the locking.tdb
# lsof -p 2709 | grep locking.tdb
smbd 2709 docs mem-r REG 3,5 8192 38698
/var/cache/samba/locking.tdb
smbd 2709 docs 10ur REG 3,5 8192 38698
/var/cache/samba/locking.tdb
# # now we cannot access the locking.tdb with tdbtool
# ./tdbtool /var/cache/samba/locking.tdb
tdb> list
hash=11
rec: offset=856 next=0 rec_len=152 key_len=16 data_len=129
full_hash=0xcc182ae9 magic=0x26011999
# ./tdbtool /var/cache/samba/locking.tdb
tdb> dump
key 16 bytes
...............
data 129 bytes
[000] 01 00 00 00 E8 3D 29 40 A3 BF 00 00 00 00 00 00 .....=)@ .......
[010] 01 00 00 00 F0 38 26 08 66 00 00 00 00 00 00 00 .....8&. f......
[020] 00 00 00 00 00 00 00 00 00 00 00 00 73 0A 00 00 ....... ....s..
[030] 11 80 03 00 40 00 00 00 01 00 00 00 9B 05 50 3D .... at .. ......P=
[040] 4F E8 01 00 00 09 00 00 00 00 00 00 A5 BF 00 00 O...... .......
[050] 00 00 00 00 02 00 00 00 2F 64 61 74 61 2F 64 6F ....... /data/do
[060] 63 70 61 73 63 2F 46 6F 6C 64 65 72 20 53 65 74 cpasc/Fo lder Set
[070] 74 69 6E 67 73 2F 46 6F 6C 64 65 72 2E 68 74 74 tings/Fo lder.htt
[080] 00
# # i was to CTRL-C the tool to get out
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
pascal pascal pascal 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:45 2002
unsaved pascal vmf-staff 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:46 2002
Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:24:00 2002
Docs docs docs 2675 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:19:45 2002
Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:22:56 2002
Locked files:
Pid DenyMode Access R/W Oplock Name
--------------------------------------------------------------
2675 DENY_NONE 0x1 RDONLY EXCLUSIVE+BATCH
/data/docpasc/Folder
Settings/Folder.htt Tue Aug 6 18:21:31 2002
2675 DENY_WRITE 0x3 RDWR EXCLUSIVE+BATCH
/data/docpasc/desktop.ini Tue Aug 6 18:21:31 2002
# # now i write something to the pipe
# echo coucou > /data/docpasc/FIFO
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:24:00 2002
Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:22:56 2002
No locked files
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:24:00 2002
Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:22:56 2002
No locked files
# ./tdbtool /var/cache/samba/locking.tdb
tdb> list
# # some cleanup has been done but the tdb is still locked
# strace -p 2720
select(21, [5 13 20], NULL, NULL, {39, 820000} <unfinished ...>
# strace -p 2709
open("FIFO", O_RDONLY|O_LARGEFILE <unfinished ...>
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:24:00 2002
Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:22:56 2002
No locked files
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:24:00 2002
Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:22:56 2002
No locked files
# # client is not blocked and the right-clic panel of the FIFO file is
opened
# # I select propertied
# # I select properties
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
Docs docs docs 2775 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:29:19 2002
Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:24:00 2002
Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:22:56 2002
No locked files
# # now I have another smbd !
# strace -p 2775
fcntl64(10, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=344, len=1}
<unfinished ...>
# strace -p 2720
fcntl64(10, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=344, len=1}
<unfinished ...>
# strace -p 2709
open("FIFO", O_RDONLY|O_LARGEFILE <unfinished ...>
# # on still trry to open the pipe
# # the others are in the locking.tdb lock, waiting for it to be released
# smbclient \\\\pumpkin\\Docs -I 192.168.0.3 -U ivan
added interface ip=192.168.0.3 bcast=192.168.0.255 nmask=255.255.255.0
Password:
Domain=[XXX] OS=[Unix] Server=[Samba 2.2.5]
smb: \> ls
. D 0 Tue Aug 6 15:34:07 2002
.. D 0 Mon Feb 25 19:54:32 2002
FIFO 0 Tue Aug 6 18:26:28 2002
55700 blocks of size 2097152. 37394 blocks available
smb: \> quit
# echo dump | ./tdbtool /var/cache/samba/locking.tdb
# still locked
# smbstatus
Samba version 2.2.5
Service uid gid pid machine
----------------------------------------------
Docs docs docs 2790 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:30:40 2002
Docs docs docs 2802 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:31:48 2002
Docs docs docs 2775 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:29:19 2002
Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:24:00 2002
Docs docs docs 2813 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:32:43 2002
Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue
Aug
6 18:22:56 2002
No locked files
# # now the smbd count will grow as windows retries to access the share...
#
This can be reproduced at any samba 2.2.5 site I guess. The problem is
that
the smbd blocked in the FIFO open maintain a lock on the locking.tdb that
prevent other smbd to work properly (they block at start during 'lock
table
scan for cleanup' phase I presume).
The fifo is only a trigger here, but I wonder if some network problems
(socket
reset, signals...) arriving at a specific moment can trigger the same kind
of
deadlocks... ?
So , it could be very important to investigate what happens here, maybe it
will solve lots of similar bugs. I'd like to hear from the technical
developper team on this part of the code.
Thank you,
Pascal
VMF
Le Mardi 6 Août 2002 17:54, Pascal a écrit :
> Investigating more last lite (in France :) ) I find in fact the
following:
>
> 1. One SMBD was in a read on a FIFO
> Since this read is restartable (I tested it with a kill and beeing
> under strace) this smbd will NEVER be terminated (the fifo was left on a
> share by a very old process, and this smbd tried to read it, as a normal
> processing of share backup...)
>
> 2. while in this state, I had lots of other smbd stucked in :
> fcntl64(14, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=256,
len=1}
> <unfinished ...>
> fd 14 is the locking.tdb database here (lsof proof)
>
> 3. I compiled tdbtool and tried to list the locking database
> ==> tdbtool list or dump blocks each time at the same place in the
> locking.tdb same ofset as the other smbd !
>
> 4. I removed the FIFO (rm)
>
> 5. my first smbd then terminated properly
>
> 6. all the other blocked smbd terminated properly
>
> 7. smbstatus showed ghost locks (pid no more present)
>
> 8. smbclient -L localhost ... cleaned up the database as expected.
>
> I am in samba 2.2.5 ( -14mdk mandrake serial)
> several times a week I have locks problem like that in the server and
> nobody can access the share during this time. It's the first time I
notice
> that locking.tdb is not scanable, I have to checkout next time too.
>
> I have started today a debug level 10 but to no avail - everything works
> fine today :(
>
> maybe a race condition or timing problem in the lock code on
locking.tdb.
>
> I really wonder if it is possible for a smbd process to block other
smbds
> only because it has a byte range lock on this database ? I know that at
> start, new smbd scan this table to remove dead locks, can this scan be
> blocked at some point in the samba code ?
>
> any clue would be apreciated.
>
>
> Pascal
>
> Le Mardi 6 Août 2002 17:02, David Collier-Brown a écrit :
> > Pascal wrote:
> > > I've read your problem report in the samba-technical list and it
seems
> > > I am experiencing the same symptoms on some servers too.
> > >
> > > I noticed that the 'dead' smbd are in fact blocked because the
network
> > > socket is still to be closed by the client side (connexion showed as
> > > CLOSE_WAIT in netstat -an).
> >
> > Hmmn: I think this is another variant of a known
> > Windows <censored> feature. Try adding
> > dead time = 10
> > keepalive = 3600
> > to the conf file, which should clean up broken
> > clients after 10 minutes of apparent death.
> >
> > If ten minutes is too long 9and it may well be),
> > change just keepalive to 30 (seconds). This will
> > cause samba to notice if a client goes silent for 30
> > seconds and send them an "are you alive" query. If
> > they aren't alive samba will tear them down and release
> > any datastructures and locks they're holding.
> >
> > This is usually a symptom of a network or client
> > problem: I have a note about an earlier symptom at
> > http://www.geocities.com/orville_torpid/papers/ethernet-note.txt
> >
> > Horst wrote:
> > > The visible symptom is that users find themselves locked out of
files
> > > by their own processes. On the server side we find that users have
> > > multiple smbd's running that are talking to the same PC. One smbd
will
> > > be the active one and others (older ones) will be blocked and will
not
> > > respond to SIGTERM. Killing with other signals works, but leaves
old
> > > locks in the lock database.
> > >
> > > Below I've included a sample of four processes shown in smbstatus
> > > output and the gdb backtrace of each of the four. Three of them are
> > > blocked in read_data and not being responsive to the client PC
anymore.
> > > Clearly, as each of these processes blocked the client PC happily
> > > started a new one.
> > >
> > > The smbd version is 2.2.5 on Linux, Redhat 7.2.
> > >
> > > I realize that everyone is working on more exciting aspects of Samba
> > > but wonder if anybody more familiar with the util_sock.c routines
and
> > > signalling would have ideas or hints for me to debug further. The
> > > receive code gets twisty and I'm not sure I totally understand how
it's
> > > supposed to be working with keepalive packets and such.
> > >
> > > --Eric
> >
> > --dave
More information about the samba-technical
mailing list