bug: fifo opened in blocking mode by smbd while holding a lock on locking.tdb

Thu Aug 8 06:26:02 GMT 2002

This one is easily reproducible on 2.2.5.

Can someone with the latest 2.2.x CVS version please confirm it?

Procedure:

Server 
Linux Mandrake 9 (Cooker)
kernel 2.4.18-23mdkenterprise
glibc 2.2.5-14mdk
samba-server 2.2.5-14mdk

Client
Windows 2000 (US) SP3 or Windows 98 SE (French) SP2

Samba options
oplocks = yes
kernel oplocks = no

0. Compile tdbtool and check that it runs correctly:
echo dump | tdbtool <path_to_locking.tdb>

1. Create a FIFO in the shared directory on the server:
=> mkfifo path_of_shared_dir/FIFO

2. Open Explorer on the client, go to the Samba share
and just click on the FIFO's name.

=> verify that the Windows client is blocked
(you cannot click anywhere else)

3. On the server, run smbstatus and note the PID of the smbd serving this client.

4. strace -p PID
==> verify that this smbd is in a BLOCKING open() on the FIFO
use lsof -p PID to verify that the file descriptor of the open() corresponds
to the FIFO

5. Run tdbtool again (it must already be built and working, see step 0):
echo dump | tdbtool <path_to_locking.tdb>
==> verify that tdbtool never finishes (it is blocked waiting for a lock on
the tdb)

6. Wait for the Windows client to time out and regain control (do not click
on the FIFO again). Back on the server, there are now two or more smbd
processes for this client:
=> the initial smbd, still blocked in open() on the FIFO
=> a second smbd, blocked while trying to scan locking.tdb (to clean up old
locks, I guess)
=> a third smbd in the usual select() loop, waiting for activity
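Step 5 only shows indirectly that something holds the lock. As a possible complement, the holder can be asked for directly with fcntl(F_GETLK), assuming (as I understand it) that tdb protects locking.tdb with fcntl() byte-range locks. This is a minimal sketch; lock_holder_pid() is a hypothetical helper, not part of tdb or Samba. It asks whether an exclusive lock on the whole file would conflict and, if so, returns the PID of one process standing in the way.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical diagnostic (not a tdb/Samba API): report which process,
 * if any, holds an fcntl() lock anywhere in `path` that would conflict
 * with an exclusive lock on the whole file.
 * Returns the holder's PID, 0 if nothing conflicts, -1 on error.
 * F_GETLK never reports the caller's own locks, and reports only one
 * conflicting lock even when several exist.
 */
pid_t lock_holder_pid(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);   /* read-only: nothing is modified */
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;             /* "could I write-lock this?" */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                    /* 0 = up to end of file and beyond */

    int rc = fcntl(fd, F_GETLK, &fl);
    int saved = errno;
    close(fd);                       /* we held no locks, so nothing is lost */
    if (rc < 0) {
        errno = saved;
        return -1;
    }
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}
```

Called (as root, from a small main()) with <path_to_locking.tdb> while the client is stuck in step 2, a reported PID equal to the smbd found in step 3 would support the theory below; 0 would mean no conflicting lock was found at that moment.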

The problems I see here:

I suppose smbd wants to register an oplock in locking.tdb before actually
opening the file on the server, so I imagine the following sequence:

- take a byte-range lock in locking.tdb
- try to open the requested file
- register an oplock in locking.tdb (or not)
- release the lock in locking.tdb
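The suspected sequence can be modeled outside Samba to check that the reasoning holds. This is a hypothetical C sketch, not smbd code: a forked child write-locks a stand-in lock file (playing locking.tdb), then does a blocking open() of the requested path; after a short delay the parent, playing a later smbd, tries a non-blocking lock and reports whether it is still held. lock_stuck_behind_fifo_open() and try_lock() are invented names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: try to write-lock the whole file without waiting.
 * Returns 1 if the lock was free (it is released again at once),
 * 0 if another process holds a conflicting lock, -1 on other errors. */
static int try_lock(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;          /* l_start = l_len = 0: whole file */
    if (fcntl(fd, F_SETLK, &fl) == 0) {
        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);
        return 1;
    }
    return (errno == EACCES || errno == EAGAIN) ? 0 : -1;
}

/* Hypothetical model of the suspected smbd sequence. Returns 1 if the
 * lock is still held after wait_ms, 0 if it was released, -1 on error.
 * The child is killed before returning. */
int lock_stuck_behind_fifo_open(const char *lockfile, const char *path,
                                long wait_ms)
{
    assert(lockfile != NULL && path != NULL);
    int ready[2];
    if (pipe(ready) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (child == 0) {                /* child plays the first smbd */
        close(ready[0]);
        struct flock fl = {0};
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        int lfd = open(lockfile, O_RDWR);
        /* "byte range lock in locking.tdb" */
        if (lfd < 0 || fcntl(lfd, F_SETLKW, &fl) < 0)
            _exit(1);
        if (write(ready[1], "L", 1) != 1)   /* tell parent: lock taken */
            _exit(1);
        /* "try to open the file requested": blocks forever on a FIFO
         * that nobody opens for writing */
        open(path, O_RDONLY);
        /* "free the lock": exiting releases it, if we ever get here */
        _exit(0);
    }

    close(ready[1]);                 /* parent plays a later smbd */
    char c;
    ssize_t n = read(ready[0], &c, 1);  /* 0 if the child failed early */
    close(ready[0]);

    int result = -1;
    if (n == 1) {
        struct timespec ts = { wait_ms / 1000, (wait_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        int lfd = open(lockfile, O_RDWR);
        if (lfd >= 0) {
            int r = try_lock(lfd);
            result = (r < 0) ? -1 : (r == 0);   /* 1 = still held */
            close(lfd);
        }
    }
    kill(child, SIGKILL);            /* unstick the child */
    waitpid(child, NULL, 0);
    return result;
}
```

With a FIFO the lock stays held for as long as the child sits in open(); with a regular file the child's open() returns at once, the child exits, and the lock is free again. The fixed delay is a simplification, not a synchronization guarantee.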

If this is the case (only a developer can confirm it), smbd should not do a
blocking open() when the file is a FIFO: this leads to a deadlock for the
subsequent smbd processes started for the same client (since the first smbd
no longer answers the client), during their 'old lock cleanup scan of
locking.tdb' phase.

Immediate solution:
do not block when opening a FIFO (use O_NONBLOCK).
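A minimal sketch of the immediate fix, assuming the policy is to refuse FIFOs outright rather than serve them; open_refusing_fifo() is a hypothetical helper, not an existing Samba function. With O_NONBLOCK, open() on a FIFO returns at once (read-only succeeds, write-only fails with ENXIO when there is no reader), so fstat() can detect the FIFO before anything blocks.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Samba's API): open `path` without ever
 * blocking on a FIFO. FIFOs are refused with EINVAL; for any other
 * file O_NONBLOCK is cleared again so later I/O behaves as usual.
 * Returns the fd, or -1 with errno set.
 */
int open_refusing_fifo(const char *path, int flags)
{
    assert(path != NULL);
    assert((flags & O_CREAT) == 0);  /* no mode argument is passed */

    int fd = open(path, flags | O_NONBLOCK);  /* never blocks on a FIFO */
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (S_ISFIFO(st.st_mode)) {
        close(fd);
        errno = EINVAL;              /* policy choice: do not serve FIFOs */
        return -1;
    }

    /* Regular file, directory, ...: restore normal blocking semantics. */
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0 || fcntl(fd, F_SETFL, fl & ~O_NONBLOCK) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Because O_NONBLOCK is cleared again for non-FIFO files, later read() and write() calls behave as before; only the open itself changes.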

But the lock/unlock sequencing in locking.tdb should also be investigated. A
lock should be released whenever we know we are about to be busy with, or
waiting for, something else.

Real or complementary solution:
do not take exclusive locks on locking.tdb too early, but only just before
writing to it. Here it seems an exclusive lock is taken well before the
actual modification of the tdb.
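The reordering can be sketched the same way: do the possibly slow open() first with no lock held, then hold the exclusive lock only around the actual write. This is a hypothetical illustration: open_then_register() and whole_file_lock() are invented names, and the appended "pid path" text line merely stands in for an oplock/share-mode record; it is not the real locking.tdb format.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Take (F_WRLCK) or drop (F_UNLCK) an fcntl() lock covering the whole
 * file, waiting if another process holds it. */
static int whole_file_lock(int fd, int type)
{
    struct flock fl = {0};
    fl.l_type = (short)type;
    fl.l_whence = SEEK_SET;          /* l_start = l_len = 0: whole file */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical reordered sequence: open first, lock only for the write.
 * Returns the fd of `path`, or -1 with errno set. */
int open_then_register(const char *lockfile, const char *path, int flags)
{
    assert((flags & O_CREAT) == 0);  /* no mode argument is passed */

    /* 1. Possibly slow part, done with no lock held: if this open()
     *    blocks, only this process waits and lockfile stays usable. */
    int fd = open(path, flags);
    if (fd < 0)
        return -1;

    /* 2. Exclusive lock held only around the actual modification. */
    int saved = 0;
    int lfd = open(lockfile, O_WRONLY | O_APPEND);
    int ok = lfd >= 0 && whole_file_lock(lfd, F_WRLCK) == 0;
    if (ok) {
        ok = dprintf(lfd, "%ld %s\n", (long)getpid(), path) >= 0;
        saved = errno;
        whole_file_lock(lfd, F_UNLCK);
    } else {
        saved = errno;
    }
    if (lfd >= 0)
        close(lfd);

    if (!ok) {
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Even without O_NONBLOCK, a FIFO open that hangs here only blocks its own process; other smbds can still lock and scan the file. Combining this ordering with the immediate solution (O_NONBLOCK) would remove the hang as well.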

Maybe the Office file-open problems stem from the same kind of bad sequencing.

Pascal