Alternate approaches to lock/dead-client problem.

David Collier-Brown davecb at Canada.Sun.COM
Tue Jan 26 15:39:49 GMT 1999


In the samba at samba.org mailing list, Nicolas.Williams at wdr.com wrote:
> Subject: Samba, NT, and transient network failures
> The problem can be summarized as follows:
> 
>  - An NT client is connected to Samba server. The user has some file(s)
>    open on a Samba-exported share.
>  - The NT client and the Samba server lose network connectivity, for
>    whatever reason, for a short period of time.
>  - The user or his application attempt to save his file, and the
>    application hangs due to the unavailability of the Samba server.
>  - The NT client kernel abandons its TCP connection to the server and
>    attempts to establish a new connection.
>  - Network connectivity between the NT client and the Samba server is
>    restored.
>  - The NT client establishes a new TCP connection to the Samba server.
> 
>  PROBLEM:
> 
>  - The smbd process for the old TCP connection hangs around because as
>    far as it and the Unix kernel are concerned the old TCP connection is
>    still alive. This process still holds and honors all locks, oplocks
>    and deny modes held by the NT client.
>  - The new smbd honors the locks held by the old smbd.
>  - The user's applications on the NT client continue to hang as they
>    block while the old smbd holds the old locks.

	Mr. Williams suggests having the new smbd kill the old smbd, thusly:

> There is a solution to this problem. If the new smbd kills the old smbd,
> then the old smbd releases all of its locks, leaving the NT client free
> to re-establish its locks (which it does, though only as each
> application accesses files it had open on the Samba share, rather than
> attempting to re-obtain all locks on that share in one fell swoop).

	I suspect this is another artifact of the MS client's tendency
	to mistake a temporary network problem for a major failure
	(e.g., see a disconnected cable as a server crash (:-))

	I'd like to suggest looking at the locking/oplocking behavior
	instead of directly killing the smbd:  if a lock is held on a
	file X by a client Y, and another lock is requested, then
	we may want to check that the lock's holder (Y') is still alive.

	If this were just an oplock, the probable behavior would be
	for the server to send a "break" request to the client, for the
	client to be found dead, and for the server to clean up...

	Arguably, any other lock request from the same client should
	trigger a similar check for the death of the "first" client.
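
	As a rough illustration of the check suggested above, here is a
	toy in-memory model, not Samba code. The lock_entry structure,
	the request_lock() function and the two probe functions are all
	hypothetical names invented for this sketch; in a real server the
	probe would be something like an oplock break or a keepalive sent
	to the holder's client.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock record: one lock, owned by one client connection. */
struct lock_entry {
    int in_use;      /* non-zero while somebody holds the lock */
    int holder_id;   /* id of the connection that holds it */
};

/* Hypothetical liveness probe: returns non-zero if the holder answers. */
typedef int (*alive_fn)(int holder_id);

/* Stand-in probes for experimenting with the logic. */
int probe_always_alive(int holder_id) { (void)holder_id; return 1; }
int probe_always_dead(int holder_id)  { (void)holder_id; return 0; }

/*
 * Try to take the lock for `requester`.
 * Returns 1 if the lock was granted, 0 if a live holder is in the way.
 * A conflicting holder is only believed after it passes the probe;
 * a holder that fails the probe has its stale lock cleaned up.
 */
int request_lock(struct lock_entry *e, int requester, alive_fn alive)
{
    assert(e != NULL && alive != NULL);

    if (e->in_use && e->holder_id != requester) {
        if (alive(e->holder_id))
            return 0;        /* genuine conflict: holder is still there */
        e->in_use = 0;       /* holder is dead: drop the stale lock */
    }
    e->in_use = 1;
    e->holder_id = requester;
    return 1;
}
```

	The point of the sketch is only the ordering: the conflict is
	not reported until the existing holder has been checked, so a
	reconnecting client is not blocked by its own dead connection.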

	As a non-expert, this raises two questions in my mind:
		1) is there room to store client IDs in the locks structure?
		   I see a flock64 used as SMB_STRUCT_FLOCK, which has a
		   pid, which will tell us the smbd to ask, but I don't see
		   an obvious and inexpensive way to ask "which client".
		2) could one reorder the locking logic so that the oplock
		   test comes first, which would presumably detect a dead
		   client at low cost?
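
	On the first question, a small sketch of what the pid in the lock
	structure does give us. It uses the plain POSIX struct flock and
	fcntl(F_GETLK) rather than Samba's SMB_STRUCT_FLOCK wrapper, and
	the names conflicting_lock_holder() and holder_demo() are made up
	for the example. F_GETLK fills in l_pid with the process holding
	a conflicting lock, which identifies a process and nothing more.

```c
#define _XOPEN_SOURCE 700   /* ask for the POSIX declarations */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Return the pid of a process whose lock would block a whole-file
 * write lock on fd, 0 if nothing is in the way, or -1 on error.
 */
pid_t conflicting_lock_holder(int fd)
{
    struct flock probe = {0};

    probe.l_type = F_WRLCK;      /* the lock we would like to take */
    probe.l_whence = SEEK_SET;
    probe.l_start = 0;
    probe.l_len = 0;             /* 0 means "through end of file" */

    if (fcntl(fd, F_GETLK, &probe) == -1)
        return -1;
    if (probe.l_type == F_UNLCK)
        return 0;
    return probe.l_pid;          /* a process id, not a client id */
}

/*
 * Demo: a child process takes a write lock on a scratch file and the
 * parent asks who holds it. Returns 1 if the answer is the child's pid.
 */
int holder_demo(void)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int ready[2], done[2];
    char c = 'x';
    pid_t child, holder;
    int fd = mkstemp(path);

    if (fd == -1)
        return 0;
    unlink(path);                /* the file stays usable through fd */
    if (pipe(ready) == -1 || pipe(done) == -1)
        return 0;

    child = fork();
    if (child == -1)
        return 0;
    if (child == 0) {
        struct flock fl = {0};

        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        close(done[1]);
        if (fcntl(fd, F_SETLK, &fl) == -1 || write(ready[1], &c, 1) != 1)
            _exit(1);
        if (read(done[0], &c, 1) < 0)   /* hold the lock until EOF */
            _exit(1);
        _exit(0);
    }

    close(ready[1]);
    assert(child > 0);
    holder = (read(ready[0], &c, 1) == 1) ? conflicting_lock_holder(fd) : -1;
    close(done[1]);              /* EOF on the pipe lets the child exit */
    waitpid(child, NULL, 0);
    close(ready[0]);
    close(done[0]);
    close(fd);
    return holder == child;
}
```

	Calling holder_demo() from a main() should return 1 on a POSIX
	system. Mapping that pid back to a particular client would still
	need some table kept elsewhere, which is the open part of the
	question.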

--dave (trying to think of a third question) c-b
-- 
David Collier-Brown,  | Always do right. This will gratify some people
185 Ellerslie Ave.,   | and astonish the rest.        -- Mark Twain
Willowdale, Ontario   | http://java.science.yorku.ca/~davecb
Work: (905) 477-0437 Home: (416) 223-8968 Email: davecb at canada.sun.com

