[jcifs] Re: jCIFS deadlock issue - when can we expect a fix?

Ronny Schuetz usenet-01 at groombridge34.de
Mon Feb 20 19:44:01 GMT 2006


Michael B Allen wrote:

Hi Mike,

Thanks for the reply.

> That might work. If I remember correctly, the problem is that
> the transport thread grabs the transport lock and then tries to
> get another lock (I think it's SmbTransport.response_map). But before
> it can do so, the calling thread gets the response_map lock
> and then tries to get the transport lock. So now you have:

Right.

> Thread-T has Lock-T trying to get Lock-M
> Thread-C has Lock-M trying to get Lock-T
> 
> So what you're doing is introducing a third lock, Lock-X. So Thread-T gets
> Lock-X and then Lock-T. If a context switch occurs there, Thread-C will try
> and fail to get Lock-X, allowing Thread-T to get Lock-M and finish. Then
> Thread-C can get Lock-X, Lock-M, and Lock-T, do its thing, and complete.
> 
> It could kill concurrency and it's a little impure because you're
> basically using a lock to protect locks, but I think it will work. If it
> doesn't, post a thread dump.

That was the idea. Thanks for elaborating on it. It's better for
me to have a system that is a bit slower than one that deadlocks ;)
We'll see what happens tonight during the stability test.
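
For reference, here's a minimal sketch of the pattern as I understand
it (the class, method and lock names are made up for illustration, not
actual jCIFS code): both threads take the new outer lock first, so the
circular wait on the two inner locks can no longer occur.

  public class OuterLockSketch {
      private final Object lockX = new Object(); // new outer lock
      private final Object lockT = new Object(); // the "transport" lock
      private final Object lockM = new Object(); // the "response_map" lock

      // Transport thread: previously took lockT, then lockM.
      void transportWork() {
          synchronized (lockX) {
              synchronized (lockT) {
                  synchronized (lockM) {
                      // dispatch the response
                  }
              }
          }
      }

      // Calling thread: previously took lockM, then lockT.
      void callerWork() {
          synchronized (lockX) {
              synchronized (lockM) {
                  synchronized (lockT) {
                      // send the request and wait for the response
                  }
              }
          }
      }
  }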

> This is a combination of two things. One, jCIFS doesn't explicitly close
> sockets. They are closed after the soTimeout has expired. So if you set
> ssnLimit to 1 you will create a socket for each session and you will
> rapidly build up sockets. Generally you want to avoid setting ssnLimit
> to 1 as it really destroys scalability. Two, the last TCP socket
> state is CLOSE_WAIT. You can see socket states using netstat -ta. If you
> see CLOSE_WAIT that means the sockets were closed but the kernel keeps the
> data structure around for a while for reasons I don't fully understand
> (supposedly it's to wait for the final ACK, but I don't understand why
> the kernel would care or why it would dispose of the socket after it got
> it). What you want to do is put a long sleep at the end of the program
> and run netstat -ta repeatedly to see if the sockets finally go away. If
> they're still there after 30 minutes, then there might be a problem.

Yes, they're going away after a few seconds; they were building up that
quickly due to the nature of the test. I was just wondering why they
are not reused, but I think I had not completely considered how the
session concept works in jCIFS. So just for my clarification: if I set
ssnLimit to, let's say, 100 but ran 75 threads connecting to the
same Windows share, performing the operations listed above without any
pause, could it still happen that multiple sockets are opened?
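
To make the question concrete, this is roughly the setup I mean (server
name, share and property values are arbitrary examples, not
recommendations; as far as I know the properties have to be set before
the first jCIFS class is loaded):

  import jcifs.Config;
  import jcifs.smb.SmbFile;

  public class SsnLimitSketch {
      public static void main(String[] args) throws Exception {
          // Allow up to 100 sessions to be multiplexed over one
          // transport; ssnLimit 1 would force one socket per session.
          Config.setProperty("jcifs.smb.client.ssnLimit", "100");
          // Idle time in ms after which jCIFS lets a socket close.
          Config.setProperty("jcifs.smb.client.soTimeout", "35000");

          // 75 threads hitting the same share concurrently.
          for (int i = 0; i < 75; i++) {
              new Thread(new Runnable() {
                  public void run() {
                      try {
                          SmbFile f = new SmbFile("smb://server/share/test.txt");
                          f.exists();
                      } catch (Exception e) {
                          e.printStackTrace();
                      }
                  }
              }).start();
          }

          // Long sleep so netstat -ta can be watched while the
          // CLOSE_WAIT sockets drain away.
          Thread.sleep(30 * 60 * 1000L);
      }
  }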

Thanks & Best regards,
Ronny



