[jcifs] Re: jCIFS deadlock issue - when can we expect a fix?

Mon Feb 20 17:56:47 GMT 2006

Hi Mike,

> I don't have the time to make the change and then go through a
> stabilization period. You're on your own. Sorry.

Ok. No problem. It is very helpful already to have a very responsive
responsible developer available.

> Look at the archives about this. There were two solutions discussed. One
> moved a statement out of a synchronized() block. It wasn't correct
> synchronization but I think the author of the patch claimed that it
> worked. I believe I speculated at the time that the correct solution was
> to add some kind of refcounting to certain objects. I don't remember the
> details. If you locate the thread in the archives send me a pointer and
> I'll see if I can narrow it down further.

Ok, thank you very much for the pointer. I'll see what I can do and let
you know in case of questions. I don't want to change too much to keep
the solution easily portable to future version.

The current solution is basically an additional synchronization in
SmbSession#send() and Transport#disconnect() on an additional object per
Transport instance (_oSetupDisconnectMutex):

SmbSession#send():

void send(ServerMessageBlock request,
          ServerMessageBlock response)
throws SmbException
{
        if( response != null ) {
            response.received = false;
        }

        synchronized(transport._oSetupDisconnectMutex)
        {
         expiration = System.currentTimeMillis() +
                      SmbTransport.SO_TIMEOUT;
         sessionSetup( request, response );
         if( response != null && response.received ) {
             return;
         }
         request.uid = uid;
         request.auth = auth;
         transport.send( request, response );
        }
}

Transport#disconnect():
public void disconnect( boolean hard ) throws IOException {
synchronized(_oSetupDisconnectMutex)
{
 synchronized(this)
 {
        switch (state) {
            case 0: /* not connected - just return */
                return;
            case 3: /* connected - go ahead and disconnect */
                if (response_map.size() != 0 && !hard) {
                    break; /* outstanding requests */
                }
                doDisconnect( hard );
            case 4: /* in error - reset the transport */
                thread = null;
                state = 0;
                break;
            default:
                throw new TransportException( "Invalid state: " + state );
        }
  }
 }
}

First tests look good, but the tests looked good for another try
(basically the same approach; but instead of SmbSession#send(),
SmbSession#sessionSetup() was touched) last week that failed at the
weekend. I'll let you know if it works finally.

Another point that I just noticed today: During the tests I tried to
avoid the deadlock by just setting the ssnLimit to 1, as this should
create a single Transport per Session. The test is running 150
concurrent threads (to quickly reproduce the issue) accessing a single
Windows share; each thread accesses a single file and executes a loop
that constantly calls SmbFile#getFreeSpace(), deletes the file if it is
available, recreates it (32k), checks if it exists, reads it again, and
deletes it again. The operations are chosen on the operations executed
when it deadlocked before. However, the reason why I write that is that
I saw that the library opened a lot of sockets and did not even reuse
them - looked like they were all timing out. It ended up in more than
2500 open sockets. Could it be that the library does not reuse the
connections in this special case while it does when ssnLimit is for
example unset?

Best regards,
Ronny