[jcifs] Re: jcifs-1.2.4 stability fixes

Michael B Allen mba2000 at ioplex.com
Sat Oct 8 00:15:27 GMT 2005


On Fri, 7 Oct 2005 11:07:42 +0200
Lars Heete <hel at admin.de> wrote:

> > One of the main problems with your original patch was that it was mostly
> > irrelevant stuff. If you resubmit it with only the changes necessary to
> > make the client behave the way you want then I will reconsider it. Aside
> > from the bug regarding adding the session back to transport.sessions
> > introduced in 1.2.5 (and your original "live lock" issue), that code
>
> there is also the deadlock between sessionSetup and send introduced in 1.2.4 
> (doSend0/sendrecv calling disconnect on IOError). Anyway I think 

Do I know about this one? What is this one all about?

> disconnecting on errors in places other than the transport thread is bogus, 
> because on a busy transport this will generate a queue of threads waiting on 
> the transport monitor trying to call disconnect. This would be no problem 
> on its own, but there may also be threads waiting on the transport monitor for 
> sessionSetup or connect, so you get something like this:
> 
> disconnect
> connect
> disconnect
> disconnect
> connect
> disconnect
> connect
> 
> until all threads with sending errors are done.

Correct. The threads are serialized, so Thread1 tries and fails to connect
and calls disconnect. Then Thread2 wakes up, tries and fails to connect,
and calls disconnect. Then Thread3, and so on ad infinitum. AFAIC this
is the correct behavior because only one thread can be allowed to work
on changing the connection state of the transport.

The "test" branch changed this behavior slightly. When Thread1 gets an
error it calls disconnect and sets the state to be in "error". Thread2
then sees the state is "error", resets the state to 0 (not connected)
and then throws a "transport in error" exception. Thread3 then tries and
fails to connect and calls disconnect, etc. So the number of cycles is
effectively cut in half.
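
Roughly, the "test" branch logic looks like the sketch below. This is an
illustration only: the state constants and the doConnect()/doDisconnect()
helpers are stand-ins, not the actual jcifs.util.transport.Transport
members.

  import java.io.IOException;

  abstract class ErrorStateTransport {
      static final int DISCONNECTED = 0, CONNECTED = 1, ERROR = 2;
      int state = DISCONNECTED;

      abstract void doConnect() throws IOException;  // real socket setup
      abstract void doDisconnect();                  // real socket teardown

      synchronized void connect() throws IOException {
          if (state == ERROR) {
              /* Thread2's path: a previous thread already failed. Reset
               * the state so the next caller retries the connect, and
               * fail fast ourselves. */
              state = DISCONNECTED;
              throw new IOException("transport in error");
          }
          if (state == DISCONNECTED) {
              try {
                  doConnect();
                  state = CONNECTED;
              } catch (IOException ioe) {
                  /* Thread1's path: tear the socket down and flag the
                   * error for the next thread waiting on the monitor. */
                  doDisconnect();
                  state = ERROR;
                  throw ioe;
              }
          }
      }
  }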

So what do you *want* to happen?

> here is an actual example using JCIFS-1.2.5 with my Crawler test after a 
> single server-side transport close (actually not simple to trigger without 
> generating deadlocks, even when using only one session). I inserted a log 
> message in connect and disconnect.
> 
> disconnecting
> smb://194.39.184.7/test/test/CopyTo/DOMEditor/CVS/DOMEditor/domeditor/view/:
> jcifs.smb.SmbException:
> java.net.SocketException: Connection reset
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at jcifs.smb.SmbTransport.doSend(SmbTransport.java:398)
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:68)
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:580)
<snip>
> connecting
> disconnecting
> connecting
> 
> This was the behavior I found when analyzing the problems in jcifs 1.1, 
> which never worked stably for the app I am trying to support. In jcifs-1.2 
> this was changed to disconnecting only in the transport thread, but it was 
> reintroduced in 1.2.5 in order to implement error handling for when the 
> server drops an open transport.

This seems like the correct behavior to me. I mean you got an exception
because the server closed the transport. Or am I misunderstanding you?

> Actually it may not be needed anymore even without my changes, since the 
> transport now handles peekKey()==null (I have to check this).
> But the cancellation of active requests on a hard disconnect is something 
> definitely needed for correct error reporting (even with locking done right).
> My transport changes may be debatable, but they are in no way irrelevant.
> 
> > [1] Note that lowering jcifs.smb.client.ssnLimit would probably suffice
> >     to resolve this issue. It was not really anticipated that the number
> >     of outstanding requests on a transport at one time would be more
> >     than say 2 or 3.
> Not really an option if you may have hundreds of sessions per server.

Why not? I don't think you understand what ssnLimit does. If ssnLimit is,
say, 250 (the default), that means that after the 250th session, instead
of reusing the existing transport, a new one is created which can itself
support 250 sessions. Meaning if you set ssnLimit to 25 and you
have 500 sessions, they will be spread across 500 / 25 = 20 transports,
cutting your response_map.notifyAll() overhead by a factor of 20. That
sounds like an option worth trying to me.
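
In code, the reuse rule amounts to the sketch below. This is illustrative
only; the real logic lives in SmbTransport.getSmbTransport() and differs
in detail.

  import java.util.ArrayList;
  import java.util.List;

  class TransportPoolSketch {
      static final int SSN_LIMIT = 25;  // jcifs.smb.client.ssnLimit
      /* Each inner list stands in for one transport's session list. */
      static final List<List<Object>> transports =
              new ArrayList<List<Object>>();

      /* Reuse a transport only while it holds fewer than SSN_LIMIT
       * sessions; otherwise open a new one. 500 sessions at a limit of
       * 25 therefore land on 500 / 25 = 20 transports. */
      static synchronized List<Object> getTransport() {
          for (List<Object> sessions : transports)
              if (sessions.size() < SSN_LIMIT)
                  return sessions;
          List<Object> fresh = new ArrayList<Object>();
          transports.add(fresh);
          return fresh;
      }
  }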

The SmbThreadTest Results

I have spent some time looking at your SmbThreadTest.java example. I
did find one issue. It turns out that the SmbTransport.matches() routine
wasn't quite right. It needs to look like this:

  boolean matches( UniAddress address, int port,
           InetAddress localAddr, int localPort ) {
    return address.equals( this.address ) &&
          (port == 0 || port == this.port ||
              /* port 139 is ok if 445 was requested */
              (port == 445 && this.port == 139)) &&
          (localAddr == this.localAddr ||
              (localAddr != null && localAddr.equals( this.localAddr ))) &&
          localPort == this.localPort;
  }

because if the transport tries to connect and falls back to port 139, then
the ports will not match and SmbTransport.getSmbTransport() will end
up creating a new transport every time. This will result in many, many
sockets being opened to the target machine, which will eventually choke,
and the client will throw exceptions like SocketException: socket closed.

Out of Memory Condition

The only way I could generate exceptions from your SmbThreadTest program
was if the VM ran out of memory, at which point it's fairly expected that
nothing is going to work right. Actually, when it comes to exhausting
memory, this test is rather evil because it creates greater and greater
numbers of crawlers with a maxDepth of 1000. And because every call to
traverse() creates an SmbFile[] array and then proceeds to call traverse()
recursively, you have potentially thousands of these arrays, resulting
in HUNDREDS OF THOUSANDS of SmbFile objects (try -Xrunhprof:heap=sites
and look at the resulting java.hprof.txt). The result is that the whole
thing just runs out of memory (rather quickly actually) and you start to
get all sorts of exceptions (usually starting with "timedout waiting for
response" TransportExceptions because the client doesn't have the memory
to make progress). In the tests that I ran, once the number of threads
reached about 245 I would get a few OutOfMemoryErrors and then a large
number of exceptions because all the threads start to fail. I don't think
there's much I can do about this, if anything. Are there some different
values for the sleep periods and such that I should try for this test?
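
To illustrate the retention pattern (a sketch; SmbThreadTest.java's
actual traverse() may differ in detail):

  import jcifs.smb.SmbException;
  import jcifs.smb.SmbFile;

  class CrawlerSketch {
      /* Every frame pins its SmbFile[] until the entire subtree below
       * it has been walked, so with maxDepth=1000 and an ever-growing
       * number of crawler threads the live arrays add up to enormous
       * numbers of SmbFile objects on the heap. */
      void traverse( SmbFile dir, int depth ) throws SmbException {
          if (depth == 0)
              return;
          SmbFile[] entries = dir.listFiles();  // array stays reachable ...
          for (int i = 0; i < entries.length; i++)
              if (entries[i].isDirectory())
                  traverse( entries[i], depth - 1 );  // ... across each call
      }
  }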

Managing the SmbTransport.sessions List

As for the outstanding issue of when and how to add and remove sessions
from the SmbTransport.sessions list, this is still not handled correctly
even in the "test" branch. I didn't realize that the SmbTree.session
member is never released once set. This means that (as you originally
discovered) if we remove the session from the SmbTransport.sessions list
(such as when disconnect logs off all sessions) it will not be re-added when
the session is re-"setup". This is not catastrophic, but extra redundant
sessions may be created if other threads request a session with the same
credentials.

For now I have simply stopped removing the sessions from the sessions
list. That should keep things normalized, but the downside is that
the sessions will never be released, which is a waste of memory. I think
the correct solution here might be to dynamically acquire the session in
SmbTree.java, like we do with the transport() in SmbSession, but I'm not
certain of the implications of doing that.
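
Something along these lines, purely as a sketch of the idea (the field
and helper names are hypothetical, not the actual SmbTree code): release
the cached session when the transport disconnects, and re-acquire it
lazily on the next use, the way SmbSession gets its transport.

  class SmbTreeSketch {
      private Object session;

      synchronized Object session() {
          if (session == null)
              session = acquireSession();  // re-acquire after a release
          return session;
      }

      synchronized void release() {  // called when sessions are logged off
          session = null;
      }

      Object acquireSession() { return new Object(); }  // stand-in
  }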

jcifs-1.2.6 is in the download area. It's based on 1.2.5 with the minor
fixes mentioned. If you want to suggest changes I would appreciate it
if you could work from this package.

Mike

