[jcifs] Re: jcifs-1.2.4 stability fixes

Lars Heete hel at admin.de
Sat Oct 8 10:32:20 GMT 2005


Hello
> On Fri, 7 Oct 2005 11:07:42 +0200
>
> Lars Heete <hel at admin.de> wrote:
> > > One of the main problems with your original patch was that it was
> > > mostly irrelevant stuff. If you resubmit it with only the changes
> > > necessary to make the client behave the way you want then I will
> > > reconsider it. Aside from the bug regarding adding the session back to
> > > transport.sessions introduced in 1.2.5 (and your original "live lock"
> > > issue), that code
> >
> > there also is the deadlock between sessionSetup and send introduced in
> > 1.2.4 (doSend0/sendrecv calling disconnect on IOError). Anyway I think
>
> Do I know about this one? What is this one all about?

To refresh your memory:

-- excerpt from previous mails in this thread --
> > > > The current SmbTransport actually has a deadlock in the paths
> > > > doSend->disconnect (first rmap lock then transport lock) and
> > > > sessionSetup->send (first transport lock then rmap lock).
> > > > This is fixed doing the error handling only in the transport thread.
> > >
> > > I don't know about this. I think logic actually prevents this from
> > > ever happening because doSend() cannot be called without first doing
> > > a sessionSetup(). So you either do sessionSetup() and then send() but a
> > > subsequent sessionSetup() will just return because the session is already
> > > "setup".
> > there may actually be multiple sessions on one transport.
> 
> True. I didn't think about that.
----
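The lock inversion in the excerpt above can be sketched as follows; the class and lock names are illustrative stand-ins, not the actual jCIFS code. One path takes the response-map lock before the transport monitor, the other takes them in the opposite order, which is the classic recipe for deadlock when both paths run concurrently.

```java
// Illustrative sketch of the inverted lock order described above
// (names are hypothetical, not the actual jCIFS classes).
public class LockOrderSketch {
    static final Object rmap = new Object();      // response map lock
    static final Object transport = new Object(); // transport monitor

    // doSend -> disconnect path: rmap first, then transport
    static String doSendThenDisconnect() {
        synchronized (rmap) {
            synchronized (transport) {
                return "rmap->transport";
            }
        }
    }

    // sessionSetup -> send path: transport first, then rmap
    static String sessionSetupThenSend() {
        synchronized (transport) {
            synchronized (rmap) {
                return "transport->rmap";
            }
        }
    }

    public static void main(String[] args) {
        // Run sequentially here for clarity; run concurrently, each path
        // can hold one lock while waiting for the other -> deadlock.
        System.out.println(doSendThenDisconnect());
        System.out.println(sessionSetupThenSend());
    }
}
```

Doing the error handling only in the transport thread removes one of the two orders, which is why that fix works.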

> > disconnecting on errors in places other than the transport thread is
> > bogus, because on a busy transport this will generate a queue of threads
> > waiting on the transport monitor trying to call disconnect. This would be
> > no problem alone, but there may be also threads waiting on the transport
> > monitor for sessionSetup or connect, so you get something like this
> >
> > disconnect
> > connect
> > disconnect
> > disconnect
> > connect
> > disconnect
> > connect
> >
> > until all threads with sending errors are done.
>
> Correct. Each thread is serialized so Thread1 tries and fails to connect
> and calls disconnect. Then Thread2 wakes up, tries and fails to connect,
> and calls disconnect. Then Thread3 and so on ad infinitum. AFAIC this
> is the correct behavior because only one thread can be allowed to work
> on changing the connection state of the transport.
Not exactly what I meant. The first thread gets an IOException in doSend
because the transport socket had been closed. It calls disconnect and exits.
The second thread now gets the monitor, the transport connects, and it tries to
send. The third thread now gets the monitor and disconnects the transport,
probably causing the second thread to fail. And so on. Is this sane
behavior?

> The "test" branch changed this behavior slightly. When Thread1 gets an
> error it calls disconnect and sets the state to be in "error". Thread2
> then sees the state is "error", resets the state to 0 (not connected)
> and then throws a "transport in error" exception. Thread3 then tries and
> fails to connect and calls disconnect, etc. So the number of cycles is
> effectively cut in half.
>
> So what do you *want* to happen?
Report errors in the sending threads and close down the transport from the
reading transport thread (this is what my transport changes try to
implement).
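That division of labor can be sketched as below: sending threads only register a request and wait, and the transport (reader) thread alone fails every outstanding request when the connection breaks. All names here are hypothetical, a minimal sketch of the idea rather than the actual patch.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Sketch of the approach described above: senders only report errors,
// while the transport (reader) thread is the single place that tears
// the connection down and cancels outstanding requests.
public class TransportSketch {
    static class Request {
        IOException error;   // set by the transport thread on failure
        boolean done;
        synchronized void fail(IOException e) {
            error = e;
            done = true;
            notifyAll();     // wake the sender waiting in await()
        }
        synchronized IOException await() throws InterruptedException {
            while (!done) wait();
            return error;
        }
    }

    final Map<Integer, Request> responseMap = new HashMap<>();

    // Called by a sending thread before it blocks; never disconnects.
    Request register(int mid) {
        Request r = new Request();
        synchronized (responseMap) { responseMap.put(mid, r); }
        return r;
    }

    // Runs in the transport thread: on a read error, cancel every
    // outstanding request so each sender sees the failure exactly once.
    void onReadError(IOException e) {
        synchronized (responseMap) {
            for (Request r : responseMap.values()) r.fail(e);
            responseMap.clear();
        }
    }

    public static void main(String[] args) throws Exception {
        TransportSketch t = new TransportSketch();
        Request r = t.register(1);              // sender registers its request
        Thread sender = new Thread(() -> {
            try {
                System.out.println("sender saw: " + r.await().getMessage());
            } catch (InterruptedException ie) { }
        });
        sender.start();
        t.onReadError(new IOException("Connection reset")); // transport thread path
        sender.join();                          // prints "sender saw: Connection reset"
    }
}
```

The key property is that no sending thread ever calls disconnect, so the connect/disconnect ping-pong between queued threads cannot occur.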

> > here actually is an example using JCIFS-1.2.5 with my Crawler test  after
> > a single server-side transport close (actually not simple to trigger
> > without generating deadlocks, even when using only one session). I
> > inserted a log message in connect and disconnect.
> >
> > disconnecting
> > smb://194.39.184.7/test/test/CopyTo/DOMEditor/CVS/DOMEditor/domeditor/vie
> >w/: jcifs.smb.SmbException:
> > java.net.SocketException: Connection reset
> >         at
> > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96) at
> > java.net.SocketOutputStream.write(SocketOutputStream.java:136) at
> > jcifs.smb.SmbTransport.doSend(SmbTransport.java:398)
> >         at jcifs.util.transport.Transport.sendrecv(Transport.java:68)
> >         at jcifs.smb.SmbTransport.send(SmbTransport.java:580)
>
> <snip>
>
> > connecting
> > disconnecting
> > connecting
> >
> > This was the behavior I learned when analyzing the problems in jcifs 1.1,
> > which never worked stably for the app I try to support. In jcifs-1.2 this
> > was changed to disconnecting only in the transport thread, but it was
> > reintroduced in 1.2.5 in order to implement error handling when the server
> > drops an open transport.
>
> This seems like the correct behavior to me. I mean you got an exception
> because the server closed the transport. Or am I misunderstanding you?
see above.

> > Actually it may not be needed anymore even without my changes, since the
> > transport now handles peekKey()==null (I have to check this).
> > But the cancellation of active requests on hard disconnect is something
> > definitely needed for correct error reporting (even with locking done
> > right). My transport changes may be debatable, but they are in no way
> > irrelevant.
> >
> > > [1] Note that lowering jcifs.smb.client.ssnLimit would probably suffice
> > >     to resolve this issue. It was not really anticipated that the
> > > number of outstanding requests on a transport at one time would be more
> > > than say 2 or 3.
> >
> > Not really an option if you may have hundreds of sessions per server.
>
> Why not? I don't think you understand what ssnLimit does. If ssnLimit is
> say 250 (the default) that means that after the 250th session, instead
> of reusing the existing transport, a new one is created which itself
> can then support 250 sessions. Meaning if you set ssnLimit to 25 and you
> have 500 sessions, they will be spread across 500 / 25 = 20 transports
> cutting your response_map.notifyAll() overhead by a factor of 20. That
> sounds like an option worth trying to me.
Like memory, threads are a limited resource. Also, there may be more than one
thread using a session.
Anyway, there are some issues with ssnLimit not being effective when sessions
reconnect and are not disassociated from the transport.
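For reference, the spreading arithmetic quoted above is just ceiling division; the helper below is purely illustrative, not jCIFS code.

```java
// Back-of-the-envelope for the session spreading described above: with
// ssnLimit sessions per transport, the total is split over
// ceil(sessions / ssnLimit) transports.
public class SsnLimitMath {
    static int transportsNeeded(int sessions, int ssnLimit) {
        return (sessions + ssnLimit - 1) / ssnLimit; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(transportsNeeded(500, 25));  // 20 transports
        System.out.println(transportsNeeded(500, 250)); // 2 with the default limit
    }
}
```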

> The SmbThreadTest Results
>
> I have spent some time looking at your SmbThreadTest.java example. I
> did find one issue. It turns out that the SmbTransport.matches() routine
> wasn't quite right. It needs to look like this:
>
>   boolean matches( UniAddress address, int port,
>            InetAddress localAddr, int localPort ) {
>     return address.equals( this.address ) &&
>           (port == 0 || port == this.port ||
>               /* port 139 is ok if 445 was requested */
>               (port == 445 && this.port == 139)) &&
>           (localAddr == this.localAddr ||
>               (localAddr != null && localAddr.equals( this.localAddr ))) &&
>           localPort == this.localPort;
>   }
>
> because if the transport tries to connect and falls back to port 139 then
> the ports will not match and SmbTransport.getSmbTransport() will end
> up creating a new transport every time. This will result in many many
> sockets being opened to the target machine which will eventually choke
> and the client will throw exceptions like SocketException: socket closed.
If you had used my change in full instead of only parts of it, this would have
worked correctly. There is still a problem if the NetBIOS port has to be used
and the transport gets an exception on first connect. Then port 445 is used,
and the sessions with workstation limitations fail until the transport is
closed.

> Out of Memory Condition
>
> The only way I could generate exceptions from your SmbThreadTest program
> was if the VM ran out of memory at which point it's fairly expected that
> nothing is going to work right. Actually when it comes to exhausting
> memory, this test is rather evil because it creates greater and greater
> numbers of crawlers with a maxDepth of 1000. And because every call to
> traverse() creates an SmbFile[] array and then proceeds to call traverse()
> recursively, you have potentially thousands of these arrays resulting
> in HUNDREDS OF THOUSANDS of SmbFile objects (try -Xrunhprof:heaps=sites
> and look at the resulting java.hprof.txt). The result is that the whole
> thing just runs out of memory (rather quickly actually) and you start to
> get all sorts of exceptions (usually starting with "timedout waiting for
> response" TransportExceptions because the client doesn't have the memory
> to make progress). In the tests that I ran, once the number of threads
> reached about 245 I would get a few OutOfMemoryErrors and then a large
> number of exceptions because all the threads start to fail. I don't think
> there's much I can do about this if anything. Are there some different
> values for sleep periods and such that I should try for this test?
Try a smaller number of files first. There is a factor at session_time that
controls how many sessions run in parallel. Also, there is a sleep in
traverse that can be changed to reduce CPU usage.
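Independent of those tuning knobs, the memory pressure from the deeply recursive traverse() could be reduced by flattening the recursion into an explicit worklist, so each directory's listing array becomes garbage as soon as its entries are queued instead of living on the call stack. The sketch below uses java.io.File as a stand-in for SmbFile; it illustrates the pattern and is not the original test code.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

// Iterative, depth-limited traversal: an explicit stack replaces the
// recursive traverse(), so a listing array can be collected as soon as
// its entries have been pushed. java.io.File stands in for SmbFile.
public class IterativeTraverse {
    static class Entry {
        final File file;
        final int depth;
        Entry(File f, int d) { file = f; depth = d; }
    }

    static int count(File root, int maxDepth) {
        int visited = 0;
        Deque<Entry> stack = new ArrayDeque<>();
        stack.push(new Entry(root, 0));
        while (!stack.isEmpty()) {
            Entry e = stack.pop();
            visited++;
            if (e.depth < maxDepth && e.file.isDirectory()) {
                File[] children = e.file.listFiles();
                if (children != null)
                    for (File c : children)
                        stack.push(new Entry(c, e.depth + 1));
                // children[] is unreferenced here, unlike in a recursive
                // traverse() where every frame pins its own array
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"), "trav-demo");
        new File(dir, "a/b").mkdirs();
        System.out.println(count(dir, 10)); // at least 3: root, a, b
    }
}
```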

> Managing the SmbTransport.sessions List
>
> As for the outstanding issue of when and how to add and remove sessions
> from the SmbTransport.sessions list this is still not handled correctly
> even in the "test" branch. I didn't realize that the SmbTree.session
> member is never released once set. This means that (as you originally
> discovered) if we remove the session from the SmbTransport.sessions list
> (such as when disconnect logs off all sessions) it will not be re-added
> when the session is re-"setup". This is not catastrophic but extra
> redundant sessions may be created if other threads request a session with
> the same credentials.
this _is_ catastrophic if the transport closes and reconnects, leaving
non-working open sessions.

> For now I have simply stopped removing the sessions from the sessions
> list. That should keep things normalized, but the downside to this is that
> the sessions will never be released which is a waste of memory. I think
This makes ssnLimit even less effective. You could also have used my patch.

> the correct solution here might be to dynamically acquire the session in
> SmbTree.java like we do with the transport() in SmbSession but I'm not
> certain of the implications of doing that.
I think this will only help to a degree. Anyway, the number of trees per session
is quite limited.

> jcifs-1.2.6 is in the download area. It's based on 1.2.5 with the minor
> fixes mentioned. If you want to suggest changes I would appreciate it
> if you could work from this package.

Lars

