[jcifs] Re: jcifs-1.2.4 stability fixes

Michael B Allen mba2000 at ioplex.com
Wed Oct 5 23:00:08 GMT 2005


On Wed, 5 Oct 2005 09:56:36 +0200
Lars heete <hel at admin.de> wrote:

> > I've tested this as best I can in my limited environment. Can you try
> > it before I unleash this on everyone else? Your webcrawler setup seems
> > to be good at stressing the code.
> Sounds very complicated. Actually on transport close from server side it ends 
> up having two threads waiting for each other:
> 
> "TransportThread1" daemon prio=1 tid=0x08f60c40 nid=0xb10 in Object.wait() 
> [aafbc000..aafbc238]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xabd3df28> (a jcifs.smb.SmbTransport)
>         at java.lang.Object.wait(Object.java:429)
>         at jcifs.util.transport.Transport.disconnect(Transport.java:252)
>         - locked <0xabd3df28> (a jcifs.smb.SmbTransport)
>         at jcifs.util.transport.Transport.loop(Transport.java:145)
>         at jcifs.util.transport.Transport.run(Transport.java:317)
>         at java.lang.Thread.run(Thread.java:534)
> 
> "Thread-8" prio=1 tid=0x0903e9f8 nid=0xb10 in Object.wait() 
> [aacb5000..aacb6238]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xabd3df28> (a jcifs.smb.SmbTransport)
>         at java.lang.Object.wait(Object.java:429)
>         at jcifs.smb.SmbSession.logoff(SmbSession.java:340)
>         - locked <0xabd3df28> (a jcifs.smb.SmbTransport)
>         at jcifs.smb.SmbTransport.doDisconnect(SmbTransport.java:320)
>         at jcifs.util.transport.Transport.disconnect(Transport.java:271)
>         at jcifs.util.transport.Transport.sendrecv(Transport.java:97)
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:585)
>         at jcifs.smb.SmbSession.sessionSetup(SmbSession.java:302)
>         at jcifs.smb.SmbSession.send(SmbSession.java:236)
>         at jcifs.smb.SmbTree.treeConnect(SmbTree.java:174)
>         at jcifs.smb.SmbFile.connect(SmbFile.java:792)
>         at jcifs.smb.SmbFile.connect0(SmbFile.java:762)
>         at jcifs.smb.SmbFile.exists(SmbFile.java:1248)
>         at SmbThreadTest.traverse(SmbThreadTest.java:41)
>         at SmbThreadTest.run(SmbThreadTest.java:97)

Actually the threads are not waiting for each other; Thread-8 is waiting
for itself. It called sessionSetup, which sets its state to 1 (setting
up), and then ended up calling logoff via doDisconnect (presumably because
you pulled the network cable). Because logoff sees the state of 1 it waits,
which of course is futile because it's waiting for itself to complete
the sessionSetup.

This is pretty easy to fix actually. I had seen this in the Transport
state machine and had a solution there but only now realized it
can happen for all objects with explicit locking for both setup and
teardown. The solution is to make a note of the thread responsible for
the state transition and then check for it at the top of connect/disconnect,
sessionSetup/logoff, and treeConnect/treeDisconnect. See the sentinel
member of Transport, SmbSession, and SmbTree.
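
To make the sentinel idea concrete, here is a minimal sketch. It is not
the jcifs source: the SentinelSession class, its state constants, the
awaitTransition helper and the transportFails flag are all made up for
illustration. It only shows the check at the top of the setup and
teardown methods, assuming both lock the same monitor.

```java
// Hypothetical sketch of the sentinel check; not the jcifs implementation.
class SentinelSession {
    static final int DISCONNECTED = 0;
    static final int SETTING_UP = 1;   // the "state of 1" seen in the trace
    static final int CONNECTED = 2;
    static final int TEARING_DOWN = 3;

    private int state = DISCONNECTED;
    private Thread sentinel; // thread that owns the current transition, or null

    synchronized int state() { return state; }

    // Block until no thread is in the middle of a state transition.
    private void awaitTransition() {
        while (sentinel != null) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted during wait", e);
            }
        }
    }

    // transportFails simulates the server closing the connection mid-setup.
    synchronized boolean sessionSetup(boolean transportFails) {
        if (sentinel == Thread.currentThread()) {
            return false; // re-entered by the thread already driving a transition
        }
        awaitTransition();
        if (state == CONNECTED) {
            return true;
        }
        state = SETTING_UP;
        sentinel = Thread.currentThread();
        boolean ok = false;
        try {
            if (transportFails) {
                logoff(); // error path re-enters teardown on the same thread
            } else {
                ok = true;
            }
        } finally {
            state = ok ? CONNECTED : DISCONNECTED;
            sentinel = null;
            notifyAll();
        }
        return ok;
    }

    synchronized void logoff() {
        if (sentinel == Thread.currentThread()) {
            return; // we own the transition; waiting would be waiting on ourselves
        }
        awaitTransition();
        if (state != CONNECTED) {
            return;
        }
        state = TEARING_DOWN;
        sentinel = Thread.currentThread();
        try {
            // a real client would send the logoff request here
        } finally {
            state = DISCONNECTED;
            sentinel = null;
            notifyAll();
        }
    }
}
```

Without the first check in logoff, the call made from inside sessionSetup
would wait for the transition to finish, which is the Thread-8 situation
in the trace. With it, sessionSetup(true) simply returns false and leaves
the object disconnected.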

Anyway, I think it's fixed. Please try jcifs-1.2.6test2.

> Attached is the test to trigger these problems (SmbThreadTest).
> To simulate many user sessions the session-match.diff patch is needed and the
> "jcifs.smb.client.debug.compareSessionAuth" property has to be set to "false".
> The test needs to be run on a share with a sufficiently large directory 
> structure. If you disconnect transport on the server side several times there 
> will be no new sessions if the bug is triggered.

Actually I've been using the examples/CrawlTest.java example. It's pretty
much like your test but I think it might stress things even more because
it sets the soTimeout and responseTimeout just low enough that sessions
and transports time out frequently, which causes them to be set up and torn
down a lot. I also have a commented-out clause in SmbTransport.getSmbSession
that does what I think compareSessionAuth does. When you run it, hit
return to add a thread to the crawl.

Needless to say I ran CrawlTest quite a bit and stopped/started Samba
and the client was able to recover ok. I think this release might go
the distance. I'd really like to see how it stands up to your HTTP
crawler though.

> There is a second test (SmbSessionListTest) to document the session 
> association problems. The fix for the NULL-transport in jcifs-1.2.5 actually 
> made the session-list corruption problem worse. If a session reconnects it 
> does not get into this list and stays open when the transport closes, causing 

Ahh! That's a big bug. I've changed SmbSession.sessionSetup/logoff
making them responsible for adding/removing the session from the sessions
list. I think this will fix that.

> lots of NT_STATUS_UNSUCCESSFUL (0xC0000001) errors.
> To report the problem there is a second patch (session-list-check). This test 
> needs to be run like the previous test. If you disconnect the network cable 
> (or disable the interface in vmware) for some seconds with this test running, 
> you should be able to reproduce this.
>  
> > The code is in the download area labeled jcifs-1.2.6test.
> >
> > Mike
> >
> > [1] There is one exception that I know of so far that I'm not sure how
> > to fix. That is in SmbTransport.getSmbSession() which is synchronized
> > but SmbSession.logoff() is called inside a loop. The loop needs to be
> > protected by the lock so unlocking within the loop isn't possible. To
> > fix this it would be necessary to remove the sessions from the list,
> > unlock, and then logoff the sessions.
> Ideally this should be also migrated to the cleanup thread.

I don't think a cleanup thread is going to help much. A thread is a
thread. The only benefit it will have is if the transport needs to
process requests while it's busy "cleaning up", which in our particular
case is not true.
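
For reference, the approach from footnote [1] quoted above (remove the
sessions from the list, unlock, and then log them off) could look roughly
like the sketch below. The SessionList and Session classes, the expired
flag and the logoffExpired method are hypothetical stand-ins, and the
sketch locks the list itself rather than the transport.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; names are illustrative and not jcifs API.
class SessionList {
    static class Session {
        final boolean expired;
        boolean loggedOff;
        boolean lockHeldAtLogoff;

        Session(boolean expired) { this.expired = expired; }

        void logoff(Object listLock) {
            // Record whether the caller still held the list lock.
            lockHeldAtLogoff = Thread.holdsLock(listLock);
            loggedOff = true;
        }
    }

    private final List<Session> sessions = new ArrayList<>();

    void add(Session s) {
        synchronized (sessions) { sessions.add(s); }
    }

    int size() {
        synchronized (sessions) { return sessions.size(); }
    }

    int logoffExpired() {
        List<Session> expired = new ArrayList<>();
        synchronized (sessions) {
            // Phase 1: the loop runs under the lock but only edits the list.
            for (Iterator<Session> it = sessions.iterator(); it.hasNext();) {
                Session s = it.next();
                if (s.expired) {
                    it.remove();
                    expired.add(s);
                }
            }
        }
        // Phase 2: the lock is released, so logoff may block safely.
        for (Session s : expired) {
            s.logoff(sessions);
        }
        return expired.size();
    }
}
```

The loop stays protected by the lock, and the slow or blocking logoff
calls happen on the same thread afterwards, without a separate cleanup
thread.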

Mike

