[jcifs] Re: jcifs-1.2.4 stability fixes

Lars heete hel at admin.de
Wed Oct 5 07:56:36 GMT 2005


Hello,

> Lars heete <hel at admin.de> wrote:
> > >     239 +    public boolean disconnect( boolean hard ) {
> > >     240 +        if (state == STATE_OPEN)
> > >     241 +            changeState(STATE_CLOSING);
> > >     242 +
> > >     243 +        synchronized(this) {
> > >
> > > This opens up all sorts of possible synchronization flaws. If you're
> > > mess with the state without the lock you might as well just flip the
> > > state to error just before disconnect() in Transport.loop() and then
> > > bail out in sendrecv() if the transport is in error.
> >
> > In my changed transport version actually the state changes are protected
> > by the response_map mutex (look at changeState), which gets released when
> > client threads are waiting for responses.
>
> But between the time you inspect the state and call changeState a context
> switch could occur resulting on changeState(STATE_CLOSING) being called
> multiple times.
but only when disconnect is called from other threads than the transport 
thread, which my changes tries to avoid. If it could happen, it would not be 
fatal. Also it can be easily surrounded by a synchronized(response_map).
> > The problem actually is that disconnect
> > may block the transport thread. For the hard error case it should be ok
> > using only the respose_map lock for state changes. The soft disconnect
> > (expiration) case  ideally should be moved to a different "cleanup"
> > thread, where blocking isn't that fatal.
>
> Meaning while the transport thread is busy calling disconnect it cannot
> process any responses. In this case yes a separate thread should be used if
> the transport thread is to call disconnect(). However I'm going to avoid
> this for now because it so happends that we do not wait for TreeDisconnect
> or LogoffAndX responses so it doesn't really help us.
I meant disconnect waiting on the transport monitor.

> > > > The current SmbTransport actually has a deadlock in the paths
> > > > doSend->disconnect (first rmap lock then transport lock) and
> > > > sessionSetup->send (first transport lock then rmap lock).
> > > > This is fixed doing the error handling only in the transport thread.
> > >
> > > I don't know about this. I think logic actually prevents this from
> > > ever happening because doSend() cannot be called without first doing
> > > a sessionSetup(). So you either do sessionSetup() and then send() but a
> > > subsequent sessionSetup() will just return because the session is
> > > already "setup".
> >
> > there may be actually multiple sessions on one transport.
>
> True. I didn't think about that.
>
> Ok, I've made some significant changes. The technique of locking
> the transport while setting up/logging off/mounting/unmounting is too
> coarse. It was very simple to implement but because Transport.disconnect()
> and other related routines can call back into the upper layers while
> holding the transport lock it is too deadlock prone. This is really the
> source of all issues you've found.
>
> So I've change locking throughout in a fundamental
> way. SmbSession and SmbTree now have explicit state machine
> driven locking for treeConnect/disconnect, sessionSetup/logoff like
> Transport.connect/disconnect do. But all of these methods now *unlock*
> the lock before calling the body of their routine [1]. Meaning all of
> these methods now are roughly as follows:
>
>     void foo() {
>         synchronized( transport ) {
>             look at state and allow only one thread to pass
>             all other threads either return or wait for the state to change
>         }
>
>         do real work here without holding the lock
>
>         synchronized( transport ) {
>             update state with success or failure and notifyAll waiters
>         }
>     }
>
> Now, beware, this new model is much looser. There is much more potential
> for concurrency on each transport. Before, we knew that because the
> transport was locked the whole time during sessionSetup/logoff etc that
> the number of possible code paths was very limited. This is not true
> of the new model and therefore it is more complicated.
>
> I've tested this as best I can in my limited environment. Can you try
> it before I unleash this on everyone else? Your webcrawler setup seems
> to be good at stressing the code.
Sounds very complicated. Actually on transport close from server side it ends 
up having to threads waiting for each other:

"TransportThread1" daemon prio=1 tid=0x08f60c40 nid=0xb10 in Object.wait() 
[aafbc000..aafbc238]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xabd3df28> (a jcifs.smb.SmbTransport)
        at java.lang.Object.wait(Object.java:429)
        at jcifs.util.transport.Transport.disconnect(Transport.java:252)
        - locked <0xabd3df28> (a jcifs.smb.SmbTransport)
        at jcifs.util.transport.Transport.loop(Transport.java:145)
        at jcifs.util.transport.Transport.run(Transport.java:317)
        at java.lang.Thread.run(Thread.java:534)

"Thread-8" prio=1 tid=0x0903e9f8 nid=0xb10 in Object.wait() 
[aacb5000..aacb6238]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xabd3df28> (a jcifs.smb.SmbTransport)
        at java.lang.Object.wait(Object.java:429)
        at jcifs.smb.SmbSession.logoff(SmbSession.java:340)
        - locked <0xabd3df28> (a jcifs.smb.SmbTransport)
        at jcifs.smb.SmbTransport.doDisconnect(SmbTransport.java:320)
        at jcifs.util.transport.Transport.disconnect(Transport.java:271)
        at jcifs.util.transport.Transport.sendrecv(Transport.java:97)
        at jcifs.smb.SmbTransport.send(SmbTransport.java:585)
        at jcifs.smb.SmbSession.sessionSetup(SmbSession.java:302)
        at jcifs.smb.SmbSession.send(SmbSession.java:236)
        at jcifs.smb.SmbTree.treeConnect(SmbTree.java:174)
        at jcifs.smb.SmbFile.connect(SmbFile.java:792)
        at jcifs.smb.SmbFile.connect0(SmbFile.java:762)
        at jcifs.smb.SmbFile.exists(SmbFile.java:1248)
        at SmbThreadTest.traverse(SmbThreadTest.java:41)
        at SmbThreadTest.run(SmbThreadTest.java:97)

Attached is the test to trigger these problems (SmbThreadTest).
To simulate many user sessions the session-match.diff patch is needed and the
"jcifs.smb.client.debug.compareSessionAuth" property has to be set to "false".
The test needs to be run on a share with a sufficient large directory 
structure. If you disconnect transport on the server side several times there 
will be no new sessions if the bug is triggered.

There is a second test (SmbSessionListTest) to document the session 
association problems. The fix for the NULL-transport in jcifs-1.2.5 actually 
made the session-list corruption problem worse. If a session reconnects it 
does not get into this list and stays open when the transport closes, causing 
lots of NT_STATUS_UNSUCCESSFULL (0xC0000001) errors .
To report the problem there is a second patch (session-list-check). This test 
needs to be run like the previous test. If you disconnect the network cable 
(or disable the interface in vmware) for some seconds with this test running, 
you should be able to reproduce this.
 
> The code is in the download area labeled jcifs-1.2.6test.
>
> Mike
>
> [1] There is one exception that I know of so far that I'm not sure how
> to fix. That is in SmbTransport.getSmbSession() which is synchronized
> but SmbSession.logoff() is called inside a loop. The loop needs to be
> protected by the lock so unlocking within the loop isn't possible. To
> fix this it would be necessary to remove the sessions from the list,
> unlock, and then logoff the sessions.
Ideally this should be also migrated to the cleanup thread.

Lars
-------------- next part --------------
A non-text attachment was scrubbed...
Name: session-match.diff
Type: text/x-diff
Size: 979 bytes
Desc: not available
Url : http://lists.samba.org/archive/jcifs/attachments/20051005/08dde190/session-match-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SmbThreadTest.java
Type: text/x-java
Size: 3550 bytes
Desc: not available
Url : http://lists.samba.org/archive/jcifs/attachments/20051005/08dde190/SmbThreadTest-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SmbSessionListTest.java
Type: text/x-java
Size: 2611 bytes
Desc: not available
Url : http://lists.samba.org/archive/jcifs/attachments/20051005/08dde190/SmbSessionListTest-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: session-list-check.diff
Type: text/x-diff
Size: 482 bytes
Desc: not available
Url : http://lists.samba.org/archive/jcifs/attachments/20051005/08dde190/session-list-check-0001.bin


More information about the jcifs mailing list