[jcifs] Re: jcifs-1.2.4 stability fixes

Michael B Allen mba2000 at ioplex.com
Thu Sep 29 19:45:30 GMT 2005


On Thu, 29 Sep 2005 17:10:30 +0200
Lars heete <hel at admin.de> wrote:

> Hello,
> 
> a customer of me had severe problems using jcifs (1.1 and 1.2) in a large 
> intranet application. There were lots of deadlocks/livelocks especially in 
> this environment, where transport disconnects seem to be quite frequent.
> I worked several days to fix these problems. The attached patches against
> jcifs-1.2.4 
> 
> The first three patches are simple fixes
> 
>  jcifs-buffer-check.diff - check buffer bounds to avoid exception in transport

Actually I think an ArrayIndexOutOfBoundsException is appropriate here
since the user supplies the length.

>  jcifs-handle_close.diff - handle close on negotiate

Ok. This one's good. But I think we might as well just throw an Exception
out of peekKey.

Also, did you actually encouter instances where the new socket was closed
before the negotiate could take place? If so, do you have a stack trace?

>  jcifs-buffer.diff  - use one buffer per transport instance (performance)

No. These buffers are 64K. If you have multiple transports it's a serious
drain on memory so the associated page faults will far outstrip any
performance benifits of giving each transport it's own buffer.

> the next three are session/tree fixes, and depend on each other due patch 
> context. 
> 
>  jcifs-session-expiration.diff - modify session expiration to avoid session 		
> 					  logoff in busy sessions. 

Yup. We definately want to keep the session going if there's any
activity. Good catch!

>  jcifs-session-assoc.diff - fix session->transport association to avoid
> 				     connects on NULL_TRANSPORT and active
> 				     sessions that aren't registered in transport.sessions.

I don't understand this. I had to think for a while to recall what
NULL_TRANSPORT does but I believe it's simply to give logoff() something
to synchronize on in the event that it's called when the session isn't
setup. There should be no way for the NULL_TRANSPORT to actually be used.

But that doesn't look like what your patch is really targeting. It looks
like it trys to resolve a conflict between setting up the session and
logging it off at the same time.

Are you basing your patch on real observations or are you simple reasoning
about the code? Can you describe the sequence of events that can occur
that cause an unexpected race? I would think it's not possible since
both sessionSetup and logoff synchronize on the underlying transport.

>  jcifs-tcon_hostname.diff - reuse available host-name to avoid reconnects

SmbTransport.tconHostName is set in SmbTransport.doConnect() which
is called by Transport.connect() which is called before the two
instances where you inserted getHostName which calls connect(). So it
can never be null at that point. Perhaps you assumed connect() would
cause some protocol communication. This is not true. If you look at
jcifs.util.transport.Transport.connect() you will see it is a state
machine that only acts if the transport is in state 0 (not connected).

What was the original symptom that lead you to look at this? Are you
seeing tconHostName null?

Also, what is this:

-synchronized( session.transport() ) {
+synchronized( session.transport() ) { /*FIXME: not correct if session disconnects and
+                                        multiple transports avaliable */

Again, is this based on observed behavior or are you simply drawing on
your understanding of the code. Submitting patches is good because it
helps me understand more precisely where the issues are but you have
to explain more about what was observed that lead you to conclude the
change was necessary.

> The last two are transport fixes:
> 
>  jcifs-transport_setup.diff - 
>      modify transport setup to allow timeouts on socket.connect,

Connect timeouts are handled in Transport.connect():166. Your technique
is superior but it requires Java 1.4 so I'd rather hold off on that for
a while.

>      if netbios.hostname is specified force to netbios port 139

Ok, but we don't need the changes in NbtAddress. They can be isolated
to SmbTransport.java.

>      fix session-transport matching if transport.port != DEFAULT_PORT

Good one. I love changes that make the code simpler (and correct :-).

But does this depend on Handler.getDefaultPort returning 0 in some way? I
don't think we should do that because that API contract is not defined
by us. Port 445 is conceptually the default port. 139 is more like a
"fallback" port.

Also, why did you widen the scope of the catch in doConnect from
ConnectException to IOException? Did you encounter an exception that
warranted retrying on a different port? If so, do you have a stack trace?

>  jcifs-threading.diff - 
>      the biggest one ... there is a big locking problem between 
>      treeConnect/sessionSetup/sessionExpiration  and transport closedown via
>      disconnect. Both take the transport-monitor, and if the transport
>      socket-connection closes with sessions at a high frequency, 
>      wrong thread may wait on transport.response_map with transport monitor
>      hold, so transport.disconnect will not run, and the transport is unusable
>      for a long time (at least REQUEST_TIMEOUT)
>      The changes try adress this problem by signalling not only responses but 
>      also transport state changes via transport.response_map, and setting 
>      transport.state CLOSING to indicate transport closedown before
>      disconnect, so any waiting thread can leave the transport monitor.

I'll have a look at this one separately. I need a few cokes to think
about transport locking.

Thanks,
Mike


More information about the jcifs mailing list