[jcifs] Deadlock while cleaning up old sessions in transport

Tue Aug 14 16:20:51 GMT 2007

On Tue, 14 Aug 2007 14:30:55 +0200 (CEST)
"Felix Schumacher" <felix.schumacher at internetallee.de> wrote:

> Am Mo, 13.08.2007, 21:07, schrieb Michael B Allen:
> > On Mon, 13 Aug 2007 17:17:52 +0200 (CEST)
> > "Felix Schumacher" <felix.schumacher at internetallee.de> wrote:
> >
> >> > Like I said the other day, the only way to fix this problem is to
> >> create
> >> > a custom lock class and replace all of the synchronized calls so that
> >> one
> >> > lock can be used and the lock can be *unlocked* deep in hte
> >> call-stack.
> >> So given an implementation of concurrent.ReentrantLock (with which one
> >> could start to have a known working implementation), how would you
> >> decide
> >> when to unlock the transport and where?
> >
> > Hi Felix,
> >
> > If you have a lock class that can be unlocked then the solution is simple
> > because you can have one lock for the entire VM. If there's only one
> > lock then a deadlock is theoretically impossible.
> If you only want to have ONE lock for the entire VM, it should be enough
> to synchronize on a static Object Variable (which should exist only once
> in the VM). But I had the impression, it would be nicer to lock every
> instance of Transport seperatly. When we use only one Object to
> synchronize there, it should not be possible to deadlock either. Or do we
> only have one instance of Transport? In that case we effectivly have the
> wanted situation already.

Hi Felix,

I don't see a problem with doing that. The only possible issue is that
you might not actually have access to the right transport object at the
point you need it so having a static member assures that you'll always
have access to the lock no matter where you are. But if you find that
you can do it it probably would be a little better because it ensures
that any mistake in the code (blocking with the lock) will not stall
the whole client.

> > Meaning, create a static instance of ReentrantLock associated with
> > jcifs.util.transport.Transport and add static lock/unlock methods that
> > lock/unlock that ReentrantLock. Now, eliminate all synchronized calls
> > and replace them with Transport.lock() / Transport.unlock().
> >
> > That will solve the deadlock but there is a pitfall. You must never ever
> > ever EVER block while that lock is locked. For example if the transport
> > tries to read() from a socket with the lock locked the entire client
> > will lockup (but not deadlock) waiting for that I/O. The end result will
> > be that performance will go subterranean.
> The performance will be bad having one lock only, since every write to all
> Transport instances would be queued.

No, that's not true. Like I said you can never block with the lock. So
before you write you must unlock the lock. You must unlock the lock
whenever you open files, read / write, create directories, etc.  And when
you're not sure if the code can block (e.g. calling into some mystery
library) then you should probably unlock there as well. Basically I
you should have a handful of instances that look something like:

  Transport.unlock()
  socket.write(buf, index, len);
  Transport.lock();

> I thought you wanted to have a thread to disconnect old sessions, which
> would lock the corresponding transport instance and wait for a condition
> to be true (time, transport has too many sessions, ...).

I suspect you're talking about the "transport thread". It's purpose is
to read from the socket so that callers do not have to commit to a read
which could block for longer than the caller wants to wait. But if an
error occurs it calls disconnect(). That is one of the threads in the
deadlock you're seeing. But you can't eliminate that call because the
application thread may be long gone.

> While this thread
> waits it would release it's lock, so other threads could do their work.

Right.

> When a working thread or a supervising thread signals that the awaited
> condition is true, the sleeping cleanup thread would awake, reclaim it's
> lock and do the dirty work, while keeping the other jobs at bay.

I'm not sure I understand this. The transport thread (there's no "cleanup
thread") should only call disconnect if there's some kind of error.

> If the design of the cleanup job stays like it is today, I see no real win
> in using a lock, but to loose the compiler checked synchronization.

The whole point of using ReentrantLock is to completely get rid of the
synchronized() statements. The problem with synchronized is that you
cannot unlock the lock deep in the call stack without calling wait on
the object on which you locked. That is a very coarse mechanism and not
suitable for our purposes. In fact IIRC there is no way to do it correctly
using synchronized statements - there will *always* be a hole. It's easy
to patch the code so that it doesn't occur in your test scenario but if
you stare at the code long enough you'll see that there is in fact a hole.

Mike

> > If you actually do this and you know what you're doing and it works,
> > please send a patch and I'll put it in the patches directory.
> >
> > Mike
> >
> > --
> > Michael B Allen
> > PHP Active Directory Kerberos SSO
> > http://www.ioplex.com/
> >
> 
> 

-- 
Michael B Allen
PHP Active Directory Kerberos SSO
http://www.ioplex.com/