[jcifs] Multiple Transport threads with same transportId

Sat Jan 15 12:01:02 MST 2011

Hi Adam,

The code that sets the id looks like this:

    public abstract class Transport implements Runnable {
        static int id = 0;
        String name = "Transport" + id++;

Because the name assignment is not synchronized, it is possible
(albeit unlikely) for the name to be set to the same value if a
context switch occurs after the name is set but before the id is
incremented.

The proper fix would be to move the assignment into a static block and
synchronize on a static object like:

static {
    synchronized (log) {
        name = "Transport" + id++;
    }
}

This seems a little overboard though. It's not clear to me at the
moment how important it is to have a truly unique transport id. It is
really just used for debugging I think.

Mike

-- 
Michael B Allen
Java Active Directory Integration
http://www.ioplex.com/

On Thu, Jan 13, 2011 at 1:44 PM, Adam Morgan <adam.morgan at q1labs.com> wrote:
> Hi Mike
>
>
>
> First off, Happy New Year!  Now to business…
>
>
>
> While tracking down an issue on one of our clients’ systems, I discovered
> that multiple SmbTransport(Transport) threads existed with the same
> ‘transportId’.  I think I’m correct when I say this shouldn’t happen, and so
> I’m trying to track down how this could happen.  The snippets from a stack
> dump below.  You will notice that the thread names reflect the changes I
> submitted a few weeks ago.  The second number (added as part of those
> changes) reflects the number of threads that have been created by the
> Transport class instances over the course of a jvm up time.
>
>
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary "jcifs-transport19-2530" daemon
> prio=10 tid=0x00002aab343fb800 nid=0x1f74 runnable
> [0x000000007466d000..0x000000007466db10]
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    java.lang.Thread.State:
> RUNNABLE
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.net.SocketInputStream.socketRead0(Native Method)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.net.SocketInputStream.read(SocketInputStream.java:129)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.util.transport.Transport.readn(Transport.java:31)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.smb.SmbTransport.doRecv(SmbTransport.java:499)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     - locked
> <0x00002aaac7470140> (a [B)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.util.transport.Transport.loop(Transport.java:108)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     - locked
> <0x00002aaac84ab020> (a jcifs.smb.SmbTransport)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.util.transport.Transport.run(Transport.java:261)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.lang.Thread.run(Thread.java:619)
>
>
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary "jcifs-transport19-2257" daemon
> prio=10 tid=0x00002aab26dbec00 nid=0x7d38 runnable
> [0x00002aab27276000..0x00002aab27276d90]
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    java.lang.Thread.State:
> RUNNABLE
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.net.SocketInputStream.socketRead0(Native Method)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.net.SocketInputStream.read(SocketInputStream.java:129)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.net.SocketInputStream.read(SocketInputStream.java:182)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.smb.SmbTransport.peekKey(SmbTransport.java:421)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.util.transport.Transport.loop(Transport.java:98)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.util.transport.Transport.run(Transport.java:261)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.lang.Thread.run(Thread.java:619)
>
>
>
> The system we have seen this on was regularly hitting the responseTimeout
> threshold due to network load spikes, but would recover in time as the
> network load dropped.  I’m able to reproduce the response timeouts by
> inducing delay on packet delivery from the smb server (using the linux tc
> command), but so far have not seen multiple threads with the same
> transportId value.
>
>
>
> My best guess at this point is that the first thread is being orphaned by a
> ‘thread = null;’ call while it is blocked on a socketRead0() call.  If the
> thread were blocking on the socket read, then potentially it would never
> return and remain until jvm shutdown.  Perhaps a thread.interrupt() call
> should precede each ‘thread = null;’ execution?  Granted, without being to
> reproduce the issue I’m just postulating but I was interested in your
> thoughts/comments.