[jcifs] Multiple Transport threads with same transportId

Mon Jan 17 08:48:26 MST 2011

Hi Mike

I guess I'm just not convinced this is the reason it is happening, as I saw the situation multiple times.  There would have been about 80 transport threads on the box that was exhibiting the behaviour, and they'd all have been getting initialized around the same time.  Still, a long shot that they'd context switch at the exact same place, multiple times over a few jvm restarts.

That being said, you may very well be correct.  Adding the synchronized block would allow us to know this for sure.  IMHO it would be worth inclusion in next release.  

Cheers
Adam

-----Original Message-----
From: Michael B Allen [mailto:ioplex at gmail.com] 
Sent: Saturday, January 15, 2011 3:01 PM
To: Adam Morgan
Cc: jcifs at lists.samba.org
Subject: Re: [jcifs] Multiple Transport threads with same transportId

Hi Adam,

The code that sets the id looks like this:

    public abstract class Transport implements Runnable {
        static int id = 0;
        String name = "Transport" + id++;

Because the name assignment is not synchronized, it is possible
(albeit unlikely) for the name to be set to the same value if a
context switch occurs after the name is set but before the id is
incremented.

The proper fix would be to move the assignment into a static block and
synchronize on a static object like:

static {
    synchronized (log) {
        name = "Transport" + id++;
    }
}

This seems a little overboard though. It's not clear to me at the
moment how important it is to have a truly unique transport id. It is
really just used for debugging I think.

Mike

-- 
Michael B Allen
Java Active Directory Integration
http://www.ioplex.com/

On Thu, Jan 13, 2011 at 1:44 PM, Adam Morgan <adam.morgan at q1labs.com> wrote:
> Hi Mike
>
>
>
> First off, Happy New Year!  Now to business.
>
>
>
> While tracking down an issue on one of our clients' systems, I discovered
> that multiple SmbTransport(Transport) threads existed with the same
> 'transportId'.  I think I'm correct when I say this shouldn't happen, and so
> I'm trying to track down how this could happen.  The snippets from a stack
> dump below.  You will notice that the thread names reflect the changes I
> submitted a few weeks ago.  The second number (added as part of those
> changes) reflects the number of threads that have been created by the
> Transport class instances over the course of a jvm up time.
>
>
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary "jcifs-transport19-2530" daemon
> prio=10 tid=0x00002aab343fb800 nid=0x1f74 runnable
> [0x000000007466d000..0x000000007466db10]
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    java.lang.Thread.State:
> RUNNABLE
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.net.SocketInputStream.socketRead0(Native Method)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.net.SocketInputStream.read(SocketInputStream.java:129)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.util.transport.Transport.readn(Transport.java:31)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.smb.SmbTransport.doRecv(SmbTransport.java:499)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     - locked
> <0x00002aaac7470140> (a [B)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.util.transport.Transport.loop(Transport.java:108)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     - locked
> <0x00002aaac84ab020> (a jcifs.smb.SmbTransport)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.util.transport.Transport.run(Transport.java:261)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.lang.Thread.run(Thread.java:619)
>
>
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary "jcifs-transport19-2257" daemon
> prio=10 tid=0x00002aab26dbec00 nid=0x7d38 runnable
> [0x00002aab27276000..0x00002aab27276d90]
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    java.lang.Thread.State:
> RUNNABLE
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.net.SocketInputStream.socketRead0(Native Method)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.net.SocketInputStream.read(SocketInputStream.java:129)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.net.SocketInputStream.read(SocketInputStream.java:182)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.smb.SmbTransport.peekKey(SmbTransport.java:421)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.util.transport.Transport.loop(Transport.java:98)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> jcifs.util.transport.Transport.run(Transport.java:261)
>
> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
> java.lang.Thread.run(Thread.java:619)
>
>
>
> The system we have seen this on was regularly hitting the responseTimeout
> threshold due to network load spikes, but would recover in time as the
> network load dropped.  I'm able to reproduce the response timeouts by
> inducing delay on packet delivery from the smb server (using the linux tc
> command), but so far have not seen multiple threads with the same
> transportId value.
>
>
>
> My best guess at this point is that the first thread is being orphaned by a
> 'thread = null;' call while it is blocked on a socketRead0() call.  If the
> thread were blocking on the socket read, then potentially it would never
> return and remain until jvm shutdown.  Perhaps a thread.interrupt() call
> should precede each 'thread = null;' execution?  Granted, without being to
> reproduce the issue I'm just postulating but I was interested in your
> thoughts/comments.