[jcifs] Multiple Transport threads with same transportId

Tue Jan 18 07:44:41 MST 2011

Ok we're on the same page now :)  And yes, this thread name condition occurs with the "InetSocketAddress" patch, however, the situation that req'd the InetSocketAddress patch is not what we're seeing here... that issue was around the initial socket creation, whereas this issue exists AFTER the connect, when the transport loop is set up and waiting for a response (as u say, on the peekKey() call).

You're right in that if the extra thread is able to continue, the while() condition of (thread == Thread.currentThread()) will kick it out.  However, with the thread being blocked on the read() call, it will never return and is why I suggest the interrupt() call.  

I am familiar with reluctance to use interrupt(), but I believe it is required here, since as per javadoc, "This method blocks until input data is available, end of file is detected, or an exception is thrown."  If you can tell me how that thread would kick out otherwise then I'd agree that interrupt() would not be necessary, but in this case there's no other way for the blocking read() call to return (under the assumption that no more requests are going to be sent and thus no responses sent back).

Now, it would be a benign situation, having this extra transport thread hanging around, if not for the synchronization strategy being used.  There are two object monitors being used around read/writes, one on the static bytebuffer BUF, and one on the SmbTransport object itself.  Since the first object monitor is static, ALL transport threads are tied to this synchronization, and since in this case there are two transport threads linked to a single SmbTransport object, there is potential for deadlocking.  We actually have seen a deadlock situation on a customer site, and I believe I fixed it by changing the static BUF to non-static, but I've yet to confirm this fix so have not submitted it to you.  The trade off is that 65K is dedicated to each transport thread, but it was the simplest change to make and not refactor much code.  I'm thinking a buffer pool would be a better solution but we have not gone ahead with it just yet.  I'll put all the information about this deadlock situation into another mailing list thread sometime today. 

Comments/Thoughts?  

-----Original Message-----
From: Michael B Allen [mailto:ioplex at gmail.com] 
Sent: Monday, January 17, 2011 11:24 PM
To: Adam Morgan
Cc: jcifs at lists.samba.org
Subject: Re: [jcifs] Multiple Transport threads with same transportId

On Mon, Jan 17, 2011 at 1:16 PM, Adam Morgan <adam.morgan at q1labs.com> wrote:
> I need to clear up a little confusion here.  I'm not saying there are two instances of SmbTransport object with the same id, rather that there are two Threads running with the same id.
>
> If you check the patch I sent you late last year, you will see that I've changed the code that names the thread so that it now has a format of:
>
> jcifs-transport<transport-instance-count>-<thread-count>
>
> so, when a new SmbTransport object is instantiated, the static 'transport-instance-count' value is incremented (as it always was).  However, now when connect() is called on Transport, it will increment the static thread-count when instantiating the Thread object.
>
> So, based on this, I see potential for multiple Threads being created with the same 'transport-instance-count' value.  Especially since 'thread = null;' is done in several places without a respective 'thread.interrupt();' to kick existing threads out of blocking reads etc as per my initial email in this thread.
>
> Does that make it any clearer?

Hi Adam,

Yes! It does make more sense that multiple threads could be created
with the same transport id. But the extra threads should not
ultimately do any work. As soon as doConnect returns, Transport.run
should see that run_thread != thread and it will exit.

Note that I do not think thread.interrupt() should every be used. At
least not internally within the library.

I see your InetSocketAddress patch on the TODO list. Does the
duplicate thread name condition occur with that patch?

I assume the Threads with the duplicate IDs are just waiting for
connect to timeout (your thread dump fragment shows the one thread
sitting on peekKey.

Mike

-- 
Michael B Allen
Java Active Directory Integration
http://www.ioplex.com/

> -----Original Message-----
> From: Michael B Allen [mailto:ioplex at gmail.com]
> Sent: Monday, January 17, 2011 1:55 PM
> To: Adam Morgan
> Cc: jcifs at lists.samba.org
> Subject: Re: [jcifs] Multiple Transport threads with same transportId
>
> Well again, the 3 lines of code used to set the transport id speaks
> for itself. I don't see how it's theoretically possible for duplicate
> IDs to occur other than the two cases described - context switching in
> the middle of the transport id assignment or transports loaded in
> different ClassLoaders. It has to be one of those two cases. You may
> just not realize it (for example Servlet containers frequently use
> different ClassLoaders for things).
>
> Mike
>
> On Mon, Jan 17, 2011 at 12:47 PM, Adam Morgan <adam.morgan at q1labs.com> wrote:
>> Mike
>>
>> No, I'm not using any custom classloading... just using the new operator.
>>
>>
>> -----Original Message-----
>> From: Michael B Allen [mailto:ioplex at gmail.com]
>> Sent: Monday, January 17, 2011 1:45 PM
>> To: Adam Morgan
>> Cc: jcifs at lists.samba.org
>> Subject: Re: [jcifs] Multiple Transport threads with same transportId
>>
>> Hi Adam,
>>
>> Are you using multiple ClassLoaders? If yes, that could certainly
>> explain the behavior. Otherwise, the code is very simple and therefore
>> I think it speaks for itself.
>>
>> Mike
>>
>> On Mon, Jan 17, 2011 at 10:48 AM, Adam Morgan <adam.morgan at q1labs.com> wrote:
>>> Hi Mike
>>>
>>> I guess I'm just not convinced this is the reason it is happening, as I saw the situation multiple times.  There would have been about 80 transport threads on the box that was exhibiting the behaviour, and they'd all have been getting initialized around the same time.  Still, a long shot that they'd context switch at the exact same place, multiple times over a few jvm restarts.
>>>
>>> That being said, you may very well be correct.  Adding the synchronized block would allow us to know this for sure.  IMHO it would be worth inclusion in next release.
>>>
>>> Cheers
>>> Adam
>>>
>>> -----Original Message-----
>>> From: Michael B Allen [mailto:ioplex at gmail.com]
>>> Sent: Saturday, January 15, 2011 3:01 PM
>>> To: Adam Morgan
>>> Cc: jcifs at lists.samba.org
>>> Subject: Re: [jcifs] Multiple Transport threads with same transportId
>>>
>>> Hi Adam,
>>>
>>> The code that sets the id looks like this:
>>>
>>>    public abstract class Transport implements Runnable {
>>>        static int id = 0;
>>>        String name = "Transport" + id++;
>>>
>>> Because the name assignment is not synchronized, it is possible
>>> (albeit unlikely) for the name to be set to the same value if a
>>> context switch occurs after the name is set but before the id is
>>> incremented.
>>>
>>> The proper fix would be to move the assignment into a static block and
>>> synchronize on a static object like:
>>>
>>> static {
>>>    synchronized (log) {
>>>        name = "Transport" + id++;
>>>    }
>>> }
>>>
>>> This seems a little overboard though. It's not clear to me at the
>>> moment how important it is to have a truly unique transport id. It is
>>> really just used for debugging I think.
>>>
>>> Mike
>>>
>>> --
>>> Michael B Allen
>>> Java Active Directory Integration
>>> http://www.ioplex.com/
>>>
>>> On Thu, Jan 13, 2011 at 1:44 PM, Adam Morgan <adam.morgan at q1labs.com> wrote:
>>>> Hi Mike
>>>>
>>>>
>>>>
>>>> First off, Happy New Year!  Now to business.
>>>>
>>>>
>>>>
>>>> While tracking down an issue on one of our clients' systems, I discovered
>>>> that multiple SmbTransport(Transport) threads existed with the same
>>>> 'transportId'.  I think I'm correct when I say this shouldn't happen, and so
>>>> I'm trying to track down how this could happen.  The snippets from a stack
>>>> dump below.  You will notice that the thread names reflect the changes I
>>>> submitted a few weeks ago.  The second number (added as part of those
>>>> changes) reflects the number of threads that have been created by the
>>>> Transport class instances over the course of a jvm up time.
>>>>
>>>>
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary "jcifs-transport19-2530" daemon
>>>> prio=10 tid=0x00002aab343fb800 nid=0x1f74 runnable
>>>> [0x000000007466d000..0x000000007466db10]
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    java.lang.Thread.State:
>>>> RUNNABLE
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> java.net.SocketInputStream.socketRead0(Native Method)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> java.net.SocketInputStream.read(SocketInputStream.java:129)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> jcifs.util.transport.Transport.readn(Transport.java:31)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> jcifs.smb.SmbTransport.doRecv(SmbTransport.java:499)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     - locked
>>>> <0x00002aaac7470140> (a [B)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> jcifs.util.transport.Transport.loop(Transport.java:108)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     - locked
>>>> <0x00002aaac84ab020> (a jcifs.smb.SmbTransport)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> jcifs.util.transport.Transport.run(Transport.java:261)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> java.lang.Thread.run(Thread.java:619)
>>>>
>>>>
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary "jcifs-transport19-2257" daemon
>>>> prio=10 tid=0x00002aab26dbec00 nid=0x7d38 runnable
>>>> [0x00002aab27276000..0x00002aab27276d90]
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    java.lang.Thread.State:
>>>> RUNNABLE
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> java.net.SocketInputStream.socketRead0(Native Method)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> java.net.SocketInputStream.read(SocketInputStream.java:129)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> java.net.SocketInputStream.read(SocketInputStream.java:182)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> jcifs.smb.SmbTransport.peekKey(SmbTransport.java:421)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> jcifs.util.transport.Transport.loop(Transport.java:98)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> jcifs.util.transport.Transport.run(Transport.java:261)
>>>>
>>>> Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary     at
>>>> java.lang.Thread.run(Thread.java:619)
>>>>
>>>>
>>>>
>>>> The system we have seen this on was regularly hitting the responseTimeout
>>>> threshold due to network load spikes, but would recover in time as the
>>>> network load dropped.  I'm able to reproduce the response timeouts by
>>>> inducing delay on packet delivery from the smb server (using the linux tc
>>>> command), but so far have not seen multiple threads with the same
>>>> transportId value.
>>>>
>>>>
>>>>
>>>> My best guess at this point is that the first thread is being orphaned by a
>>>> 'thread = null;' call while it is blocked on a socketRead0() call.  If the
>>>> thread were blocking on the socket read, then potentially it would never
>>>> return and remain until jvm shutdown.  Perhaps a thread.interrupt() call
>>>> should precede each 'thread = null;' execution?  Granted, without being to
>>>> reproduce the issue I'm just postulating but I was interested in your
>>>> thoughts/comments.
>>>
>>
>>
>>
>> --
>> Michael B Allen
>> Java Active Directory Integration
>> http://www.ioplex.com/
>>
>
>
>
> --
> Michael B Allen
> Java Active Directory Integration
> http://www.ioplex.com/
>