[jcifs] Transport thread dead lock

Adam Morgan adam.morgan at Q1Labs.com
Tue Jan 18 08:13:57 MST 2011


Hi Mike

Ok so we had a customer site with roughly 80 separate transport threads operating at the same time.  It appears there is an effective deadlock existing between four separate threads, due to the synchronization strategy around using the static byte array BUF and the SmbTransport object.  I'll try to explain below using snippets from a stack dump we grabbed at the time.

As you can see below, Thread A is awaiting a lock on the static byte array (0x00002aaac7470140), Thread B is blocked on the socket read, and has a lock on the byte array, Thread C is waiting for a lock on an SmbTransport instance (0x00002aaac84b2e80), and Thread D has the lock on that instance of SmbTransport, and is awaiting a lock on the static byte array that Thread B has locked.

Now, we got a couple thread dumps a couple minutes apart, and the threads were still in this state.  All this time they were receiving no data from the code using jcifs.  I realize it isn't conclusively a deadlock based on this output (since we don't know for sure that the read() call in Thread B is not going to return) but given that the system seemed to remain in this state until a jvm restart, it appears to me that it IS a deadlock.  Further, I patched their system with a change to make the byte array BUF to be non-static, and they've yet to hit the issue again (they WERE hitting it within a day or so, and they've now been up for 4 days with no issue).

So, to resolve this I'd recommend that the byte array be changed from static to non-static.  The caveat with that is that each thread will then need 65K memory.  Preferably, I'd implement a buffer pool that all threads can use to minimize the impact on memory requirements.

Thoughts/Comments?



Thread A

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary "pool-1-thread-415" prio=10 tid=0x0000000052f4b800 nid=0x1d95 waiting for monitor entry [0x00002aab27c7f000..0x00002aab27c80a10]

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    java.lang.Thread.State: BLOCKED (on object monitor)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.smb.SmbTransport.doSend(SmbTransport.java:437)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    - waiting to lock <0x00002aaac7470140> (a [B)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.util.transport.Transport.sendrecv(Transport.java:69)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    - locked <0x00002aaac853d3e8> (a jcifs.smb.SmbTransport)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.smb.SmbTransport.send(SmbTransport.java:655)

...


Thread B

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary "jcifs-transport19-2530" daemon prio=10 tid=0x00002aab343fb800 nid=0x1f74 runnable [0x000000007466d000..0x000000007466db10]

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    java.lang.Thread.State: RUNNABLE

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at java.net.SocketInputStream.socketRead0(Native Method)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at java.net.SocketInputStream.read(SocketInputStream.java:129)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.util.transport.Transport.readn(Transport.java:31)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.smb.SmbTransport.doRecv(SmbTransport.java:499)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    - locked <0x00002aaac7470140> (a [B)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.util.transport.Transport.loop(Transport.java:108)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    - locked <0x00002aaac84ab020> (a jcifs.smb.SmbTransport)

...

Thread C

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary "jcifs-transport14-2547" daemon prio=10 tid=0x0000000050e9f800 nid=0x4588 waiting for monitor entry [0x00000000594bd000..0x00000000594bda90]

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    java.lang.Thread.State: BLOCKED (on object monitor)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.util.transport.Transport.disconnect(Transport.java:204)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    - waiting to lock <0x00002aaac84b2e80> (a jcifs.smb.SmbTransport)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.util.transport.Transport.loop(Transport.java:124)

...

Thread D

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary "Folder Monitor [10.1.228.200][smb://10.1.228.200/E$/IISlogs/]" daemon prio=10 tid=0x0000000051cd1400 nid=0x4587 waiting for monitor entry [0x00002aab27f83000..0x00002aab27f83d10]

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    java.lang.Thread.State: BLOCKED (on object monitor)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.smb.SmbTransport.doSend(SmbTransport.java:437)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    - waiting to lock <0x00002aaac7470140> (a [B)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.util.transport.Transport.sendrecv(Transport.java:69)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    - locked <0x00002aaac84b2e80> (a jcifs.smb.SmbTransport)

Jan  8 13:37:26 s5kep-tcc-an02-apd1-primary    at jcifs.smb.SmbTransport.send(SmbTransport.java:655)

...


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.samba.org/pipermail/jcifs/attachments/20110118/f480ba42/attachment-0001.html>


More information about the jCIFS mailing list