[jcifs] pile up of Transport threads in BLOCKED state

Michael B Allen ioplex at gmail.com
Mon Nov 29 15:26:48 MST 2010


Hi Adam,

What is the thread blocked on?

Get a backtrace.
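
Running "jstack <pid>" or "kill -QUIT <pid>" against the JVM prints every thread's stack together with the monitor it is waiting on. If attaching to the process is awkward, the same information is available in-process through the standard java.lang.management API. The following is only a minimal sketch: the class and method names are invented for illustration and are not part of jcifs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of jcifs: reports every BLOCKED thread.
public class BlockedThreadReport {

    /** Returns name, contended lock, lock owner and stack for each BLOCKED thread. */
    static String describeBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: do not collect locked monitor/synchronizer details,
        // which not every JVM supports; the contended lock is still reported.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            sb.append(info.getThreadName())
              .append(" blocked on ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName())
              .append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describeBlockedThreads());
    }
}
```

The "held by" thread is the interesting part: its stack shows what is holding the monitor the Transport threads are waiting on.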

Mike

On Fri, Nov 26, 2010 at 10:51 AM, Adam Morgan <adam.morgan at q1labs.com> wrote:
> Hi Mike
>
>
>
> We’ve run into an issue where Transport threads pile up to the point of
> using all available OS threads, causing OOMs across ALL JVMs on the box.
>
>
>
> A customer had a configuration set up successfully for a while, but at some
> point decided to shut it off (i.e. shut down Samba on the target box, plus
> potentially some other services) and forgot to turn off our code, so it
> kept trying to reconnect. As a result, the box made roughly 20k
> reconnect attempts over the next couple of days. Each reconnect attempt met a
> NoRouteToHostException, and for whatever reason the Transport threads remain
> in a BLOCKED state, waiting on an object monitor, and are never released.
> In light of this ‘runaway’ reconnect, we’ve refactored the code to attempt
> the reconnects on an increasingly delayed basis (1s, 2s, 4s,… up to 15min
> between attempts, resetting to 0s on a successful reconnect) in hopes that
> we were just not waiting long enough and the issue would be resolved.
> However, this refactoring did not resolve the issue and we still see the
> ‘Transport’ thread count creep up over time.
>
>
>
> Nov 16 10:02:13 evlqradar01 ecs[25797]: java.net.NoRouteToHostException: No route to host
> Nov 16 10:02:13 evlqradar01 ecs[25797]:         at java.net.PlainSocketImpl.socketConnect(Native Method)
> Nov 16 10:02:13 evlqradar01 ecs[25797]:         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> Nov 16 10:02:13 evlqradar01 ecs[25797]:         at java.net.Socket.<init>(Socket.java:180)
> Nov 16 10:02:13 evlqradar01 ecs[25797]:         at jcifs.smb.SmbTransport.ssn139(SmbTransport.java:178)
>
> Nov 16 10:03:17 evlqradar01 ecs[31981]: java.net.NoRouteToHostException: No route to host
> Nov 16 10:03:17 evlqradar01 ecs[31981]:         at java.net.PlainSocketImpl.socketConnect(Native Method)
> Nov 16 10:03:17 evlqradar01 ecs[31981]:         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
> Nov 16 10:03:17 evlqradar01 ecs[31981]:         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> Nov 16 10:03:17 evlqradar01 ecs[31981]:         at java.net.Socket.connect(Socket.java:469)
> Nov 16 10:03:17 evlqradar01 ecs[31981]:         at java.net.Socket.<init>(Socket.java:180)
> Nov 16 10:03:17 evlqradar01 ecs[31981]:         at jcifs.smb.SmbTransport.negotiate(SmbTransport.java:242)
> Nov 16 10:03:17 evlqradar01 ecs[31981]:         at jcifs.smb.SmbTransport.doConnect(SmbTransport.java:305)
> Nov 16 10:03:17 evlqradar01 ecs[31981]:         at java.lang.Thread.run(Thread.java:619)
>
>
>
> Here’s a snippet from our logs showing the thread state and blocked-time:
>
>
>
> Nov 17 14:23:07 172.24.251.251 [ecs] [Folder Monitor [SMTP tailer][smb://172.24.1.162/c$/Program Files/Microsoft/Exchange Server/TransportRoles/Logs/ProtocolLog/] - Reconnect] com.q1labs.frameworks.core.ThreadExceptionHandler: [INFO] [NOT:0000006000][172.24.251.251/- -] [-/- -]124143,Transport1 in Byte Code, BLOCKED, blocked-count: 1, blocked-time: 1003721 ms, wait-count: 0, wait-time: 0 ms, user cpu: 0 nanos, sys/user cpu time: 0 nanos, Transport1 locked on [B@72faf9b5
>
> Nov 17 14:23:07 172.24.251.251 [ecs] [Folder Monitor [SMTP tailer][smb://172.24.1.162/c$/Program Files/Microsoft/Exchange Server/TransportRoles/Logs/ProtocolLog/] - Reconnect] com.q1labs.frameworks.core.ThreadExceptionHandler: [INFO] [NOT:0000006000][172.24.251.251/- -] [-/- -]124142,Transport2 in Byte Code, BLOCKED, blocked-count: 1, blocked-time: 1003753 ms, wait-count: 0, wait-time: 0 ms, user cpu: 0 nanos, sys/user cpu time: 0 nanos, Transport2 locked on [B@5e36d88a
>
>
>
>
>
> As you can see, the threads are locked and have been blocked for ~1000s…
> well above any of the default timeouts listed in SmbConstants.
>
>
>
> Can you shed any light on whether there are configuration options that
> would help here, or whether this is a jcifs bug?
>
>
>
> Thanks
>
>
>
> Adam
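
For reference, the reconnect schedule described in the quoted message (1s, 2s, 4s, ... capped at 15 minutes, starting over after a successful reconnect) can be sketched roughly as below. This is not Adam's actual code: the class and method names are hypothetical, and "resetting" is assumed here to mean that the next failure starts again at the 1s delay.

```java
// Hypothetical illustration of a capped exponential backoff schedule;
// not taken from the application or from jcifs.
public class ReconnectBackoff {
    static final long INITIAL_DELAY_MS = 1_000L;           // first retry after 1s
    static final long MAX_DELAY_MS = 15L * 60L * 1_000L;   // never wait longer than 15min

    private long nextDelayMs = INITIAL_DELAY_MS;

    /** Returns the delay to wait before the next attempt, then doubles it up to the cap. */
    long nextDelayMs() {
        long delay = nextDelayMs;
        nextDelayMs = Math.min(nextDelayMs * 2, MAX_DELAY_MS);
        return delay;
    }

    /** Call after a successful reconnect so the schedule starts over (assumed behaviour). */
    void reset() {
        nextDelayMs = INITIAL_DELAY_MS;
    }

    public static void main(String[] args) {
        ReconnectBackoff backoff = new ReconnectBackoff();
        for (int attempt = 1; attempt <= 12; attempt++) {
            System.out.println("attempt " + attempt + ": wait " + backoff.nextDelayMs() + " ms");
        }
    }
}
```

A schedule like this only slows the rate of reconnect attempts. If each failed attempt leaves a Transport thread BLOCKED, the thread count will still grow, just more slowly, which matches what the quoted message reports.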



-- 
Michael B Allen
Java Active Directory Integration
http://www.ioplex.com/

