[jcifs] More and more transport threads in blocking state

Michael B Allen ioplex at gmail.com
Fri Aug 8 22:12:41 MDT 2014


On Thu, Aug 7, 2014 at 4:29 AM, Stefan Neis <stefan.neis at auteq.de> wrote:
> Hi JCIFS support team,
>
>
>
> while we were investigating thread behavior on our application server, we
> stumbled over dozens of Transport threads in a blocked state. It turned out
> that they belong to the jCIFS framework and are created when connecting to
> a nonexistent SMB resource. To reproduce this I wrote a little piece of code:
>
>
>
>         for (int i = 0; i < 100; i++) {
>             try {
>                 SmbFile folder = new SmbFile("smb://192.168.100.77/test");
>                 folder.exists();
>             }
>             catch (SmbException e) {
>             }
>         }
>
>
>
> My observations with the tool VisualVM are as follows. On the first call to
> exists(), a thread named Transport1 is created. After 30 seconds an
> SmbException occurs:
>
>
>
> jcifs.smb.SmbException: Failed to connect: 0.0.0.0<00>/192.168.100.77
>
> jcifs.util.transport.TransportException: Connection timeout
>
>
>
> That is okay, because the resource does not exist. But the next call to
> exists() creates another Transport1 thread while the first one is still
> active for another 15 seconds, and the second thread is blocked by the
> first. Because threads are created at a shorter interval than the interval
> for which they run, more and more threads end up in a blocked state.
>
> I tested it with jCIFS version 1.3.15 and 1.3.17.
>
>
>
> The quick and dirty solution is to wait at least 15 seconds after every
> thrown exception, but in my opinion it can't be the caller's responsibility
> to handle the framework's internal processes. And what if I want to access
> the same resource from different places in my code?
>
>
>
> Do you agree with this or am I doing something completely wrong?

Hi Stefan,

I think that is happening because we use a separate Thread to read
from the socket and to connect (standard procedure in Java), and the
"transport" object used to represent the connection to the server is
shared. So the calling thread is timing out just before the socket
Thread does.

More specifically, the following timeout properties and default values
in milliseconds govern JCIFS' behavior when trying to connect and
communicate with servers:

jcifs.smb.client.connTimeout = 35000
jcifs.smb.client.responseTimeout = 30000

Notice the difference of 5 seconds.
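For reference, JCIFS reads these properties once, when its classes are
first loaded, either from system properties or from a properties file
named by -Djcifs.properties=<path>. A file spelling out the defaults
above would look like this (the path is just an example):

```properties
# jcifs.properties -- pass via -Djcifs.properties=/path/to/jcifs.properties
jcifs.smb.client.connTimeout = 35000
jcifs.smb.client.responseTimeout = 30000
```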

The connTimeout is passed to socket.connect() and instructs the JVM
how long it should wait for a server to successfully negotiate a TCP
connection. So with the default value of 35 seconds, if the server
does not exist (it does not respond to the SYN for example),
socket.connect() will throw a SocketException after 35 seconds.

The responseTimeout instructs JCIFS how long to wait for a meaningful
SMB response OR for the transport Thread to successfully return from
calling socket.connect(). So when trying to establish a connection
with a server that does not respond, it will timeout after 30 seconds.

For the sake of completeness, there is also jcifs.smb.client.soTimeout
= 35000 which is set using socket.setSoTimeout() immediately after
socket.connect() returns successfully and sets how long the JVM should
wait for ongoing TCP communication before throwing an exception. But
if the TCP connection is never successfully established, this property
should not be applicable.
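To illustrate the distinction with plain java.net sockets (this is just
a sketch of the underlying JVM behavior, not JCIFS code, and the timeout
values are shortened for the demo): the connect timeout bounds the TCP
handshake, while setSoTimeout() bounds each later blocking read.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SoTimeoutDemo {
    // Returns true if a blocking read() hits the read (SO_TIMEOUT) timeout.
    static boolean readTimesOut() throws IOException {
        // A local server that accepts the connection but never writes,
        // so connect() succeeds and only the read can time out.
        ServerSocket server = new ServerSocket(0);
        Socket sock = new Socket();
        try {
            // Analogous to connTimeout: cap on the TCP handshake.
            sock.connect(new InetSocketAddress("127.0.0.1",
                    server.getLocalPort()), 1000);
            // Analogous to soTimeout: cap on each subsequent blocking read.
            sock.setSoTimeout(200);
            sock.getInputStream().read(); // blocks until the timeout fires
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } finally {
            sock.close();
            server.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut());
    }
}
```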

You could try setting responseTimeout > connTimeout but I'm not sure
what effect that would have on overall logic. It might be a dubious
thing to do.

Otherwise, I'm not sure if something is "completely wrong". I would
have to think about that for a while. I could call interrupt() if the
responseTimeout is reached in Transport.connect(). But generally
calling Thread.interrupt() is considered bad. I don't think I would do
that unless I had no other choice really. And I don't think it would
be a good idea to just ditch the stuck transport and create a new one.
That would let your code continue, but if it got stuck on another
server, and then another, you would just exhaust the VM and still not
get anywhere.

I don't think DFS is related to your particular issue but if you're
not using DFS it is a good idea to disable it by setting
jcifs.smb.client.dfs.disabled = true, because JCIFS does not properly
retry other DFS nodes if DFS and/or file replication is misconfigured
in the domain. And perhaps more importantly, there is a deadlock bug
in the DFS code.
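Disabling it is a single property (again, set before any jcifs class is
loaded):

```java
public class DisableDfs {
    public static void main(String[] args) {
        // Skip DFS referral handling entirely if you never use DFS paths.
        System.setProperty("jcifs.smb.client.dfs.disabled", "true");
    }
}
```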

If you try setting responseTimeout > connTimeout, let us know how it
goes. If it works well for you, maybe we should make it the default.

Mike

-- 
Michael B Allen
Java Active Directory Integration
http://www.ioplex.com/
