[jcifs] Issues with connections to servers that reboot

Michael B Allen ioplex at gmail.com
Fri Jun 24 21:32:43 MDT 2011


The 1.3.16 release includes try / finally and try / catch blocks to
prevent transports from getting stuck for unnecessarily long periods.
Also, the InetSocketAddress class is now used to establish SO_TIMEOUT
before connect, which may improve behavior in this scenario as well.
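
As a rough illustration of the SO_TIMEOUT change (a sketch only, not the
actual jCIFS code): create an unconnected socket, set SO_TIMEOUT on it,
and then connect through an InetSocketAddress with an explicit connect
timeout. The class name TimedConnect, the method open, and both timeout
parameters are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper for illustration; not part of jCIFS. */
final class TimedConnect {

    /** Opens a TCP connection with bounded connect and read waits. */
    static Socket open(String host, int port, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket socket = new Socket();            // unconnected socket
        boolean ok = false;
        try {
            // SO_TIMEOUT bounds blocking reads; it is set before connecting.
            socket.setSoTimeout(readTimeoutMs);
            // The second argument bounds the connect attempt itself.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            ok = true;
            return socket;
        } finally {
            if (!ok) {
                socket.close();                  // do not leak a half-open socket
            }
        }
    }
}
```

By contrast, new Socket(String, int) connects immediately with no connect
timeout, so an attempt against an unreachable host waits for the operating
system's own TCP timeout.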

Mike

-- 
Michael B Allen
Java Active Directory Integration
http://www.ioplex.com/

On Wed, Jun 22, 2011 at 1:29 PM, Michael B Allen <ioplex at gmail.com> wrote:
> Hi Sean,
>
> I was not able to reproduce this issue. Windows Server 2008r2 produces
> the "timedout waiting for response" error (as opposed to Windows
> Server 2003 which produces "connection reset") but after about a
> minute, the program recovered and correctly listed the target
> directory.
>
> I have applied Simon's try / catch anyway (but in
> util/transport/Transport.java) since an exception from doDisconnect is
> clearly bad for the Transport.java state machine. I don't know if it
> will help with your issue but I recommend trying the
> soon-to-be-released 1.3.16.
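
The idea behind guarding doDisconnect can be sketched as follows. This is
a simplified, hypothetical stand-in and not the code in
util/transport/Transport.java: SketchTransport, its State enum, and the
logging are assumptions made for illustration. The point is only that the
state is reset in a finally block, so a failing teardown cannot leave the
transport looking connected.

```java
import java.io.IOException;

/** Hypothetical, simplified transport; not the jCIFS Transport class. */
abstract class SketchTransport {
    enum State { DISCONNECTED, CONNECTED }

    private State state = State.DISCONNECTED;

    protected abstract void doConnect() throws IOException;
    protected abstract void doDisconnect() throws IOException;

    synchronized void connect() throws IOException {
        if (state == State.CONNECTED) {
            return;
        }
        doConnect();
        state = State.CONNECTED;
    }

    synchronized void disconnect() {
        if (state == State.DISCONNECTED) {
            return;
        }
        try {
            doDisconnect();
        } catch (IOException e) {
            // Teardown against a dead peer can fail; report it and move on.
            System.err.println("disconnect failed: " + e.getMessage());
        } finally {
            // Always reset, so the next connect() starts from a clean state.
            state = State.DISCONNECTED;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

With this shape, a subclass whose doDisconnect throws still ends up
DISCONNECTED, and a later connect() runs doConnect again.
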
>
> Mike
>
> --
> Michael B Allen
> Java Active Directory Integration
> http://www.ioplex.com/
>
>
> On Tue, Jun 21, 2011 at 10:10 PM, Sean Daley <spdaley at gmail.com> wrote:
>> I seem to be running into an intermittent issue with JCIFS re-connecting
>> to a server that is rebooted.  I've attached a simple Java program which
>> connects to the Admin$ share and calls listFiles on it.  It then repeats
>> this every second.
>>
>> Sometimes, when I reboot a server and/or shut it down for a few minutes
>> and re-start it, the JCIFS connection to that server never seems to
>> recover.  This doesn't seem to happen to all of my servers but it does
>> happen to some of them.
>>
>> I'm currently using Fedora 14 x86_64 as the JCIFS client connecting to a
>> wide variety of Windows boxes.  The biggest Windows culprit I have seems
>> to be a Windows 2008r2 box.
>>
>> For this particular box, I get the following logs from this test class:
>> 0: fileList returned 80 and took 190(ms).
>> 1: fileList returned 80 and took 11(ms).
>> ...
>> 33: fileList returned 80 and took 5(ms).
>> 34: fileList failed: Transport1[testhost/10.20.14.15:445] timedout
>> waiting for response to
>> Trans2FindFirst2[command=SMB_COM_TRANSACTION2,received=false,errorCode=0,flags=0x0018,flags2=0xC803,signSeq=0,tid=2048,pid=63708,uid=2048,mid=73,wordCount=15,byteCount=19,totalParameterCount=18,totalDataCount=0,maxParameterCount=10,maxDataCount=65535,maxSetupCount=0,flags=0x00,timeout=0,parameterCount=18,parameterOffset=66,parameterDisplacement=0,dataCount=0,dataOffset=84,dataDisplacement=0,setupCount=1,pad=1,pad1=0,searchAttributes=0x16,searchCount=200,flags=0x00,informationLevel=0x104,searchStorageType=0,filename=\]
>> took 30001(ms).
>> 35: ... (repeats the exact same thing as 34: every 30 seconds).
>>
>> I've let it run for a while now and it will just continuously report the
>> "timedout waiting for ..." error every 30 seconds.
>>
>> If I stop and re-start the program though it will re-connect just fine.
>> If I enable
>> jcifs.Config.setProperty("jcifs.smb.client.ssnLimit", "1");
>> the problem also does not occur but I'd really rather not do that as I'm
>> going to potentially be working with the same set of hosts many times and
>> I rather like the caching that's being done here.
>>
>> I've played around with this program and differing target servers, as
>> well as changing things around to do something other than a listFiles
>> check (like an exists check), and I've received differing behaviors along
>> the way.  For some of my environment, with the exists check, I got similar
>> timeout behavior but it was a more straightforward exception of
>> "connection timed out".  What was worse though was that each time I got
>> that, I was left with a new Thread running with the following stack trace:
>>
>> #########
>> Daemon Thread [Transport1] (Suspended)
>>        PlainSocketImpl.socketConnect(InetAddress, int, int) line: not available [native method]
>>        SocksSocketImpl(PlainSocketImpl).doConnect(InetAddress, int, int) line: 333
>>        SocksSocketImpl(PlainSocketImpl).connectToAddress(InetAddress, int, int) line: 195
>>        SocksSocketImpl(PlainSocketImpl).connect(SocketAddress, int) line: 182
>>        SocksSocketImpl.connect(SocketAddress, int) line: 366
>>        Socket.connect(SocketAddress, int) line: 529
>>        Socket.connect(SocketAddress) line: 478
>>        Socket.<init>(SocketAddress, SocketAddress, boolean) line: 375
>>        Socket.<init>(String, int) line: 189
>>        SmbTransport.ssn139() line: 185
>>        SmbTransport.negotiate(int, ServerMessageBlock) line: 240
>>        SmbTransport.doConnect() line: 302
>>        SmbTransport(Transport).run() line: 232
>>        Thread.run() line: 662
>> #########
>>
>> So every 30 seconds, I'd get the connection timed out error, then we'd try
>> to connect again and a new Daemon Thread Transport1 would start.  These
>> threads would take upwards of 4 - 5 minutes (at least) before they finally
>> terminated.  During that time though we'd keep on accumulating more and
>> more of them as we tried to re-connect.  Once again, if I stop and
>> re-start the test program it works just fine again right away.
>>
>> Is there any way to force a new SmbTransport to get created without
>> setting ssnLimit to 1?  I briefly tried setting it to 1 but I have some
>> concerns about doing that because we lose the benefit of caching, plus,
>> unless I'm misreading the code, it looks like the CONNECTIONS LinkedList
>> can grow unbounded.  So with ssnLimit == 1, we're just constantly creating
>> new SmbTransports and adding them to CONNECTIONS.  I didn't find any place
>> where we were removing them from the list though.
>>
>> Any thoughts on this?  Or is there any additional information I can get you?
>> Any help would be greatly appreciated.
>>
>> Sean
>>
>

