[jcifs] SmbTransport thread accumulation issues
Sebastian Sickelmann
sebastian.sickelmann at gmx.de
Mon Oct 17 22:22:30 MDT 2011
Hi,
I don't have the thread accumulation problem myself, because I don't
create that many connections, but I see the point: creating a thread
for every machine you connect to can lead to such problems. I think the
main problem is that jcifs uses blocking I/O instead of non-blocking
I/O. I have experimented with non-blocking I/O in jcifs, handling the
I/O with a small number of threads, but it is far from being ready for
a first review by the community and the committers of jcifs (I am not a
committer).
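
To make the idea concrete, here is a minimal sketch of how a single thread could drive many connect attempts with java.nio. It is not taken from my experiments or from jcifs; the class name NonBlockingConnector and the method connectAll are hypothetical, and it only covers the TCP connect phase, not the SMB protocol on top of it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of jcifs.
class NonBlockingConnector {

    /** Attempts all connects from the calling thread; returns the targets that connected in time. */
    static List<InetSocketAddress> connectAll(List<InetSocketAddress> targets,
                                              long timeoutMillis) throws IOException {
        List<InetSocketAddress> connected = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            int pending = 0;
            for (InetSocketAddress target : targets) {
                SocketChannel ch = SocketChannel.open();
                ch.configureBlocking(false);
                try {
                    if (ch.connect(target)) {          // may finish immediately, e.g. on loopback
                        connected.add(target);
                        ch.close();
                    } else {
                        ch.register(selector, SelectionKey.OP_CONNECT, target);
                        pending++;
                    }
                } catch (IOException e) {
                    ch.close();                        // could not even start the connect
                }
            }
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (pending > 0) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    break;                             // overall timeout reached
                }
                selector.select(remaining);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    SocketChannel ch = (SocketChannel) key.channel();
                    try {
                        if (!ch.finishConnect()) {
                            continue;                  // not finished yet, keep waiting
                        }
                        connected.add((InetSocketAddress) key.attachment());
                    } catch (IOException e) {
                        // refused or unreachable: counts as a failed attempt
                    }
                    key.cancel();
                    ch.close();
                    pending--;
                }
            }
            // Closing the selector does not close channels, so close the stragglers here.
            for (SelectionKey key : selector.keys()) {
                key.channel().close();
            }
        }
        return connected;
    }
}
```

The point of the design is that an unresponsive machine costs one registered channel rather than one blocked thread, and the channel is closed when the deadline passes.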
The question I have is: are there many (or any) users of jcifs who
use many connections in a single JVM process?
-- Sebastian
On 17.10.2011 22:58, Colin Hay wrote:
>
> Hi all,
>
> I apologize if this lacks a degree of specificity (and evidence); I've
> inherited some jcifs-related work from a former co-worker and am
> essentially trying to get caught up on what he did. This is really
> intended as a disclosure of some changes we made to jcifs rather than
> a request for help; we seem to have the issue resolved but since jcifs
> is licensed under the LGPL I figure we should share the alterations
> with the community in case they might be useful to others, and maybe
> they can be considered for inclusion in a later release of jcifs.
>
> My company's product uses jcifs to connect to a number of remote
> Windows machines (depending on the customer, this could be just a
> handful, or several hundred). We have two components that make use of
> jcifs for different purposes; each employs its own retry mechanism.
> One utilizes a retry queue such that if a given connection attempt to
> a remote machine does not complete successfully within 45 seconds, we
> wait 30 seconds and make another connection attempt. We keep trying
> every 30 seconds until success. The other component doubles its wait
> interval between each retry attempt; starting at 1 second for the
> first connect failure, 2 seconds for the second, 4 seconds for the
> third, and so on, up to a maximum of 15 minutes.
>
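
The two retry schedules described above can be sketched as follows. The class and method names (RetrySchedule, fixedDelayMillis, backoffDelayMillis) are hypothetical and not from Colin's product; only the 30 second interval, the 1 second starting value, the doubling, and the 15 minute cap come from his description.

```java
// Hypothetical helper illustrating the two retry schedules; not actual product code.
class RetrySchedule {
    static final long FIXED_DELAY_MS = 30_000L;            // 30 seconds between attempts
    static final long INITIAL_DELAY_MS = 1_000L;           // 1 second after the first failure
    static final long MAX_DELAY_MS = 15L * 60L * 1_000L;   // capped at 15 minutes

    /** First component: always wait the same interval, however many failures so far. */
    static long fixedDelayMillis(int failures) {
        return FIXED_DELAY_MS;
    }

    /** Second component: 1 s, 2 s, 4 s, ... after the 1st, 2nd, 3rd failure, up to the cap. */
    static long backoffDelayMillis(int failures) {
        if (failures < 1) {
            throw new IllegalArgumentException("failures must be >= 1");
        }
        // Limit the shift so the multiplication cannot overflow for large failure counts.
        int shift = Math.min(failures - 1, 20);
        return Math.min(INITIAL_DELAY_MS << shift, MAX_DELAY_MS);
    }
}
```

With these numbers the doubling schedule reaches the cap at the eleventh consecutive failure, since 2^10 seconds already exceeds 15 minutes.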
> A while back we came across a problem where we were accumulating
> threads at a rate such that we would eventually hit an OOM that would
> kill our JVM. This was because the Windows machines we were trying to
> connect to were not responsive, and whatever issue they were having
> resulted in the threads created in
> jcifs.util.transport.Transport.connect(long timeout) blocking and
> staying active even after the timeout expired and the reference to the
> thread was nullified by the creating thread. The next time a connect
> attempt was initiated, another thread would be created in the
> connect() method, and this would continue until we had a serious
> problem on our hands because of the accumulation of stranded blocked
> threads. I can't give details as to what the threads were blocking on
> because I can't find any thread dumps from when the original issue was
> investigated, nor any explanation as to how to reproduce the problem
> (it was discovered at a customer site).
>
> My predecessor's solution to this was to:
>
> a) add a thread.interrupt() call to the synchronized(thread){} block of
> jcifs.util.transport.Transport.connect(long timeout) in an effort to
> make sure the thread does not hang around forever:
>
>     synchronized (thread) {
>         thread.start();
>         thread.wait( timeout );  /* wait for doConnect */
>
>         switch (state) {
>             case 1:  /* doConnect never returned */
>                 state = 0;
>                 thread.interrupt();  /* added: try to release the stranded thread */
>                 thread = null;
>                 throw new TransportException( "Connection timeout" );
>             case 2:
>                 if (te != null) {  /* doConnect threw an exception */
>                     state = 4;  /* error */
>                     thread = null;
>                     throw te;
>                 }
>                 state = 3;  /* Success! */
>                 return;
>         }
>     }
>
> b) add a cleanupThread() method to jcifs.util.transport.Transport,
> called from connect(long timeout) before creating the new thread, to
> check if the thread has already been initialized by a previous call to
> connect() and if so, interrupt and nullify it.
>
>     state = 1;
>     te = null;
>     cleanupThread();
>     thread = new Thread( this, name + "-" + threadId++ );
>     thread.setDaemon( true );
>
>     private void cleanupThread()
>     {
>         if (thread == null)
>         {
>             return;
>         }
>         if (thread.isAlive())
>         {
>             thread.interrupt();
>         }
>         thread = null;
>     }
>
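
For anyone who wants to experiment with the timeout-and-interrupt pattern from (a) outside of jcifs, here is a minimal standalone sketch. TimedConnect, Connector and lastWorker are hypothetical names for illustration, not jcifs classes. One caveat: as far as I know, Thread.interrupt() only releases a thread that is blocked in an interruptible call (wait, sleep, join, or an interruptible NIO channel); a thread blocked in classic java.net socket I/O is generally not woken by it. Since nobody knows what the stranded threads were blocked on, the sketch cannot say whether the change helps in the real case.

```java
import java.io.IOException;

// Hypothetical sketch; TimedConnect and Connector are illustration names, not jcifs classes.
class TimedConnect {
    /** Stand-in for the blocking work that doConnect would perform. */
    interface Connector { void doConnect() throws Exception; }

    /** Per-attempt result, so a late worker from an old attempt cannot touch a new one. */
    private static final class Attempt {
        boolean done;
        Exception failure;
    }

    private final Connector connector;
    volatile Thread lastWorker;  // exposed only so a demo can inspect the worker

    TimedConnect(Connector connector) {
        this.connector = connector;
    }

    void connect(long timeoutMillis) throws Exception {
        final Attempt attempt = new Attempt();
        Thread worker = new Thread(() -> {
            Exception ex = null;
            try {
                connector.doConnect();
            } catch (Exception e) {
                ex = e;  // includes InterruptedException after interrupt()
            }
            synchronized (attempt) {
                attempt.done = true;
                attempt.failure = ex;
                attempt.notifyAll();
            }
        }, "connect-sketch");
        worker.setDaemon(true);
        lastWorker = worker;

        synchronized (attempt) {
            worker.start();
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (!attempt.done) {
                long remaining = (deadline - System.nanoTime()) / 1_000_000L;
                if (remaining <= 0) {
                    break;
                }
                attempt.wait(remaining);  // the loop guards against spurious wakeups
            }
            if (!attempt.done) {
                worker.interrupt();  // only helps if the worker is in an interruptible call
                throw new IOException("Connection timeout");
            }
            if (attempt.failure != null) {
                throw attempt.failure;
            }
        }
    }
}
```

Keeping the result in a per-attempt object rather than in shared fields means a worker that finishes late cannot overwrite the state of the next attempt, which is one of the risks of interrupting and nulling a shared thread reference.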
> These two changes seem redundant to me, and a bit dangerous (I'd
> prefer not to blindly interrupt a thread in progress), but without
> knowing how to reproduce the problem to test, I'm forced to take my
> predecessor's word for it (and the fact that the customer's ticket was
> closed) that it worked.
>
> I don't necessarily expect these changes to be included in a future
> release because of the vagueness of the problem, but if anyone has
> seen a similar thread accumulation, you could try making these same
> changes (one or the other or both), and if it helps, maybe you could
> share a thread dump of the situation prior to the change, so it can be
> properly documented as to what the problem is (i.e. what the threads
> end up blocked on). If the problem can be properly identified and
> reproduced, and the solution proven to be effective, it might make a
> good addition to a future release (whenever that might be).
>
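
If attaching jstack to the process is not practical at a customer site, a thread dump can also be captured from inside the JVM. The following is a rough sketch using Thread.getAllStackTraces(); the class name ThreadDumpSketch and the namePrefix parameter are hypothetical, and the prefix to pass depends on how the transport threads are named in your build, which I am only assuming you can read off an existing dump or the source.

```java
import java.util.Map;

// Hypothetical helper for collecting stack traces of selected threads.
class ThreadDumpSketch {

    /** Returns name, state and stack frames of every live thread whose name starts with namePrefix. */
    static String dump(String namePrefix) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().startsWith(namePrefix)) {
                continue;
            }
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Logging the result of dump(...) whenever a connect attempt times out would show what the stranded threads are blocked on, which is the missing piece of evidence here.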
> Cheers,
>
> Colin
>