[jcifs] Timeout problem with jcifs NTLM HTTP authentication

Fri Nov 19 20:08:03 GMT 2004

On Fri, 19 Nov 2004 16:00:24 +0100
<Paul.Holaj at dekabank.de> wrote:

> Hi Mike,
> 
> thank You for the fast response.
> 
> > What the getChallengeForDomain() code does is roughly as follows:
> >
> >   if (list_expiration expired) {
> >      dc_list = lookup all domain controllers again
> >       // cachePolicy set to 10min by filter
> >      list_expiration = now() + jcifs.netbios.cachePolicy
> >   }
> >   for each dc in dc_list {
> >      try {
> >         communicate w/ dc
> >         return dc.challenge
> >      } catch {
> >         dc_list[i] = null // dc no good
> >      }
> >   }
> >   throw Exception
> 
> The bad dc was offline for some days.
> Since our WINS manages a static list, the list still contained the bad dc.
> I restarted the application server before each test and 
> tried to authenticate with my web app and jcifs:
> 
> - sometimes, authentication succeeded; 
>   in this case, I saw in the network trace, 
>   that a "good" dc was returned from WINS as first entry in the list.
> 
> - sometimes, the web app didn't respond and after several minutes,
>   a timeout exception was thrown; 
>   in this case, I saw in the network trace,
>   that the "bad" dc was returned from WINS as first entry in the list.
>   jcifs tried to reach the bad dc several times; 
>   I didn't see any other dc's being chosen and
>   contacted from the list (for each loop ?).

I don't think there's any correlation between the behavior you see and the
bad DC exceptions you see in the log. The pseudocode shows that when the bad
DC is encountered it is set to null, which removes it from consideration
until the list is refreshed after cachePolicy seconds. So basically you'll
get a delay after each refresh, but the duration depends greatly on the state
of the bad DC. If it wasn't even responding to the socket connection you
should see no delay at all. The exception in the log suggests that either the
SYN packet was routable but got no response, or the DC accepted the
connection but then did not respond. This would be an soTimeout delay, which
is 5 seconds by default. My understanding of the code is that that is the
worst-case scenario for 1 bad DC, so I have no idea where the "several
minutes" is coming from. As far as I can tell that's not possible. But I
could be wrong :->
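
To make the pseudocode concrete, here is a minimal Java sketch of the same
loop. It is not the actual jcifs source: the class name DcFailoverSketch, the
lookup/contact/clock parameters, and the string DC names are all hypothetical
stand-ins (lookup plays the role of the WINS query, contact the role of
talking to a DC). It only illustrates that a bad DC should cost one failed
attempt per cachePolicy window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical illustration of the failover loop; not jcifs code. */
public class DcFailoverSketch {
    private final Supplier<List<String>> lookup;    // assumed stand-in for the WINS lookup
    private final Function<String, byte[]> contact; // assumed stand-in; throws if the DC is bad
    private final LongSupplier clock;               // current time in milliseconds
    private final long cachePolicyMillis;           // how long the DC list is kept

    private List<String> dcList = new ArrayList<>();
    private long listExpiration = Long.MIN_VALUE;   // forces a lookup on first use

    public DcFailoverSketch(Supplier<List<String>> lookup,
                            Function<String, byte[]> contact,
                            LongSupplier clock,
                            long cachePolicyMillis) {
        this.lookup = lookup;
        this.contact = contact;
        this.clock = clock;
        this.cachePolicyMillis = cachePolicyMillis;
    }

    public byte[] getChallengeForDomain() {
        long now = clock.getAsLong();
        if (now >= listExpiration) {
            // List expired: look up all domain controllers again.
            dcList = new ArrayList<>(lookup.get());
            listExpiration = now + cachePolicyMillis;
        }
        for (int i = 0; i < dcList.size(); i++) {
            String dc = dcList.get(i);
            if (dc == null) {
                continue; // marked bad earlier in this cache window
            }
            try {
                return contact.apply(dc);
            } catch (RuntimeException e) {
                dcList.set(i, null); // dc no good until the next refresh
            }
        }
        throw new IllegalStateException("no usable domain controller");
    }

    public static void main(String[] args) {
        int[] badAttempts = {0};
        DcFailoverSketch sketch = new DcFailoverSketch(
            () -> Arrays.asList("bad-dc", "good-dc"),
            dc -> {
                if (dc.equals("bad-dc")) {
                    badAttempts[0]++;
                    throw new IllegalStateException("simulated timeout");
                }
                return new byte[] {0x01};
            },
            System::currentTimeMillis,
            600_000L); // 10 minutes, as set by the filter
        for (int i = 0; i < 3; i++) {
            sketch.getChallengeForDomain();
        }
        System.out.println("bad DC contacted " + badAttempts[0] + " time(s)");
    }
}
```

In this sketch the three back-to-back calls in main contact the bad DC only
once; the second and third calls skip the nulled entry and go straight to the
good DC. If the real code behaves the same way, the cost of one bad DC would
be one timeout per refresh, not several minutes.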

> 
> Is this really the expected behaviour when encountering a bad dc ? 
> That means, that the application is not usable for about 10 minutes 
> until the timeout occurs ?

No. That should not happen. The only way I could see that happening is if A)
the DC is alive but struggling heavily (meaning the delay is the DC), B) all
the DCs in the list are bad, or C) some jcifs properties have been messed
with inappropriately. If you can reproduce the behavior and produce a
capture I would be interested in looking at it.

Mike

-- 
Greedo shoots first? Not in my Star Wars.