[jcifs] preventing soTimeout NT_STATUS_ACCESS_VIOLATION w/ NTLM
Kevin.Tapperson at hcahealthcare.com
Fri Jun 17 13:13:03 GMT 2005
>> I have had success in preventing jcifs from throwing an SmbAuthException with the NT_STATUS_ACCESS_VIOLATION ("Invalid access to memory location") error
>>code in association with an soTimeout event by implementing a reference counter for NTLM HTTP authentication requests in the SmbTransport class. The
>>following changes described below were done on the jcifs_1.1.10 code base. I checked the jcifs_1.2.0 code base to see how different the changes would be
>>for it. The only major difference is the change in the SmbTransport.run() method. (In 1.2.0, the changes below to SmbTransport would need to go into the
>>Transport class and the Transport.disconnect() method.)
>Why is this necessary? My understanding is that with the default soTimeout
>value of 10 minutes the chances of getting an access violation situation
>are very slight. What soTimeout value are you using?
I have tried soTimeout values of 300000 (5 minutes) and 0. (I tried an soTimeout value of 0 to avoid this problem, but found that the NT_STATUS_ACCESS_VIOLATION exception can still occur if the domain controller decides to close the transport socket after a type-2 message has been sent to a client but before the type-3 message has been received and processed by the filter.) By adding a reference counter (as previously described) to the jcifs code and using an soTimeout value smaller than the one the domain controller uses (which appears to be about 15 minutes), jcifs can be in complete control of when the transport socket gets closed and can avoid this error (except in cases such as dropped network connections).
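A minimal sketch of the reference-counting idea, assuming a simplified transport: the counter goes up when a type-2 message is sent and down once the matching type-3 message has been processed, and the idle close triggered by soTimeout is skipped while the count is non-zero (the next timeout simply tries again). The names `RefCountedTransport`, `beginHandshake`, `endHandshake` and `onSoTimeout` are hypothetical and are not part of the jcifs API; per the description quoted above, the real change touched `SmbTransport.run()` in 1.1.10 and would move to `Transport.disconnect()` in 1.2.0.

```java
/**
 * Hypothetical sketch (not the jcifs API): a transport that refuses to
 * close its socket on soTimeout while NTLM handshakes are in flight.
 */
class RefCountedTransport {
    // Handshakes where a type-2 message was sent but the type-3 reply
    // has not yet been processed.
    private int pendingHandshakes = 0;
    private boolean closed = false;

    /** Call just before sending a type-2 challenge to the browser. */
    synchronized void beginHandshake() {
        if (closed) {
            throw new IllegalStateException("transport already closed");
        }
        pendingHandshakes++;
    }

    /** Call after the type-3 response has been processed (success or failure). */
    synchronized void endHandshake() {
        if (pendingHandshakes == 0) {
            throw new IllegalStateException("endHandshake without beginHandshake");
        }
        pendingHandshakes--;
    }

    /**
     * Called when the socket's soTimeout expires. Returns true if the
     * transport was closed, false if the close was skipped because a
     * handshake is still pending (the next timeout will try again).
     */
    synchronized boolean onSoTimeout() {
        if (pendingHandshakes > 0) {
            return false;
        }
        closed = true; // a real transport would close its socket here
        return true;
    }

    synchronized boolean isClosed() {
        return closed;
    }
}

public class HandshakeRefCountDemo {
    public static void main(String[] args) {
        RefCountedTransport t = new RefCountedTransport();
        t.beginHandshake();                  // type-2 sent, waiting on the browser
        System.out.println(t.onSoTimeout()); // false: close skipped
        t.endHandshake();                    // type-3 received and processed
        System.out.println(t.onSoTimeout()); // true: idle, so the socket is closed
    }
}
```

Skipping the close, rather than forcing it, is what keeps the socket open through a slow client's type-2/type-3 round trip. It only helps if soTimeout is shorter than the domain controller's own idle timeout, so that jcifs, not the server, decides when the socket is closed.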
We have robotic monitoring of our application in place and keep getting dinged with unexplained SLA violations (due to inability to authenticate). After investigating, I found that both our robotic script and actual users of our system are receiving this error from time to time. We have users scattered across the US, so round-trip response times (even for small data packets like those in the NTLM authentication process) can sometimes be measured in seconds, depending on network conditions. How often this problem occurs depends on the round-trip time between sending a type-2 message and receiving and processing the type-3 message: the longer it takes a client browser to receive a type-2 message and send a type-3 response, the greater the chance of encountering this issue.

With an soTimeout value of 5 minutes, this means that (depending on load and load distribution) up to (24*60/5) = 288 SmbTransport sockets from any one application server process can be closed per day per domain controller. If any of those socket close events happens to coincide with a delayed response from a client, this error is generated. (We have 4 app servers running 3 JVMs each and load balance across 3 different domain controllers for authentication. So in the worst case, with an soTimeout of 5 minutes, we have 4*3*3*288 = 10368 SmbTransport sockets closed in a day.)
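The worst-case arithmetic above, written out as a small Java check. It assumes every transport idles out once per soTimeout interval around the clock, which is an upper bound rather than a measurement; `closesPerDay` is a hypothetical helper, not a jcifs method.

```java
public class SocketCloseEstimate {
    /** Upper bound on idle closes per day for one transport (hypothetical helper). */
    static long closesPerDay(long soTimeoutMillis) {
        if (soTimeoutMillis <= 0) {
            // An soTimeout of 0 disables the timeout, so this formula does not apply.
            throw new IllegalArgumentException("soTimeout must be positive");
        }
        long millisPerDay = 24L * 60 * 60 * 1000;
        return millisPerDay / soTimeoutMillis;
    }

    public static void main(String[] args) {
        long perTransport = closesPerDay(300_000L); // soTimeout = 5 minutes
        int appServers = 4, jvmsPerServer = 3, domainControllers = 3;
        long total = perTransport * appServers * jvmsPerServer * domainControllers;
        System.out.println(perTransport); // 288
        System.out.println(total);        // 10368
    }
}
```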