[jcifs] NTLM usrname/password failure after each 5 mins

Sun Jun 15 22:00:39 GMT 2008

On 6/15/08, AsafM <asaf.mesika at gmail.com> wrote:
>
>  Hi all,
>
>  I'm reviving a two-year-old topic regarding load testing.
>  You can read the entire thread of the discussion here:
>  http://www.nabble.com/NTLM-usrname-password-failure-after-each-5-mins-td5381546.html#a5391633
>
>  I'll start with a quick summary, and then go into detail to make it
>  clearer:
>  After the transport disconnects (due to socket timeout) and reconnects, the
>  first 10 or so attempts to authenticate against the DC fail with a bad
>  username/password error. After those failures, all attempts succeed.
>  I've gained some knowledge that I'll now share, but I'm still missing some
>  key elements needed to figure this out.
>
>  Load Testing Setup
>  110 threads, continuously accessing a protected resource on Tomcat, which
>  requires NTLM authentication.
>  Each thread is using one user. For example: Thread-34 is logging in as user
>  TEST34.
>
>  The turn of events
>  1. The first thread accessing the resource sets up the session
>  (SmbSession.sessionSetup()), which blocks all other threads, since each
>  thread (user) needs to set up a session of its own.
>     The session setup runs Transport.connect(), creates a tree for the
>  default user (to enable SMB signing), and sends the SmbComSessionSetupAndX to
>  the DC for authentication.
>
>  2. Once the 1st session setup is done, all other threads follow, each
>  creating its own session, attached to one transport object (Transport-1
>  thread).
>
>  3. On the second iteration of the test threads, there's no need for session
>  setup. The session object is retrieved from the transport (it's cached
>  there).
>  Because the cached session is used, the transport socket sees no traffic.
>
>  4. After soTimeout (jcifs constant of 5 min), the loop() method of Transport
>  receives a SocketTimeoutException, and calls Transport.disconnect() which in
>  turn calls SmbTransport.doDisconnect().
>
>  5. The doDisconnect() logs off all sessions attached to the transport
>  object, closes down the socket and finally resets the digest property, which
>  is used to sign each request sent to the DC (this is set in the first
>  sessionSetup in SmbSession).
>
>     ** First Problem**
>  While disconnect() was logging off the sessions, other threads were using
>  them and acting as if the transport were connected.

It is ok for other threads to reference sessions. If there is no
activity on the socket then it should be possible to close the
sessions even if there are 100 threads constantly calling
SmbSession.logon().

But the "acting as if the transport were connected" sounds suspicious.
When a transport is shut down it should call logoff() on each session
which should call treeDisconnect() on each tree which should set
treeConnected = false. Then, if threads regain access to calling
SmbSession.logon() they should see treeConnected = false and the first
thread should reconnect the tree, re-logon the session and reconnect
the transport. Then subsequent threads see treeConnected and you're
back in the steady-state.
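
To make the expected sequence concrete, here is a minimal sketch of that
lazy-reconnect pattern. It is not the jcifs source: SketchTransport,
SketchTree and their counters are hypothetical names invented for
illustration, and the real classes do far more work (negotiation, session
setup, signing).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transport; NOT the real jcifs SmbTransport.
class SketchTransport {
    private boolean connected;
    private final List<SketchTree> trees = new ArrayList<>();
    int connectCount; // how many times a real connect would have happened

    synchronized void register(SketchTree tree) {
        trees.add(tree);
    }

    synchronized void connect() {
        if (!connected) {
            connected = true;
            connectCount++;
        }
    }

    // Shutting down marks every tree as disconnected before closing.
    synchronized void disconnect() {
        for (SketchTree tree : trees) {
            tree.treeDisconnect();
        }
        connected = false;
    }
}

// Hypothetical stand-in for a tree; NOT the real jcifs SmbTree.
class SketchTree {
    private final SketchTransport transport;
    private boolean treeConnected;
    int treeConnectCount;

    SketchTree(SketchTransport transport) {
        this.transport = transport;
        transport.register(this);
    }

    // The first caller after a shutdown sees treeConnected == false and
    // reconnects; later callers find the flag set and do nothing.
    void logon() {
        synchronized (transport) {
            if (!treeConnected) {
                transport.connect();
                treeConnected = true;
                treeConnectCount++;
            }
        }
    }

    void treeDisconnect() {
        synchronized (transport) {
            treeConnected = false;
        }
    }
}
```

In this sketch everything locks on the transport object, so a thread in
logon() can never observe a half-finished disconnect().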

>  I've bypassed this issue, by:
>  a) Setting the Transport.state to 0 in the Transport.disconnect() function.
>  This causes the Transport.connect() to actually connect.
>  b) Adding a synchronized (this) block on both disconnect() and connect()
>  methods, which prevents running connect() while disconnect() is in progress.

I don't understand this. The Transport.connect()/disconnect() methods
are already synchronized and the transport state is changed to 0 in
disconnect().
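
As a rough illustration of why an extra synchronized (this) block should
not change anything, here is a sketch of two synchronized methods on one
object. GuardedTransport and its events list are hypothetical, and the
value 1 for "connected" is an assumption made for the example; only the
value 0 for "not connected" comes from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; NOT the real jcifs Transport.
class GuardedTransport {
    // 0 = not connected (as discussed above); 1 = connected (assumed value).
    private int state = 0;
    final List<String> events = new ArrayList<>();

    // Both methods lock the same monitor (this), so one can never run
    // while the other is part-way through.
    synchronized void connect() {
        if (state != 0) {
            return; // already connected, nothing to do
        }
        events.add("connect:begin");
        state = 1;
        events.add("connect:end");
    }

    synchronized void disconnect() {
        if (state == 0) {
            return; // already disconnected
        }
        events.add("disconnect:begin");
        state = 0; // reset so the next connect() really connects
        events.add("disconnect:end");
    }

    synchronized int getState() {
        return state;
    }
}
```

However many threads call connect() and disconnect() on one instance, every
"begin" entry in events is immediately followed by its own "end" entry.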

>  6. While disconnect() was running, all other threads were waiting in queue,
>  to run transport.connect(), in the SmbTree.treeConnect() method.
>     Once the disconnect finished, each thread in its turn ran the connect
>  and continued on to create a session by running SmbSession.sessionSetup().
>  Since that function is synchronized on transport(), sessions were created
>  one at a time, for each thread.
>
>  7. The first session to run the setup, identified that the transport.digest
>  was empty (due to SmbTransport.doDisconnect()), thus ran treeConnect on the
>  default username, used for SMB signing.
>  Once that was finished successfully, it sent the SmbComSessionSetupAndX for
>  the user it was trying to authenticate.
>  It failed in the DC. SmbComSessionSetupAndXResponse returned with an error
>  code: Logon failure: unknown user name or bad password
>
>  8. A lot of the threads in line after the first one also failed at exactly
>  the same spot in sessionSetup().

There is a known "hiccup" that occurs whenever the connection is
recycled due to the soTimeout. I don't know what the problem is. I
assume the challenge is momentarily wrong.

>  9. For some reason, which I have yet to figure out, after 10 or so
>  failures, the DC started returning success in the
>  SmbComSessionSetupAndXResponse.

Is the NTLM challenge old? Log the hexdump of the NTLM challenge and
see if it changes with the result of the
SmbComSessionSetupAndXResponse. If it does, that confirms that the
challenge isn't being handled properly. If it does not change and the
new challenge is being used correctly, but the DC is returning
different results given the same input, then that would be very
interesting.
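
A small helper along these lines could do the logging. It is only a sketch:
ChallengeLog, record() and the log line format are made up for the example,
and where to call it from (wherever the challenge bytes and the session
setup result are both available) is left to whoever instruments the code.

```java
import java.util.Arrays;

// Hypothetical logging helper; not part of jcifs.
class ChallengeLog {
    private byte[] last; // challenge seen on the previous attempt, if any

    // Render bytes as upper-case hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    // Build one log line per authentication attempt and remember the
    // challenge so the next attempt can be compared against it.
    synchronized String record(byte[] challenge, boolean success) {
        boolean changed = last == null || !Arrays.equals(last, challenge);
        last = challenge.clone();
        return "challenge=" + hex(challenge)
                + " changed=" + changed
                + " result=" + (success ? "OK" : "LOGON_FAILURE");
    }
}
```

Printing the returned line for every attempt around the reconnect would show
whether the run of failures lines up with a stale 8-byte challenge or
whether the same challenge yields both failures and successes.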

This is the best analysis of the "hiccup" bug that I've seen. Aside
from my comments, everything you say is true and is expected behavior.
The interesting parts are the "acting as if the transport were
connected" bit and what the challenge is across the authentication
failure / success.

Mike

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/