Port knock of 445 prevents smbd from starting

Christopher O Cowan - Christopher.O.Cowan at ibm.com
Tue Jan 14 21:26:24 UTC 2020


In our cluster setup here, we use a load balancer in front of our ctdb cluster to steer the SMB traffic. We've been doing this for years. The load balancer does a simple TCP connect to port 445 to verify that each node's smbd is still alive.
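
For anyone who wants to reproduce the knock without the load balancer, here is a minimal sketch of that kind of probe in Python. It assumes the probe does nothing more than complete a TCP handshake and close the connection without sending any data; the function name tcp_knock and the timeout value are illustrative and are not taken from the load balancer's configuration.

```python
import socket


def tcp_knock(host, port=445, timeout=2.0):
    """Open a TCP connection to host:port and close it immediately.

    Returns True if the connection was accepted, False otherwise.
    No SMB data is sent, so the server only ever sees a connect
    followed by EOF. (Hypothetical helper, not the LB's real check.)
    """
    try:
        # create_connection completes the TCP handshake; leaving the
        # "with" block closes the socket without writing anything.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable, and so on.
        return False
```

Calling tcp_knock("x.y.z.202") from another host while smbd is starting should mimic the probe, under the assumptions above.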

This is all on AIX, and at some point in the past few weeks these knocks started causing smbd to exit. Here's the output from smbd -i -d10.

First I see this (x.y.z.219 is the address of the LB; x.y.z.202 is the server). I changed the addresses.

-----------------------------------------------------------------------------------------------------------------------------------------

Allowed connection from x.y.z.219 (x.y.z.219)
Connection allowed from ipv4:x.y.z.219:49463 to ipv4:x.y.z.202:445
ctdbd_control: Sending ctdb packet reqid=7, vnn=4026531841, opcode=23, srvid=17509995351216488448
ctdbd_control: Sending ctdb packet reqid=8, vnn=4026531841, opcode=44, srvid=0
INFO: Current debug levels:
-----------------------------------------------------------------------------------------------------------------------------------------
Then after the debug settings, I see:
-----------------------------------------------------------------------------------------------------------------------------------------
init_oplocks: initializing messages.
Registering messaging pointer for type 774 - private_data=20141c48
Registering messaging pointer for type 778 - private_data=20141c48
Registering messaging pointer for type 770 - private_data=20141c48
Registering messaging pointer for type 787 - private_data=20141c48
Registering messaging pointer for type 779 - private_data=20141c48
Registering messaging pointer for type 15 - private_data=0
Overriding messaging pointer for type 15 - private_data=0
Deregistering messaging pointer for type 16 - private_data=0
Registering messaging pointer for type 16 - private_data=20141c48
Deregistering messaging pointer for type 33 - private_data=2011cf18
Registering messaging pointer for type 33 - private_data=20141c48
Deregistering messaging pointer for type 790 - private_data=0
Registering messaging pointer for type 790 - private_data=20141c48
Deregistering messaging pointer for type 791 - private_data=0
Deregistering messaging pointer for type 1 - private_data=0
Registering messaging pointer for type 1 - private_data=0
event_add_idle: idle_evt(keepalive) 20203538
event_add_idle: idle_evt(deadtime) 202040b8
event_add_idle: idle_evt(housekeeping) 202041c8
read_fd_with_timeout: blocking read. EOF from client.
receive_smb_raw_talloc failed for client ipv4:x.y.z.219:49463 read error = NT_STATUS_END_OF_FILE.
setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
Security token: (NULL)
UNIX token of user 0
Primary group is 0 and contains 0 supplementary groups
change_to_root_user: now uid=(0,0) gid=(0,0)
setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
Security token: (NULL)
UNIX token of user 0
Primary group is 0 and contains 0 supplementary groups
change_to_root_user: now uid=(0,0) gid=(0,0)
setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
Security token: (NULL)
UNIX token of user 0
Primary group is 0 and contains 0 supplementary groups
change_to_root_user: now uid=(0,0) gid=(0,0)
smbXsrv_session_logoff_all: empty session_table, nothing to do.
setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
Security token: (NULL)
UNIX token of user 0
Primary group is 0 and contains 0 supplementary groups
change_to_root_user: now uid=(0,0) gid=(0,0)
setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
Security token: (NULL)
UNIX token of user 0
Primary group is 0 and contains 0 supplementary groups
change_to_root_user: now uid=(0,0) gid=(0,0)
msg_ctdb_ref_destructor: refs=0
msg_dgm_ref_destructor: refs=0
Server exit (failed to receive smb request)
Terminated
-----------------------------------------------------------------------------------------------------------------------------------------

I'm doing a binary search to try to isolate the change. It seems that smbd is assuming that the port knock is an incomplete SMB request, based on the EOF, and then exiting. Any idea what changed? I haven't eliminated the possibility that it's a problem with locking and timeouts on AIX.
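
To illustrate why a bare connect-and-close looks like a failed request, here is a rough Python sketch of a length-prefixed read loop. It is not smbd's code: read_smb_frame and _read_exact are hypothetical names, and the only protocol detail assumed is that SMB over TCP port 445 prefixes each message with a 4-byte header (one zero byte followed by a 3-byte big-endian length). A peer that closes before sending that header produces EOF on the very first read.

```python
import socket
import struct


def _read_exact(conn, n):
    """Read exactly n bytes from conn, or return None on early EOF."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            # recv() returning b"" means the peer closed the connection.
            return None
        buf += chunk
    return buf


def read_smb_frame(conn):
    """Read one length-prefixed frame (illustrative, not smbd's logic).

    Returns the payload bytes, or None if the peer closed before a
    complete frame arrived, which is what a port knock looks like.
    """
    header = _read_exact(conn, 4)
    if header is None:
        return None  # EOF before any header: the port-knock case
    # Header layout assumed: 1 zero byte + 3-byte big-endian length.
    length = struct.unpack(">I", header)[0] & 0x00FFFFFF
    return _read_exact(conn, length)
```

In this sketch the caller decides what a None return means. Treating it as a reason to drop that one connection is harmless; treating it as a reason to shut down the whole server would match the "Server exit (failed to receive smb request)" behaviour in the log above, though that is only a guess at what is happening.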

If I shut the availability probe off, everything comes up fine immediately. It didn't seem to affect a running server (although my testing was limited).

Regards,
Chris


