[Samba] winbind seems to hang when the DC goes down instead of switching to the other available DC

Andrea Cucciarre' acucciarre at cloudian.com
Wed Jul 31 13:12:21 UTC 2019


Hello,

I'm running Samba 4.9.5 as a domain member. When I bring down the current 
Windows DC (10.50.50.187), winbind seems to hang instead of switching 
to the other available DC (10.50.50.25).
The "net ads" command shows that Samba switched to the other available DC:

net ads join -U 'administrator' -S 'PAVONE.HYPERFILE.LOCAL' 
'HYPERFILE.LOCAL'^C
root@epsilon64:/opt/samba/log# net ads info
LDAP server: 10.50.50.25
LDAP server name: pavone.hyperfile.local
Realm: HYPERFILE.LOCAL
Bind Path: dc=HYPERFILE,dc=LOCAL
LDAP port: 389
Server time: Wed, 31 Jul 2019 14:50:44 CEST
KDC server: 10.50.50.25
Server time offset: 0
Last machine account password change: Wed, 31 Jul 2019 14:37:34 CEST
root@epsilon64:/opt/samba/log# wbinfo --ping-dc
checking the NETLOGON for domain[HYPERFILE] dc connection to "" failed
failed to call wbcPingDc: WBC_ERR_WINBIND_NOT_AVAILABLE

However, wbinfo --ping-dc fails:

  wbinfo --ping-dc
checking the NETLOGON for domain[HYPERFILE] dc connection to "" failed
failed to call wbcPingDc: WBC_ERR_WINBIND_NOT_AVAILABLE

The relevant logs follow. It seems that winbindd realized that the DC 
is down and that it can use the other one (10.50.50.25), but it fails to 
connect:

[2019/07/31 14:51:11.362303, 11, pid=7139, effective(0, 0), real(0, 0), 
class=winbind] 
../source3/winbindd/winbindd_util.c:2006(winbindd_set_locator_kdc_env)
   winbindd_set_locator_kdc_env: setting var: 
WINBINDD_LOCATOR_KDC_ADDRESS_HYPERFILE.LOCAL to: 10.50.50.25
[2019/07/31 14:51:11.362398,  3, pid=7139, effective(0, 0), real(0, 0)] 
../source3/libsmb/cliconnect.c:272(cli_session_creds_prepare_krb5)
   got OID=1.3.6.1.4.1.311.2.2.30
   got OID=1.2.840.48018.1.2.2
[2019/07/31 14:51:11.362540, 10, pid=7139, effective(0, 0), real(0, 0)] 
../source3/libads/kerberos.c:141(kerberos_kinit_password_ext)
   kerberos_kinit_password: as EPSILON64$@HYPERFILE.LOCAL using 
[MEMORY:cliconnect] as ccache and config 
[/opt/samba/var/lock/smb_krb5/krb5.conf.HYPERFILE]
[2019/07/31 14:51:11.816828,  5, pid=7139, effective(0, 0), real(0, 0), 
class=auth] ../auth/gensec/gensec_start.c:739(gensec_start_mech)
   Starting GENSEC mechanism spnego
[2019/07/31 14:51:11.816989,  5, pid=7139, effective(0, 0), real(0, 0), 
class=auth] ../auth/gensec/gensec_start.c:739(gensec_start_mech)
   Starting GENSEC submechanism gse_krb5
[2019/07/31 14:58:36.719216, 10, pid=7139, effective(0, 0), real(0, 0), 
class=auth] ../auth/gensec/gensec.c:440(gensec_update_send)
   gensec_update_send: gse_krb5[8197c98]: subreq: 819e628
[2019/07/31 14:58:36.719316, 10, pid=7139, effective(0, 0), real(0, 0), 
class=auth] ../auth/gensec/gensec.c:440(gensec_update_send)
   gensec_update_send: spnego[819ca68]: subreq: 819bb50
[2019/07/31 14:58:36.719358, 10, pid=7139, effective(0, 0), real(0, 0), 
class=auth] ../auth/gensec/gensec.c:498(gensec_update_done)
   gensec_update_done: gse_krb5[8197c98]: 
NT_STATUS_MORE_PROCESSING_REQUIRED 
tevent_req[819e628/../source3/librpc/crypto/gse.c:841]: state[2] error[0 
(0x0)]  state[struct gensec_gse_update_state (819e708)]
timer[0] finish[../source3/librpc/crypto/gse.c:851]
[2019/07/31 14:58:36.719424, 10, pid=7139, effective(0, 0), real(0, 0), 
class=auth] ../auth/gensec/gensec.c:498(gensec_update_done)
   gensec_update_done: spnego[819ca68]: 
NT_STATUS_MORE_PROCESSING_REQUIRED 
tevent_req[819bb50/../auth/gensec/spnego.c:1601]: state[2] error[0 
(0x0)]  state[struct gensec_spnego_update_state (819bc30)] timer
[0] finish[../auth/gensec/spnego.c:2070]
[2019/07/31 14:58:36.719886,  3, pid=7139, effective(0, 0), real(0, 0)] 
../source3/libsmb/cliconnect.c:1679(cli_session_setup_creds_done_spnego)
   SPNEGO login failed: The transport connection is now disconnected.
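
Note that the timestamps in the log above jump from 14:51:11 (start of the 
gse_krb5 submechanism) to 14:58:36 (first gensec_update_send), so the process 
appears to have been stalled for several minutes in between. To locate such 
stalls in a long level-11 log, a minimal sketch could look like the following. 
The function name largest_gap and the header regular expression are my own 
assumptions based on the log format shown here; this is not a Samba tool.

```python
import re
from datetime import datetime

# Matches the start of a Samba log header line such as
# "[2019/07/31 14:51:11.362303, 11, pid=7139, ..."
# (pattern assumed from the log excerpt in this post).
HEADER_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}),")


def largest_gap(lines):
    """Return (seconds, earlier, later) for the widest gap between two
    consecutive log headers, or None if fewer than two headers are found."""
    stamps = []
    for line in lines:
        match = HEADER_RE.match(line)
        if match:
            stamps.append(
                datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f")
            )
    best = None
    # Compare each header timestamp with the one that follows it.
    for earlier, later in zip(stamps, stamps[1:]):
        seconds = (later - earlier).total_seconds()
        if best is None or seconds > best[0]:
            best = (seconds, earlier, later)
    return best
```

It can be fed an open log file directly, for example 
largest_gap(open(path)), where path is whichever winbindd log file is being 
inspected.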

I have looked at the stack of that winbindd pid, and it seems to be hung 
connecting to the old DC (10.50.50.187), which is down:

root@epsilon64:/opt/samba/log# pstack 7139
7139:   /opt/samba/sbin/winbindd --daemon
  fc8f0ec5 connect  (1b, 819d780, 10, 1)
  fee5c2bb connect  (1b, 819d780, 10, fe2363f0) + 23
  fe2365a0 krb5_sendto (819cd78, 80451fc, 8197f38, 80451f4) + 1bd
  fe236b36 krb5_sendto_context (819cd78, 819af70, 80451fc, 819d810, 
80451f4, fe236c2d) + 12c
  fe2152d7 get_cred_kdc (819cd78, 819b330, 6, 0, 80453a4, 8198258) + 490
  fe2156bc krb5_get_kdc_cred (819cd78, 819b330, 6, 0, 0, 80453a4) + ce
  fe2177f1 krb5_get_forwarded_creds (819cd78, 819bd90, 819b330, 6, 
81806a8, 80453a4) + 1b0
  fee07c0f do_delegation (819cd78, 819bd90, 819b330, 8197ee8, 819b0d8, 
804543c) + f9
  fee0809b init_auth_restart (80455f4, 8199498, 819a210, 819cd78, 802e, 
0) + 139
  fee089d5 _gsskrb5_init_sec_context (80455f4, 8199498, 818af1c, 
819b018, 819a1f4, 802e) + 1e8
...

# pfiles 7139
7139:   /opt/samba/sbin/winbindd --daemon
  ...
   27: S_IFSOCK mode:0666 dev:538,0 ino:32677 uid:0 gid:0 rdev:0,0
       O_RDWR FD_CLOEXEC
         SOCK_STREAM
         SO_SNDBUF(49152),SO_RCVBUF(128480)
         sockname: AF_INET 10.50.50.10  port: 48405

# netstat -an | grep 46793
10.50.50.10.46793    10.50.50.187.88          0      0 128480      0 
SYN_SENT
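
The SYN_SENT state above is what a TCP connect() to an unreachable host looks 
like while it waits for the system's connection timeout. As a rough way to 
check from the member server which DC still answers on the Kerberos port (88), 
independently of winbindd, a small probe with an explicit timeout could be 
used. This is only a sketch: the helper name kdc_reachable and the 3-second 
default are my own assumptions, and it says nothing about how Samba or the 
Kerberos library choose their timeouts.

```python
import socket


def kdc_reachable(host, port=88, timeout=3.0):
    """Return True if a TCP connection to host:port completes within
    `timeout` seconds; return False on refusal, timeout, or any other
    socket-level error."""
    try:
        # create_connection applies the timeout to the connect() call,
        # so an unreachable host cannot block longer than `timeout`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, ECONNREFUSED, unreachable networks, etc.
        return False
```

For example, calling kdc_reachable("10.50.50.187") and 
kdc_reachable("10.50.50.25") while the first DC is down should show quickly 
whether only the second one accepts connections.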

Restarting winbindd recovers from the issue.
The smb.conf follows:

========
[global]
client ldap sasl wrapping = plain
dedicated keytab file = /etc/krb5.keytab
disable spoolss = yes
host msdfs = no
idmap config * : backend = tdb
idmap config * : range = 30000-40000
idmap config * : schema_mode = rfc2307
idmap config HYPERFILE : backend = rid
idmap config HYPERFILE : range = 1000000-20000000
idmap config HYPERFILE : schema_mode = rfc2307
idmap config HYPERLOOP : backend = rid
idmap config HYPERLOOP : range = 20000001-30000000
idmap config HYPERLOOP : schema_mode = rfc2307
kerberos method = secrets and keytab
load printers = no
local master = no
log file = /opt/samba/log/%m.log
log level = 11
map acl inherit = Yes
map to guest = bad user
max log size = 100000
os level = 3
preferred master = no
printcap name = /dev/null
realm = HYPERFILE.LOCAL
security = ads
server string = Data %h
store dos attributes = Yes
vfs objects = zfsacl
winbind enum groups = yes
winbind enum users = yes
winbind expand groups = 0
winbind nested groups = yes
winbind normalize names = no
winbind nss info = rfc2307
winbind refresh tickets = Yes
winbind use default domain = no
workgroup = HYPERFILE
==========

Is this a bug, or can you advise on what could be wrong in my config?

Thanks
Andrea






