[Samba] password server and round robin dns of DCs

Matt Baker m at wheres.co.uk
Mon Jul 2 11:54:07 GMT 2007


Hi All,

I've recently been having a problem with LDAP queries for group names in
winbind. I'm fairly certain it is caused by my using a round robin DNS
name for the password server. When Samba starts, it attaches itself to
what I presume is the first server returned by the DNS lookup. When that
server is taken down for maintenance, winbind stops resolving group
names, etc. See below for more details.

My config:

[global]
        workgroup = XXX
        realm = xxx.xxxxx.xxx
        netbios name = XXXX-XXXXX
        server string = %h server (Samba %v)
        security = ADS
        obey pam restrictions = Yes
        password server = xxx.xxxxx.xxx
        passdb backend = tdbsam
        passwd program = /usr/bin/passwd %u
        passwd chat = *Enter\snew\sUNIX\spassword:* %n\n *Retype\snew\sUNIX\spassword:* %n\n .
        restrict anonymous = 1
        syslog = 0
        log file = /var/log/samba/log.%m
        max log size = 1000
        socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
        local master = No
        dns proxy = No
        panic action = /usr/share/samba/panic-action %d
        idmap uid = 10000-100000000
        idmap gid = 10000-100000000
        template homedir = /home/%U
        template shell = /bin/bash
        winbind use default domain = Yes
        invalid users = root
        hosts allow = xxx.xxx.34.0/255.255.255.0, xxx.xxx.16.0/255.255.255.0, 127.0.0.1



My /var/log/samba/log.winbind:

[2007/07/01 18:03:45, 1] libsmb/clientgen.c:cli_rpc_pipe_close(376)
  cli_rpc_pipe_close: cli_close failed on pipe \lsarpc, fnum 0x2f to machine XXX-DC.  Error was Call timed out: server did not respond after 10000 milliseconds
[2007/07/01 18:03:45, 1] libsmb/clientgen.c:cli_rpc_pipe_close(376)
  cli_rpc_pipe_close: cli_close failed on pipe \NETLOGON, fnum 0x4004 to machine XXX-DC.  Error was Call timed out: server did not respond after 10000 milliseconds
[2007/07/01 18:04:00, 1] libads/cldap.c:recv_cldap_netlogon(215)
  no reply received to cldap netlogon
[2007/07/01 18:04:00, 1] nsswitch/winbindd_ads.c:ads_cached_connection(114)
  ads_connect for domain XXX failed: Interrupted system call
[2007/07/01 18:04:00, 1] nsswitch/winbindd_group.c:fill_grent_mem(106)
  could not lookup membership for group rid S-1-5-21-1117850145-1682116191-196506527-513 in domain UOB (error: NT_STATUS_UNSUCCESSFUL)
[2007/07/01 18:04:00, 1] nsswitch/winbindd_group.c:getgrgid_got_sid(346)
  could not lookup sid
...
[2007/07/01 18:11:48, 1] libads/cldap.c:recv_cldap_netlogon(215)
  no reply received to cldap netlogon
[2007/07/01 18:11:48, 1] nsswitch/winbindd_ads.c:ads_cached_connection(114)
  ads_connect for domain XXX failed: Interrupted system call
[2007/07/01 18:11:48, 1] nsswitch/winbindd_group.c:fill_grent_mem(106)
  could not lookup membership for group rid ...
nsswitch/winbindd_group.c:winbindd_getgrnam(259)
  group XXX-group in domain XXX does not exist
...


I had hoped that the round robin DNS name would be expanded and that
Samba would retry the other servers returned by the lookup. I can see
now that this does not happen (although I'd like confirmation of this if
possible).

Authentication continues to work; the Kerberos realm uses the same round
robin DNS entry.

I wonder whether this counts as a bug, a feature request, or deliberate
design. I cannot use "password server = *" because the member servers do
not sit in the same IP subnet as the DCs and, as far as I am aware, the
discovery uses the netmask.

I'm quite willing to change it to a known list of DCs, but the round
robin somehow seemed nicer.
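
For reference, the known-list variant I have in mind would look roughly
like this. The host names dc1 and dc2 are placeholders rather than my
real DCs, and I have not yet tested whether winbind fails over between
them for the group lookups:

[global]
        # Placeholder DC names; as I understand it they are tried in the
        # order listed
        password server = dc1.xxx.xxxxx.xxx, dc2.xxx.xxxxx.xxx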

Any suggestions gladly welcomed.

Matt

