[Samba] Windbindd restarts and lost uids

Klinger, John (N-CSC) john.klinger at lmco.com
Thu Feb 19 00:06:24 GMT 2004


We have a problem with our setup that exposed another problem in winbindd.

Setup: Samba 3.0.1, openLDAP 1.2.23 for idmap backend using port 1389,
       security=ads (w2k). All works after we get the daemons started
       successfully.

The first issue has to do with the initial setup of both a Samba
server [smbd, nmbd, winbindd, slapd], and a Samba client [just nmbd and
winbindd]. We do the following:

   1) Configure smb.conf, krb5.conf, slapd.conf
   2) kinit domadmin at TST.SAMBA17.TEST
   3) net ads join

At this point, a klist shows the tickets, and net ads testjoin returns
success. Then we try to start all the processes as root via:

   slapd -h ldap://smbdev.tst.samba17.test:1389
   smbd -D 
   nmbd -D 
   winbindd

Usually, and this is the first problem, the log.winbindd reports the
following:

   nsswitch/winbindd.c:main(842)
     winbindd version 3.0.1 started.
     Copyright The Samba Team 2000-2003
   nsswitch/winbindd_util.c:add_trusted_domain(150)
     Added domain TST TST.SAMBA17.TEST
   libsmb/clikrb5.c:ads_krb5_mk_req(269)
     krb5_cc_get_principal failed (No credentials cache found)
   libads/kerberos.c:ads_kinit_password(133)
     kerberos_kinit_password HOST/smbdev at TST.SAMBA17.TEST failed:
     Client not found in Kerberos database
   nsswitch/winbindd_ads.c:ads_cached_connection(65)
     ads_connect for domain TST failed: Operations error
   nsswitch/winbindd_util.c:add_trusted_domains(207)
     scanning trusted domain list

If the error "Client not found in Kerberos database" appears,
winbindd will not connect to the AD, giving the "Operations error".
If we leave winbindd running it appears not to ever reconnect.
Instead, we have to kill and restart winbindd. So now we have
an ugly scripted loop that does something to the effect of the
following:

   1) start all daemons
   2) Wait 30 seconds for winbindd to go through its initialization
   3) try a wbinfo -u.
   4) If no users are returned by wbinfo, kill all daemons and go
      back to step 1.

Winbindd successfully connects after an average of 2.5 minutes. On this
success, the same section of the log will show:

   nsswitch/winbindd.c:main(842)
     winbindd version 3.0.1 started.
     Copyright The Samba Team 2000-2003
   nsswitch/winbindd_util.c:add_trusted_domain(150)
     Added domain TST TST.SAMBA17.TEST
   libsmb/clikrb5.c:ads_krb5_mk_req(269)
     krb5_cc_get_principal failed (No credentials cache found)
   nsswitch/winbindd_util.c:add_trusted_domains(207)
     scanning trusted domain list

I would truly like to know what we are doing wrong here. I've searched
the web and newsgroups and have found similar errors, but no solutions
that worked here. Code purusals haven't found anything (yet). The time
between the Samba server and the AD are within a second of each other. Permissions are correct and files exist as expected.

It definitely appears to be related to time, though. If I do the
kinit and net ads join, then wait around five minutes before starting
the daemons, winbindd starts up fine. What could need the time?
Is there any command that I can run before starting Samba to
verify all is ready?

If you have suggestions for this, please pass them on, because this
causes another problem that is a killer for us.

----

First I'll give an abbreviated description of what I'm about to describe.
I hope this will make the next paragraph easier to follow:

with user testuser1 and domain TST:

0) client tries to access a global share
1) client GETPWNAM    request for tst+testuser1 to the Samba server
2) client GETPWNAM    request for TST+testuser1 to the Samba server
3) client GETPWNAM    request for TST+TESTUSER1 to the Samba server
4) client GETPWNAM    request for testuser1     to the Samba server
5) client GETPWNAM    request for TESTUSER1     to the Samba server
6) client CREATE_USER request for testuser1, which returns success and
   uses up a uid.
7) Process repeats since the winbindd server still can't authenticate
   using the AD -- including the CREATE_USER.

----

Now to elaborate:

There have been situations where a PC tries to access a Samba share.
In this case, the user is an AD global user that is not yet in the
openLDAP idmap backend. I expect that even if it was in the idmap, the
problem would still exist.

At the time of these client requests, winbindd is up and running
on the Samba server, but received that Kerberos error above (we are at
step 2 of the startup loop, so winbindd can't talk with AD yet). The
client tries five variations of the user name, attempting to authenticate
using the winbindd server. This failing, it sends a CREATE_USER request
to the winbindd server. The winbindd server pulls the next available
id from the openLDAP backend, incrementing the openLDAP's next-available
stored in uidNumber (<= issue!!), then calls wb_storepwnam.
wb_storepwnam returns success (hmmmm, recall that we authenticate to AD
and don't have a working connection to AD). The client then repeats
this cycle, requesting authentication, then doing another CREATE_USER.
This cycle is repeated roughly 30 times per second.

The very bad part of this is that while winbindd is up and can't connect
to AD, 30 uids are lost every second since the CREATE_USER request
increments the openLDAP uidNumber every time it is called. We have a limited
uid range, so this is very painful for us. If we leave winbindd running
30 seconds only to find out it didn't connect to the AD, we've lost
900 uids in the interim.

I really want to solve the first, startup problem, and know it would
likely mask the second, uid problem. However, I am willing to fix
the second, as it still appears to be a valid issue. Before I work
on it, though, I'd like some input from other Samba developers.

================================

Here is the section of the log.winbindd that is repeated 30 times a second,
each time with a new, consecutive uid in the wb_storepwnam output. The
timestamp has been removed for brevity and were all "2004/02/12 18:00:20".

[10] nsswitch/winbindd.c:client_write(511)
  client_write: wrote 1300 bytes.
[10] nsswitch/winbindd.c:client_write(556)
  client_write: need to write 512 extra data bytes.
[10] nsswitch/winbindd.c:client_write(511)
  client_write: wrote 512 bytes.
[10] nsswitch/winbindd.c:client_write(545)
  client_write: client_write: complete response written.
[10] nsswitch/winbindd.c:winbind_client_read(457)
  client_read: read 1568 bytes. Need 0 more for a full request.
[10] nsswitch/winbindd.c:process_request(307)
  process_request: request fn GETPWNAM
[3] nsswitch/winbindd_user.c:winbindd_getpwnam(113)
  [ 5263]: getpwnam tst+testuser1
[10] nsswitch/winbindd_cache.c:refresh_sequence_number(343)
  refresh_sequence_number: TST time ok
[10] nsswitch/winbindd_cache.c:refresh_sequence_number(368)
  refresh_sequence_number: TST seq number is now -1
[1] nsswitch/winbindd_user.c:winbindd_getpwnam(148)
  user 'testuser1' does not exist

[10] nsswitch/winbindd.c:client_write(511)
  client_write: wrote 1300 bytes.
[10] nsswitch/winbindd.c:winbind_client_read(457)
  client_read: read 1568 bytes. Need 0 more for a full request.
[10] nsswitch/winbindd.c:process_request(307)
  process_request: request fn GETPWNAM
[3] nsswitch/winbindd_user.c:winbindd_getpwnam(113)
  [ 5263]: getpwnam TST+testuser1
[10] nsswitch/winbindd_cache.c:refresh_sequence_number(343)
  refresh_sequence_number: TST time ok
[10] nsswitch/winbindd_cache.c:refresh_sequence_number(368)
  refresh_sequence_number: TST seq number is now -1
[1] nsswitch/winbindd_user.c:winbindd_getpwnam(148)
  user 'testuser1' does not exist

[10] nsswitch/winbindd.c:client_write(511)
  client_write: wrote 1300 bytes.
[10] nsswitch/winbindd.c:winbind_client_read(457)
  client_read: read 1568 bytes. Need 0 more for a full request.
[10] nsswitch/winbindd.c:process_request(307)
  process_request: request fn GETPWNAM
[3] nsswitch/winbindd_user.c:winbindd_getpwnam(113)
  [ 5263]: getpwnam TST+TESTUSER1
[10] nsswitch/winbindd_cache.c:refresh_sequence_number(343)
  refresh_sequence_number: TST time ok
[10] nsswitch/winbindd_cache.c:refresh_sequence_number(368)
  refresh_sequence_number: TST seq number is now -1
[1] nsswitch/winbindd_user.c:winbindd_getpwnam(148)
  user 'TESTUSER1' does not exist

[10] nsswitch/winbindd.c:client_write(511)
  client_write: wrote 1300 bytes.
[10] nsswitch/winbindd.c:winbind_client_read(457)
  client_read: read 1568 bytes. Need 0 more for a full request.
[10] nsswitch/winbindd.c:process_request(307)
  process_request: request fn GETPWNAM
[3] nsswitch/winbindd_user.c:winbindd_getpwnam(113)
  [ 5263]: getpwnam testuser1
[10] nsswitch/winbindd_cache.c:refresh_sequence_number(343)
  refresh_sequence_number: TST time ok
[10] nsswitch/winbindd_cache.c:refresh_sequence_number(368)
  refresh_sequence_number: TST seq number is now -1
[1] nsswitch/winbindd_user.c:winbindd_getpwnam(148)
  user 'testuser1' does not exist

[10] nsswitch/winbindd.c:client_write(511)
  client_write: wrote 1300 bytes.
[10] nsswitch/winbindd.c:winbind_client_read(457)
  client_read: read 1568 bytes. Need 0 more for a full request.
[10] nsswitch/winbindd.c:process_request(307)
  process_request: request fn GETPWNAM
[3] nsswitch/winbindd_user.c:winbindd_getpwnam(113)
  [ 5263]: getpwnam TESTUSER1
[10] nsswitch/winbindd_cache.c:refresh_sequence_number(343)
  refresh_sequence_number: TST time ok
[10] nsswitch/winbindd_cache.c:refresh_sequence_number(368)
  refresh_sequence_number: TST seq number is now -1
[1] nsswitch/winbindd_user.c:winbindd_getpwnam(148)
  user 'TESTUSER1' does not exist

[10] nsswitch/winbindd.c:client_write(511)
  client_write: wrote 1300 bytes.
[10] nsswitch/winbindd.c:winbind_client_read(457)
  client_read: read 1568 bytes. Need 0 more for a full request.
[10] nsswitch/winbindd.c:process_request(307)
  process_request: request fn CREATE_USER
[3] nsswitch/winbindd_acct.c:winbindd_create_user(876)
  [ 5263]: create_user: user=>(testuser1), group=>()
[5] nsswitch/winbindd_acct.c:wb_getgrnam(522)
  wb_getgrnam: Did not find group (nobody)
[10] nsswitch/winbindd_acct.c:passwd2string(158)
  passwd2string: converting passwd struct for testuser1
[10] nsswitch/winbindd_acct.c:wb_storepwnam(487)
  wb_storepwnam: Success -> "testuser1:x:11144:60001:testuser1:/home1/testuser1:/bin/csh"


Big thanks for all inputs,

John


More information about the samba mailing list