[Samba] Corruption of winbind cache after converting NT4 to AD domain

Michael Tokarev mjt at tls.msk.ru
Fri Feb 11 21:34:50 UTC 2022


Hi!

We've been using an NT4 domain with Samba for many years (well over a decade),
quite successfully.  Instead of fighting with it every time, we finally decided
to convert it to AD.  In doing so we ran into numerous rather bad issues, and
our network has not been working properly for over a week already. Here's one
of the issues (more to follow).

I created a new machine for the DC, alongside the fileserver, which until then
had been doing everything at once.  I copied all configuration and data to it and
ran classicupgrade there, which worked fine after several attempts (we had to fix
some issues, which is OK).

On the main fileserver, I stopped the server, moved everything out, leaving just the
share definitions in the config file, and joined it to the domain (net ads join member).
That also went fine, and after configuring nsswitch and other things, it started working.

And immediately we hit a problem with roaming profiles: at first Windows handled
everything fine, but after a few logins/logouts it refused to synchronize the profile,
saying that its owner is wrong: "Unix user mjt" instead of "DOMAIN\mjt".

After long and painful debugging (there's very little information about how it all
works, which component does what, and how it should all be set up), it all boiled
down to winbind cache corruption/pollution, somewhat similar to this one:

   https://lists.samba.org/archive/samba-technical/2019-February/132730.html

except that in our case it is different.

After net cache flush, I look up every uid we have with wbinfo --uid-info.
Everything's fine; every uid resolves correctly.  But after some random amount
of time, wbinfo --uid-info starts returning DOMAIN_NOT_FOUND errors for one or
two of them, and as more time passes the number of "not found" entries keeps growing.
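To watch for this without doing it by hand, a small script can repeat the lookup over a list of uids and report the ones that fail. This is a minimal sketch, assuming that wbinfo exits with a non-zero status when --uid-info cannot resolve a uid; the helper names (run_command, failing_uids) are hypothetical, made up for illustration.

```python
import subprocess
import sys
from typing import Callable, Iterable, List

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[List[str]], int]


def run_command(argv: List[str]) -> int:
    """Run argv with its output discarded and return its exit status."""
    result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode


def failing_uids(uids: Iterable[int], run: Runner = run_command) -> List[int]:
    """Return the uids for which `wbinfo --uid-info UID` fails.

    Assumption: wbinfo exits non-zero when the lookup fails.
    """
    failed = []
    for uid in uids:
        if run(["wbinfo", "--uid-info", str(uid)]) != 0:
            failed.append(uid)
    return failed


if __name__ == "__main__":
    # uids to check come from the command line, e.g. `check.py 1068 1069`
    print(failing_uids(int(arg) for arg in sys.argv[1:]))
```

Running it right after net cache flush and again some time later would show whether the set of failing uids grows, as described above. The runner is passed in as a parameter so the loop can be exercised without a live winbindd.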

winbindd_getpwuid_send: [wbinfo (47371)] getpwuid 1068
Opening cache file at /run/samba/gencache.tdb
       wbint_UnixIDs2Sids: struct wbint_UnixIDs2Sids
          in: struct wbint_UnixIDs2Sids
              domain_name              : *
                  domain_name              : 'TSRV'
              domain_sid               : S-1-5-21-489615817-366373558-193322279
              num_ids                  : 0x00000001 (1)
              xids: ARRAY(1)
                  xids: struct unixid
                      id                       : 0x0000042c (1068)
                      type                     : ID_TYPE_UID (1)

(This is the request from wbinfo.)  I don't know why domain_name is TSRV: TSRV is
the file server (NetBIOS and host name), while the domain in question is TLS.MSK.RU
(TLS).  But that's not the issue; it works so far.  The idmap backend is ad, fwiw.

...lots of info...
idmap_ad_unixids_to_sids: Mapped S-1-5-21-411424318-379842365-2075518510-1024 -> 1068 (1)
...
gencache_set_data_blob: Adding cache entry with key=[IDMAP/SID2XID/S-1-5-21-411424318-379842365-2075518510-1024] and timeout=[...] (604800 seconds ahead)
gencache_set_data_blob: Adding cache entry with key=[IDMAP/UID2SID/1068] and timeout=[...] (604800 seconds ahead)
...
Finding user BAY
Trying _Get_Pwnam(), username as lowercase is bay
Get_Pwnam_internals did find user [BAY]!
init_lsa_rids: BAY found
gencache_set_data_blob: Adding cache entry with key=[NAME2SID/\BAY] and timeout=[...] (300 seconds ahead)
gencache_set_data_blob: Adding cache entry with key=[SID2NAME/S-1-22-1-1068] and timeout=[...] (300 seconds ahead)
gencache_set_data_blob: Adding cache entry with key=[NAME2SID/UNIX USER\BAY] and timeout=[...] (300 seconds ahead)
Finished processing child request 56
Writing 4032 bytes to parent
       wbint_LookupName: struct wbint_LookupName
          out: struct wbint_LookupName
              type                     : *
                  type                     : SID_NAME_USER (1)
              sid                      : *
                  sid                      : S-1-22-1-1068
              result                   : NT_STATUS_OK
...
gencache_set_data_blob: Adding cache entry with key=[IDMAP/SID2XID/S-1-22-1-1068] and timeout=[Fri Feb 18 01:00:59 2022 MSK] (604800 seconds ahead)
gencache_set_data_blob: Adding cache entry with key=[IDMAP/UID2SID/1068] and timeout=[Fri Feb 18 01:00:59 2022 MSK] (604800 seconds ahead)

(here, as far as I can tell, the value it wrote for IDMAP/UID2SID/1068 is S-1-22-1-1068)

Finished processing child request 56
Writing 4040 bytes to parent
       wbint_Sids2UnixIDs: struct wbint_Sids2UnixIDs
          out: struct wbint_Sids2UnixIDs
              ids                      : *
                  ids: struct wbint_TransIDArray
                      num_ids                  : 0x00000001 (1)
                      ids: ARRAY(1)
                          ids: struct wbint_TransID
                              type_hint                : ID_TYPE_NOT_SPECIFIED (0)
                              domain_index             : 0x00000000 (0)
                              rid                      : 0x0000042c (1068)
                              xid: struct unixid
                                  id                       : 0x0000042c (1068)
                                  type                     : ID_TYPE_UID (1)
              result                   : NT_STATUS_OK
gencache_set_data_blob: Adding cache entry with key=[IDMAP/SID2XID/S-1-22-1-1068] and timeout=[Fri Feb 18 01:00:59 2022 MSK] (604800 seconds ahead)
gencache_set_data_blob: Adding cache entry with key=[IDMAP/UID2SID/1068] and timeout=[Fri Feb 18 01:00:59 2022 MSK] (604800 seconds ahead)
...

And then it starts returning errors:

Could not convert sid S-1-22-1-1068: NT_STATUS_NO_SUCH_USER
process_request_done: [nss_winbind(47509):GETGROUPS]: NT_STATUS_NO_SUCH_USER

winbindd_getpwuid_send: [wbinfo (47516)] getpwuid 1068
wb_xids2sids_send: Found UID in cache: S-1-22-1-1068
Could not convert sid S-1-22-1-1068: NT_STATUS_INVALID_PARAMETER
process_request_done: [wbinfo(47516):GETPWUID]: NT_STATUS_INVALID_PARAMETER
process_request_written: [wbinfo(47516):GETPWUID]: delivered response to client

etc.

These are just selected parts of the picture; the whole winbind trace file is here:
http://www.corpit.ru/mjt/tmp/winbind.trc

Obviously, from then on, uid 1068 no longer works.  Over time, more and more
uids stop working, until the next `net cache flush'.
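The pollution pattern in the trace can be replayed in a toy model: a plain dict stands in for gencache.tdb, a later write to the same key replaces the earlier one, and any IDMAP/UID2SID entry for a uid inside the ad idmap range (1000-3000 in the smb.conf below) whose value falls in Samba's S-1-22-1 "Unix User" namespace is flagged. This is a sketch, not Samba code: the entries are hand-copied from the trace, the value of the first write is presumed from the idmap_ad line, and the second value relies on the reading of the trace given above. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# TLS domain SID, as seen in the idmap_ad line of the trace.
DOMAIN_SID = "S-1-5-21-411424318-379842365-2075518510"
# Samba's "Unix User" SID namespace: S-1-22-1-<uid>.
UNIX_USER_PREFIX = "S-1-22-1-"
UID2SID_PREFIX = "IDMAP/UID2SID/"


def classify_sid(sid: str) -> str:
    """Return 'domain', 'unix_user' or 'other' for a SID string."""
    if sid.startswith(DOMAIN_SID + "-"):
        return "domain"
    if sid.startswith(UNIX_USER_PREFIX):
        return "unix_user"
    return "other"


def replay(writes: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Apply cache writes in order; a later write to a key replaces the earlier one."""
    cache: Dict[str, str] = {}
    for key, value in writes:
        cache[key] = value
    return cache


def polluted_uids(cache: Dict[str, str], lo: int = 1000, hi: int = 3000) -> List[int]:
    """UIDs inside the ad idmap range whose cached UID2SID value is a Unix User SID."""
    bad = []
    for key, sid in cache.items():
        if key.startswith(UID2SID_PREFIX):
            uid = int(key[len(UID2SID_PREFIX):])
            if lo <= uid <= hi and classify_sid(sid) == "unix_user":
                bad.append(uid)
    return sorted(bad)


# The IDMAP/UID2SID/1068 writes from the trace, in the order they happened.
trace = [
    (UID2SID_PREFIX + "1068", DOMAIN_SID + "-1024"),  # presumably written after idmap_ad mapped the domain SID
    (UID2SID_PREFIX + "1068", "S-1-22-1-1068"),       # written after LookupName returned the Unix User SID
]

cache = replay(trace)
print(polluted_uids(cache))  # -> [1068]
```

The point of the model is only that the UID2SID key is shared: whichever lookup path writes last decides what every later getpwuid for that uid sees, which matches uids failing one by one until the cache is flushed.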


Now for the most "interesting" part, besides the obviously wrong behaviour somewhere.

For a long time, we have had Unix users with their own regular home directories,
shell access and lots of work in Linux.  As far as I can see, in order to use
an AD domain we should convert the Linux users to AD, so that a user exists EITHER
in Linux OR in AD, but not both.  I found nothing conclusive about this; there's
no direct requirement like this in the docs I've found so far.  But I see that
people do it this way, not mixing uids and usernames.  It is just my gut feeling,
maybe I'm wrong.

So there are two parts to the question:

First, how should such a setup be done? We are really used to Linux auth and
working in Linux; it's somewhat unnatural to rely on AD when dealing with local
Linux accounts.  But at the same time, these accounts need access to their files
from Windows.  And most importantly, _why_ should the setup be done this way?

And second, what should we do about this cache corruption, and how can we prevent
it? Is it possible to do AD auth via Samba AND Linux auth when logging in to the
Linux machine?  Adding --no-cache to the winbind command line helped, but this is
obviously not a good solution...

System info:

samba 4.13.13+dfsg-1~deb11u2 on debian bullseye, current.

smb.conf:
[global]
   server string = %h samba server %v
   netbios name = TSRV
   netbios aliases = LINUX FS
   realm = TLS.MSK.RU
   workgroup = TLS
   server role = member server
   security = ADS

   idmap config TLS : backend = ad
   idmap config TLS : range = 1000-3000
   idmap config TLS : schema_mode = rfc2307
   idmap config TLS : unix_primary_group = yes
   template homedir = /home/%U
   idmap config * : backend = tdb
   idmap config * : range = 5000-7000

...share definitions...

Thank you for your time! This turned out to be quite a bit longer than I expected...

/mjt


