winbind cache issue for NDR entries

Shilpa K shilpa.krishnareddy at gmail.com
Mon Feb 1 05:36:21 UTC 2021


Hi Andreas,

Thanks for the response. If it helps, I used the following sequence of events
to verify the fix:

1. Block the trusted domain DC IP.
2. Kill winbindd.
3. Try mapping a share using trusted domain user credentials. At this point,
it appears that the domain sequence number became -1 and the NDR sequence
number for the trusted domain DC user was also -1.
4. Unblock the trusted domain DC IP.
5. Try mapping a share using trusted domain user credentials again. This
fails continuously for 30 minutes because of the code below:

        if (!is_domain_offline(domain)) {
                uint32_t entry_seqnum, dom_seqnum, last_check;
                uint64_t entry_timeout;

                if (!wcache_fetch_seqnum(domain->name, &dom_seqnum,
                                         &last_check)) {
                        goto fail;
                }
                entry_seqnum = IVAL(data.dptr, 0);
                if (entry_seqnum != dom_seqnum) {
                        DEBUG(10, ("Entry has wrong sequence number: %d\n",
                                   (int)entry_seqnum));
                        goto fail;
                }
                entry_timeout = BVAL(data.dptr, 4);
                if (time(NULL) > entry_timeout) {
                        DEBUG(10, ("Entry has timed out\n"));
                        goto fail;
                }
        }

Both entry_seqnum and dom_seqnum were -1 (DOM_SEQUENCE_NONE), so the
sequence number comparison passed, the data was returned from the cache, and
the NDR call to the child processes was not made.
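
To make the failure mode concrete, here is a minimal standalone sketch of
the comparison. It is not Samba code: the function names, the scalar
parameters and the fixed timestamps are assumptions made for illustration,
and DOM_SEQUENCE_NONE is assumed to be the -1 sentinel described above. The
"strict" variant is only a hypothetical way to show the difference; it is
not the patch discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Assumed to match the -1 sentinel described above. */
#define DOM_SEQUENCE_NONE ((uint32_t)-1)

/* Same comparison as the quoted check: the entry is served from the cache
 * when the sequence numbers are equal and the entry has not expired. */
bool entry_is_fresh(uint32_t entry_seqnum, uint32_t dom_seqnum,
                    time_t entry_timeout, time_t now)
{
        if (entry_seqnum != dom_seqnum) {
                return false;
        }
        if (now > entry_timeout) {
                return false;
        }
        return true;
}

/* Hypothetical stricter variant (not Samba code): an unknown sequence
 * number on either side never validates an entry. */
bool entry_is_fresh_strict(uint32_t entry_seqnum, uint32_t dom_seqnum,
                           time_t entry_timeout, time_t now)
{
        if (entry_seqnum == DOM_SEQUENCE_NONE ||
            dom_seqnum == DOM_SEQUENCE_NONE) {
                return false;
        }
        return entry_is_fresh(entry_seqnum, dom_seqnum, entry_timeout, now);
}

/* Replays step 5 with "winbind cache time = 1800": the entry was stored
 * while the DC was blocked and is looked up again ten minutes later. */
void replay_step5(void)
{
        time_t stored = 1000;            /* arbitrary time for illustration */
        time_t timeout = stored + 1800;  /* expiry written with the entry */
        time_t now = stored + 600;

        /* Both values are the sentinel, so the plain check accepts it. */
        assert(entry_is_fresh(DOM_SEQUENCE_NONE, DOM_SEQUENCE_NONE,
                              timeout, now));
        /* The stricter variant treats the sentinel as a cache miss. */
        assert(!entry_is_fresh_strict(DOM_SEQUENCE_NONE, DOM_SEQUENCE_NONE,
                                      timeout, now));
        /* Only after the timeout does the plain check reject the entry. */
        assert(!entry_is_fresh(DOM_SEQUENCE_NONE, DOM_SEQUENCE_NONE,
                               timeout, stored + 1801));
}
```

Calling replay_step5() from a small test program runs without tripping any
assertion, which matches what I saw: the stale entry stays valid until the
cache timeout passes.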

Thanks,
Shilpa

On Sun, Jan 31, 2021 at 1:18 AM Andreas Schneider <asn at samba.org> wrote:

> On Friday, 29 January 2021 22:05:11 CET Jeremy Allison via samba-technical
> wrote:
> > On Fri, Jan 29, 2021 at 07:39:40PM +0530, Shilpa K via samba-technical
> > wrote:
> > >Hello,
> > >
> > >We had a customer report that users were not able to log in for about
> > >30 minutes, after which the problem cleared by itself. They are using
> > >Samba as a member server in a domain which has a two-way trust with
> > >another domain (say ABC.COM). Upon investigation, we found that there
> > >was a problem with the trusted domain DCs for a very short duration,
> > >as per the event log on the DC of the primary domain. This problem
> > >seems to have cleared after a short time. Around the same time, a user
> > >belonging to the trusted domain mapped a Samba share and encountered a
> > >problem. At this time, it looks like an NDR cache entry for the trusted
> > >domain group "Domain Users" was added to winbindd_cache.tdb to indicate
> > >that there was a lookup problem, and the status
> > >NT_STATUS_TRUSTED_DOMAIN_FAILURE was stored as part of this entry. Once
> > >the issue with the trusted domain DC was cleared and the domain was
> > >back online, when users tried to log in, PAM_AUTH was successful for
> > >the users but getpwnam failed while looking up the SID for "Domain
> > >Users". This failure was returned from the entry in winbindd_cache.tdb,
> > >as wcache_fetch_ndr() succeeded for this entry. Due to this, users
> > >belonging to the trusted domain were not able to log in. Once the cache
> > >entry expired, getpwnam succeeded for trusted domain users and the
> > >shares could be mapped. In order to resolve this issue, should we not
> > >refresh the sequence number when the domain goes online? Btw, we are
> > >using "winbind cache time = 1800".
> >
> > Yep, looks like we should add a call to force a refresh of the
> > sequence number in the cache here:
> >
> > source3/winbindd/winbindd_cm.c:set_domain_online()
> >
> >   538
> >   539         domain->online = True;
> >   540
> >
> > Add a force_refresh_domain_sequence_number(domain) call above.
> >
> > Here is a (raw, untested) patch that implements this.
> >
> > Any chance you can test this for me ?
> >
> > Jeremy.
>
> I wonder if this is the dc-connect issue with trusted domains.
>
> A fix for this we are currently using is:
>
> https://gitlab.com/samba-redhat/samba/-/commit/87bdffab6eae644d468f0fdc4489667fc21ac3a6
>
> This is just a hack, as the right fix would be to get rid of the
> dc-connect child completely. However, the winbind parent needs the
> dc-connect child just to refresh the sequence number.
>
> Isaac started to investigate this further and just had a draft for this,
> which was never finished. We really need to fix this correctly.
>
> https://gitlab.com/samba-team/samba/-/merge_requests/1573
>
>
>
>         Andreas
>
>
>

