[Samba] intermittent pam_winbind authentication failure
Rowland Penny
rpenny at samba.org
Wed Jun 11 13:22:54 UTC 2025
On Wed, 11 Jun 2025 13:52:55 +0100
James Dingwall via samba <samba at lists.samba.org> wrote:
> Hi Rowland,
>
> Thank you for taking the time to reproduce on your system. I've
> added some answers to your questions inline and given some details
> about the current state of my testing.
>
> > Date: Tue, 3 Jun 2025 13:46:23 +0100
> > From: Rowland Penny <rpenny at samba.org>
> > To: samba at lists.samba.org
> > Subject: Re: [Samba] intermittent pam_winbind authentication failure
> >
> > On Tue, 3 Jun 2025 11:10:54 +0100
> > James Dingwall via samba <samba at lists.samba.org> wrote:
> >
> > > Hi,
> > >
> > > We've been having an intermittent issue with pam_winbind on Ubuntu
> > > 24.04. The test case we have to demonstrate this is to run this
> > > loop after logging in as a domain user:
> > >
> > > $ while sleep 1 ; do sudo -k ; sudo -K ; date ; echo "password" |
> > > sudo -S /bin/echo "sudo success" || break ; done
> > >
> > > The loop will run successfully, sometimes for 50+ iterations but
> > > eventually
> > >
> > > [sudo] password for DOMAIN\user: sudo success
> > > ...
> > > [sudo] password for DOMAIN\user: sudo success
> > > [sudo] password for DOMAIN\user: Sorry, try again.
> > > [sudo] password for DOMAIN\user:
> > > sudo: no password was provided
> > > sudo: 1 incorrect password attempt
> > >
> > > Rejection of the correct password can also happen with a console
> > > logon or remote ssh. The system is joined to a Windows 2012 based
> > > domain. Ubuntu 24.04 ships with packages based on 4.19.5 but
> > > rebuilding the packages from 25.04 (Samba 4.21.4) also has the
> > > same issue.
> > >
> > > If I try to run winbindd with -d 3 or higher or with strace
> > > attached I'm unable to reproduce the issue which makes me suspect
> > > a timing issue passing the response between winbindd and
> > > pam_winbind. At -d 2 the winbindd logs associated with the
> > > failure:
> > >
> > > May 22 09:14:36 hostname winbindd[853342]: ads_krb5_mk_req:
> > > smb_krb5_get_credentials failed for S00099-HOST$@DOMAIN.COM
> > > (Preauthentication failed) May 22 09:14:36 hostname
> > > winbindd[853342]: failed to get ticket for
> > > S00099-HOST$@DOMAIN.COM: Preauthentication failed May 22 09:14:36
> > > hostname winbindd[853342]: _wbint_PamAuth: Plain-text
> > > authentication for user DOMAIN\user returned
> > > NT_STATUS_LOGON_FAILURE (PAM: 7) May 22 09:14:36 hostname
> > > winbindd[853342]: Auth: [winbind,PAM_AUTH, PAM_WINBIND[sudo],
> > > 871758] user [DOMAIN]\[user] at [Thu, 22 May 2025 09:14:36.413942
> > > BST] with [Plaintext] status [NT_STATUS_LOGON_FAILURE]
> > > workstation [(null)] remote host [unix:] mapped to
> > > [(null)]\[(null)]. local host [unix:] May 22 09:14:36 hostname
> > > winbindd[853342]: {"timestamp":
> > > "2025-05-22T09:14:36.424126+0100", "type": "Authentication",
> > > "Authentication": {"version": {"major": 1, "minor": 3},
> > > "eventId": 4625, "logonId": "db5db55c0b2b6903", "logonType": 8,
> > > "status": "NT_STATUS_LOGON_FAILURE", "localAddress": "unix:",
> > > "remoteAddress": "unix:", "serviceDescription": "winbind",
> > > "authDescription": "PAM_AUTH, PAM_WINBIND[sudo], 871758",
> > > "clientDomain": "DOMAIN", "clientAccount": "user", "workstation":
> > > null, "becameAccount": "", "becameDomain": "", "becameSid": null,
> > > "mappedAccount": null, "mappedDomain": null, "netlogonComputer":
> > > null, "netlogonTrustAccount": null, "netlogonNegotiateFlags":
> > > "0x00000000", "netlogonSecureChannelType": 0,
> > > "netlogonTrustAccountSid": null, "passwordType": "Plaintext",
> > > "clientPolicyAccessCheck": null, "serverPolicyAccessCheck": null,
> > > "duration": 98176}}
> > >
> > > Occasionally for success I get:
> > >
> > > Jun 03 08:27:44 hostname winbindd[774711]: final write to client
> > > failed: Broken pipe
> > >
> > > In the domain controller event log I get a pair of 4768 / 4769
> > > event ids for the proceeding success cases and the failure case
> > > (clocks are in sync so I'm reasonably confident I've matched
> > > these up correctly) so it seems winbindd has had a successful
> > > exchange with the domain controller.
> > >
> > > A similar loop to the test case using `echo "pasword" | wblogin
> > > --pam-logon="${USER}"` runs reliably. However I don't see
> > > anything in the winbind logs so I'm not sure that "Attempt to
> > > authenticate a user in the same way pam_winbind would do." is
> > > identical in the technical implementation.
> > >
> > > Are there any alternative approaches I could take to try and
> > > uncover what is happening?
> > >
> > > Thanks,
> > > James
> > >
> > >
> > > /etc/pam.d/common-auth includes:
> > >
> > > auth [success=ignore default=die] pam_faillock.so preauth
> > > deny=6 unlock_time=1800 silent auth [success=ok default=1]
> > > pam_localuser.so auth [success=3 default=ignore]
> > > pam_unix.so try_first_pass auth [success=2 default=ignore]
> > > pam_winbind.so krb5_auth krb5_ccache_type=FILE cached_login
> > > try_first_pass debug auth optional pam_faillock.so
> > > authfail deny=6 unlock_time=1800 auth requisite
> > > pam_deny.so
> > >
> > >
> > > The smb.conf we're using:
> > >
> > > [global]
> > > workgroup = DOMAIN
> > > realm = DOMAIN.COM
> > > netbios name = S00099-host
> > > security = ads
> > > server role = member server
> > > dedicated keytab file = /etc/krb5.keytab
> > > kerberos method = secrets and keytab
> > > allow trusted domains = no
> > > server string = %h server (Samba, Ubuntu)
> > > disable netbios = yes
> > > password server = *
> > > winbind enum groups = yes
> > > winbind enum users = yes
> > > winbind nested groups = yes
> > > winbind refresh tickets = no
> > > template shell = /bin/bash
> > > template homedir = /home/local/%D/%U
> > > idmap config * : backend = autorid
> > > idmap config * : range = 2900000001-3000000000
> > > idmap config * : ignore builtin = yes
> > > idmap config DOMAIN : backend = rid
> > > idmap config DOMAIN : range = 3000000001-3100000000
> > > map to guest = bad user
> > > guest account = nobody
> > > log file = /var/log/samba/log.%m
> > > log level = 1
> > > max log size = 5000
> > > load printers = no
> > > printing = bsd
> > > printcap name = /dev/null
> > > disable spoolss = yes
> > > dns proxy = no
> > > wins support = no
> > > domain master = no
> > > local master = no
> > > preferred master = no
> > > store dos attributes = yes
> > > map hidden = no
> > > map readonly = no
> > > map system = no
> > > map archive = no
> > > hide dot files = no
> > > enable core files = yes
> > > min receivefile size = 131072
> > > aio read size = 1
> > > aio write size = 1
> > > use sendfile = yes
> > > unix charset = UTF8
> > > ea support = yes
> > > map acl inherit = yes
> > > acl map full control = no
> > > unix extensions = no
> > > inherit acls = no
> > > follow symlinks = yes
> > > wide links = yes
> > >
> >
> > OK, I took your one line script and turned it into this:
> >
> > #!/bin/bash
> >
> > password=$1
> >
> > counter=0
> > while sleep 1 ; do
> > echo "TRY: $counter"
> > sudo -k
> > sudo -K
> > date
> > echo "$password" | sudo -S /bin/echo "sudo success" || break
> > counter=$((counter+1))
> > done
> >
> > I then ran it and it is still running, 'TRY' is now in excess of
> > 2,500 and still going up.
> >
> > I feel your problems possibly lie in your /etc/pam.d/common-auth
> > file and rather strange (well, strange to me) idmap config lines in
> > smb.conf.
> >
> > My common-auth contains this:
> >
> > auth [success=2 default=ignore] pam_unix.so nullok
> > auth [success=1 default=ignore] pam_winbind.so
> > krb5_auth krb5_ccache_type=FILE cached_login try_first_pass
> > auth requisite pam_deny.so
> > auth required pam_permit.so
> >
> > What do expect your extra/different lines to do ?
>
> We've added pam_faillock to have consistent password lockout
> behaviour for a system regardless of any domain controller policies
> or whether the user is local or domain. The pam_localuser line is to
> skip pam_unix if the user is not present in /etc/passwd. pam_unix
> was complaining for non-local accounts and when we were ingesting
> logs from *lots* of systems to a central processor this was very
> noisy so skipping over prevents that.
You really need very few 'local users' (by which, I mean a user that
isn't a 'system' user and is in /etc/passwd), I have only one, just in
case a domain connection error occurs, otherwise members of the Domain
Admins group are allowed admin access (use another group if you use the
'ad' idmap backend).
>
> >
> > You appear to be using both the 'autorid' and 'rid' idmap backends
> > and you are telling autorid to ignore the BUILTIN domain, so what
> > is mapping the required BUILTIN users and groups, I haven't a clue.
> > I suggest you change 'autorid' to 'tdb'. I also have to ask, why
> > are you using such high numbers for the ranges ?
> >
> > These are my idmap config lines:
> >
> > idmap config * : backend = tdb
> > idmap config * : range = 3000-7999
> > idmap config SAMDOM : backend = rid
> > idmap config SAMDOM : range = 10000-999999
> >
> > By the way, 'TRY' is now approaching 3,000 and still going.
>
> We didn't (obviously) have a need for BUILTIN for just winbind, there
> are no shares or printing etc. We felt that the idmap_autorid was
> more suitable for our environment as it does not require a database
> which needs to be managed. The ranges are large to reduce the
> chances of a collision in the autorid scheme.
Yes, I can understand using 'autorid', just not with 'rid' at the same
time. I have never set up Samba in the way you have, but I presume that
'rid' is mapping your 'DOMAIN' users and I have no idea if 'autorid' is
doing anything because you have stopped it from mapping the BUILTIN
domain and, yes, you do need the BUILTIN domain.
>
> The current state is that I have found the code responsible for
> generating the "(Preauthentication failed)" error, from
> third_party/heimdal/lib/krb5/fast.c:
>
> krb5_error_code
> _krb5_fast_unwrap_kdc_rep(krb5_context context, int32_t nonce,
> krb5_data *chksumdata,
> struct krb5_fast_state *state, AS_REP *rep)
> {
> ...
> if (nonce != (int32_t)fastrep.nonce) {
> ret = KRB5KDC_ERR_PREAUTH_FAILED;
> goto out;
> }
> ...
> }
>
> I added a debug statement there and for one failure:
>
> 2025-06-11T10:22:39 _krb5_fast_unwrap_kdc_rep: ret =
> KRB5KDC_ERR_PREAUTH_FAILED (3) (-4476732 != 12300484)
>
> Obviously the numbers aren't the same but printing the binary
> representation its almost a perfect bit flip (I think this is two's
> complement representation because I used %d)
>
> >>> "{0:b}".format(12300484)
> '101110111011000011000100'
> >>> "{0:b}".format(-4476732)
> '-10001000100111100111100'
>
> For another occurence printing the nonce as %lu:
>
>
> 2025-06-11T12:59:10 get_cred_kdc: generated nonce: 4286841863
> 2025-06-11T12:59:10 _krb5_fast_unwrap_kdc_rep: ret =
> KRB5KDC_ERR_PREAUTH_FAILED (3) (4286841863 != 8651783)
>
> >>> "{0:b}".format(4286841863)
> '11111111100001000000010000000111'
> >>> "{0:b}".format(8651783)
> '100001000000010000000111'
>
> The left hand side is definitely the generated value so perhaps an
> issue sending the request or unpacking the response? (If I
> understood it the value is sent to the domain controller and then
> returned in the reply.)
>
> Thanks,
> James
>
If it is of any interest, I finally killed my script when it go over
9000, it just wouldn't fail for me.
Rowland
More information about the samba
mailing list