[Samba] intermittent pam_winbind authentication failure

James Dingwall james-samba at dingwall.me.uk
Wed Jun 11 12:52:55 UTC 2025


Hi Rowland,

Thank you for taking the time to reproduce on your system.  I've added some
answers to your questions inline and given some details about the current
state of my testing.

> Date: Tue, 3 Jun 2025 13:46:23 +0100
> From: Rowland Penny <rpenny at samba.org>
> To: samba at lists.samba.org
> Subject: Re: [Samba] intermittent pam_winbind authentication failure
> 
> On Tue, 3 Jun 2025 11:10:54 +0100
> James Dingwall via samba <samba at lists.samba.org> wrote:
> 
> > Hi,
> > 
> > We've been having an intermittent issue with pam_winbind on Ubuntu
> > 24.04. The test case we have to demonstrate this is to run this loop
> > after logging in as a domain user:
> > 
> > $ while sleep 1 ; do sudo -k ; sudo -K ; date ; echo "password" |
> > sudo -S /bin/echo "sudo success" || break ; done
> > 
> > The loop will run successfully, sometimes for 50+ iterations but
> > eventually
> > 
> > [sudo] password for DOMAIN\user: sudo success
> > ...
> > [sudo] password for DOMAIN\user: sudo success
> > [sudo] password for DOMAIN\user: Sorry, try again.
> > [sudo] password for DOMAIN\user:
> > sudo: no password was provided
> > sudo: 1 incorrect password attempt
> > 
> > Rejection of the correct password can also happen with a console
> > logon or remote ssh.  The system is joined to a Windows 2012 based
> > domain.  Ubuntu 24.04 ships with packages based on 4.19.5 but
> > rebuilding the packages from 25.04 (Samba 4.21.4) also has the same
> > issue.
> > 
> > If I try to run winbindd with -d 3 or higher or with strace attached
> > I'm unable to reproduce the issue which makes me suspect a timing
> > issue passing the response between winbindd and pam_winbind.  At -d 2
> > the winbindd logs associated with the failure:
> > 
> > May 22 09:14:36 hostname winbindd[853342]: ads_krb5_mk_req:
> > smb_krb5_get_credentials failed for S00099-HOST$@DOMAIN.COM
> > (Preauthentication failed) May 22 09:14:36 hostname winbindd[853342]:
> > failed to get ticket for S00099-HOST$@DOMAIN.COM: Preauthentication
> > failed May 22 09:14:36 hostname winbindd[853342]: _wbint_PamAuth:
> > Plain-text authentication for user DOMAIN\user returned
> > NT_STATUS_LOGON_FAILURE (PAM: 7) May 22 09:14:36 hostname
> > winbindd[853342]: Auth: [winbind,PAM_AUTH, PAM_WINBIND[sudo], 871758]
> > user [DOMAIN]\[user] at [Thu, 22 May 2025 09:14:36.413942 BST] with
> > [Plaintext] status [NT_STATUS_LOGON_FAILURE] workstation [(null)]
> > remote host [unix:] mapped to [(null)]\[(null)]. local host [unix:]
> > May 22 09:14:36 hostname winbindd[853342]: {"timestamp":
> > "2025-05-22T09:14:36.424126+0100", "type": "Authentication",
> > "Authentication": {"version": {"major": 1, "minor": 3}, "eventId":
> > 4625, "logonId": "db5db55c0b2b6903", "logonType": 8, "status":
> > "NT_STATUS_LOGON_FAILURE", "localAddress": "unix:", "remoteAddress":
> > "unix:", "serviceDescription": "winbind", "authDescription":
> > "PAM_AUTH, PAM_WINBIND[sudo], 871758", "clientDomain": "DOMAIN",
> > "clientAccount": "user", "workstation": null, "becameAccount": "",
> > "becameDomain": "", "becameSid": null, "mappedAccount": null,
> > "mappedDomain": null, "netlogonComputer": null,
> > "netlogonTrustAccount": null, "netlogonNegotiateFlags": "0x00000000",
> > "netlogonSecureChannelType": 0, "netlogonTrustAccountSid": null,
> > "passwordType": "Plaintext", "clientPolicyAccessCheck": null,
> > "serverPolicyAccessCheck": null, "duration": 98176}}
> > 
> > Occasionally for success I get:
> > 
> > Jun 03 08:27:44 hostname winbindd[774711]: final write to client
> > failed: Broken pipe
> > 
> > In the domain controller event log I get a pair of 4768 / 4769 event
> > ids for the proceeding success cases and the failure case (clocks are
> > in sync so I'm reasonably confident I've matched these up correctly)
> > so it seems winbindd has had a successful exchange with the domain
> > controller.
> > 
> > A similar loop to the test case using `echo "pasword" | wblogin
> > --pam-logon="${USER}"` runs reliably.  However I don't see anything
> > in the winbind logs so I'm not sure that "Attempt to authenticate a
> > user in the same way pam_winbind would do." is identical in the
> > technical implementation.
> > 
> > Are there any alternative approaches I could take to try and uncover
> > what is happening?
> > 
> > Thanks,
> > James
> > 
> > 
> > /etc/pam.d/common-auth includes:
> > 
> > auth    [success=ignore default=die]    pam_faillock.so preauth
> > deny=6 unlock_time=1800 silent auth    [success=ok default=1]
> >  pam_localuser.so auth    [success=3 default=ignore]      pam_unix.so
> > try_first_pass auth    [success=2 default=ignore]      pam_winbind.so
> > krb5_auth krb5_ccache_type=FILE cached_login try_first_pass debug
> > auth    optional        pam_faillock.so authfail deny=6
> > unlock_time=1800 auth    requisite                       pam_deny.so
> > 
> > 
> > The smb.conf we're using:
> > 
> > [global]
> >   workgroup = DOMAIN
> >   realm = DOMAIN.COM
> >   netbios name = S00099-host
> >   security = ads
> >   server role = member server
> >   dedicated keytab file = /etc/krb5.keytab
> >   kerberos method = secrets and keytab
> >   allow trusted domains = no
> >   server string = %h server (Samba, Ubuntu)
> >   disable netbios = yes
> >   password server = *
> >   winbind enum groups = yes
> >   winbind enum users = yes
> >   winbind nested groups = yes
> >   winbind refresh tickets = no
> >   template shell = /bin/bash
> >   template homedir = /home/local/%D/%U
> >   idmap config * : backend              = autorid
> >   idmap config * : range                = 2900000001-3000000000
> >   idmap config * : ignore builtin       = yes
> >   idmap config DOMAIN : backend   = rid
> >   idmap config DOMAIN : range     = 3000000001-3100000000
> >   map to guest = bad user
> >   guest account = nobody
> >   log file = /var/log/samba/log.%m
> >   log level = 1
> >   max log size = 5000
> >    load printers = no
> >    printing = bsd
> >    printcap name = /dev/null
> >    disable spoolss = yes
> >    dns proxy = no
> >    wins support = no
> >    domain master = no
> >    local master = no
> >    preferred master = no
> >    store dos attributes = yes
> >    map hidden = no
> >    map readonly = no
> >    map system = no
> >    map archive = no
> >    hide dot files = no
> >    enable core files = yes
> >    min receivefile size = 131072
> >    aio read size = 1
> >    aio write size = 1
> >    use sendfile = yes
> >    unix charset = UTF8
> >    ea support = yes
> >    map acl inherit = yes
> >    acl map full control = no
> >    unix extensions = no
> >    inherit acls = no
> >    follow symlinks = yes
> >    wide links = yes
> > 
> 
> OK, I took your one line script and turned it into this:
> 
> #!/bin/bash
> 
> password=$1
> 
> counter=0
> while sleep 1 ; do
> 	echo "TRY: $counter"
> 	sudo -k
> 	sudo -K
> 	date
> 	echo "$password" | sudo -S /bin/echo "sudo success" || break
> 	counter=$((counter+1))
> done
> 
> I then ran it and it is still running, 'TRY' is now in excess of 2,500
> and still going up.
> 
> I feel your problems possibly lie in your /etc/pam.d/common-auth file
> and rather strange (well, strange to me) idmap config lines in smb.conf.
> 
> My common-auth contains this:
> 
> auth	[success=2 default=ignore]	pam_unix.so nullok
> auth	[success=1 default=ignore]	pam_winbind.so krb5_auth krb5_ccache_type=FILE cached_login try_first_pass
> auth	requisite			pam_deny.so
> auth	required			pam_permit.so
> 
> What do expect your extra/different lines to do ? 

We've added pam_faillock to have consistent password lockout behaviour for
a system regardless of any domain controller policies or whether the user
is local or domain.  The pam_localuser line is to skip pam_unix if the user
is not present in /etc/passwd.  pam_unix was complaining for non-local
accounts and when we were ingesting logs from *lots* of systems to a central
processor this was very noisy so skipping over prevents that.

> 
> You appear to be using both the 'autorid' and 'rid' idmap backends and
> you are telling autorid to ignore the BUILTIN domain, so what is mapping
> the required BUILTIN users and groups, I haven't a clue. I suggest you
> change 'autorid' to 'tdb'. I also have to ask, why are you using such
> high numbers for the ranges ?
> 
> These are my idmap config lines:
> 
>   idmap config * : backend = tdb
>   idmap config * : range = 3000-7999
>   idmap config SAMDOM : backend  = rid
>   idmap config SAMDOM : range = 10000-999999
> 
> By the way, 'TRY' is now approaching 3,000 and still going.

We didn't (obviously) have a need for BUILTIN for just winbind, there are
no shares or printing etc.  We felt that the idmap_autorid was more suitable
for our environment as it does not require a database which needs to be
managed.  The ranges are large to reduce the chances of a collision in the
autorid scheme.

The current state is that I have found the code responsible for generating
the "(Preauthentication failed)" error, from third_party/heimdal/lib/krb5/fast.c:

krb5_error_code
_krb5_fast_unwrap_kdc_rep(krb5_context context, int32_t nonce,
                          krb5_data *chksumdata,
                          struct krb5_fast_state *state, AS_REP *rep)
{
...
    if (nonce != (int32_t)fastrep.nonce) {
        ret = KRB5KDC_ERR_PREAUTH_FAILED;
        goto out;
    }
...
}

I added a debug statement there and for one failure:

2025-06-11T10:22:39 _krb5_fast_unwrap_kdc_rep: ret = KRB5KDC_ERR_PREAUTH_FAILED (3) (-4476732 != 12300484)

Obviously the numbers aren't the same but printing the binary representation
its almost a perfect bit flip (I think this is two's complement representation
because I used %d)

>>> "{0:b}".format(12300484)
'101110111011000011000100'
>>> "{0:b}".format(-4476732)
'-10001000100111100111100'

For another occurence printing the nonce as %lu:


2025-06-11T12:59:10 get_cred_kdc: generated nonce: 4286841863
2025-06-11T12:59:10 _krb5_fast_unwrap_kdc_rep: ret = KRB5KDC_ERR_PREAUTH_FAILED (3) (4286841863 != 8651783)

>>> "{0:b}".format(4286841863)
'11111111100001000000010000000111'
>>> "{0:b}".format(8651783)
        '100001000000010000000111'

The left hand side is definitely the generated value so perhaps an issue
sending the request or unpacking the response?  (If I understood it the value
is sent to the domain controller and then returned in the reply.)

Thanks,
James



More information about the samba mailing list