[Samba] winbind craps out, NT_STATUS_PIPE_BROKEN

Matthew Baker matt.baker at bristol.ac.uk
Mon Jan 30 00:59:58 MST 2012


Hi Jay,

On 30/01/2012 01:03, Jay Sullivan wrote:
> I see a tiny correlation when our (Windows) domain controllers
> reboot.  After MS Patch Tuesday, I'm guaranteed at least one
> winbind failure when the DC that I'm presently connected to reboots.
> In my kerb config, I'm using a kdc address that round-robins to all
> of our DCs.  When the DC reboots, it's taken out of the rotation, so
> that shouldn't cause any connection loss, right?  Sometime next week
> we won't have any more 2003 domain controllers--all will be replaced
> with 2008.  Maybe this will "solve" my problem?

This sounds almost exactly like our config. The KDC setup in /etc/krb5.conf
is exactly the same. We're running (mostly?) 2008 R2 DCs and no, it doesn't
look like that solves it. I have suspected for a long time that the reboots
are the cause. At one point we had a static list of DCs, and when the
first one went down we had to restart. It does seem that Samba doesn't
reconnect to the second DC in the list when the first disappears.

> At the height of my issue, I was seeing winbind problems every 2
> hours or so.  This was on Debian 5 with Samba 3.4.latest.  I've since
> moved to RHEL 6 and Samba 3.5.10.blah.  Since moving to RHEL/Samba
> 3.5, I've experienced significantly fewer problems with winbind, maybe
> a few times a week (that I've detected).  At the same time, some of
> our oldest 2003 domain controllers were retired, so this could be a
> case of correlation != causation.

We're running a mix of Debian Lenny and Squeeze. Squeeze almost seems
worse, but I think that's just a perception, as the services on those
machines are used more frequently.

> The symptoms are the same as Matthew's.  When I try 'getent passwd
> usernamethatisnotincache', I get nothing.  Cached users are fine.
> Similar results with 'id'.  Restarting winbind "fixes" it.
>
> I started logging a bunch of stuff when my script picked up a winbind
> failure.  Sometimes, but not always, there would be several extra
> winbindd processes running.  I usually have 8 winbindd processes (we
> have a few trusted domains, it seems that increases the number of
> winbindd processes) running, but a snapshot of 'ps' before I
> restarted winbind would show maybe 10 or 12 winbindd processes.

That sounds familiar.
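
For anyone wanting to capture the same data, here is a minimal Python
sketch of that kind of check. It is not Jay's script (he didn't post it);
it only combines the two symptoms described in this thread: `id` returning
few or no groups for users who are not in the winbind cache, and more
winbindd processes than usual. The function names, the probe usernames and
the min_groups threshold are made up for illustration; the default of 8
expected processes is simply the figure Jay quoted for his host.

```python
import subprocess


def group_count(id_g_output: str) -> int:
    """Number of group IDs in the output of `id -G USER` (space separated)."""
    return len(id_g_output.split())


def looks_broken(id_g_output: str, min_groups: int) -> bool:
    """True if a user resolved to fewer groups than we expect.

    An empty string (user not resolved at all) counts as zero groups.
    """
    return group_count(id_g_output) < min_groups


def count_winbindd(ps_comm_output: str) -> int:
    """Count winbindd entries in the output of `ps -e -o comm=`."""
    return sum(1 for line in ps_comm_output.splitlines()
               if line.strip() == "winbindd")


def run(cmd):
    """Run a command and return its stdout; errors just yield empty output."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def snapshot(probe_users, min_groups=5, expected_procs=8):
    """Collect both symptoms in one go.

    probe_users should be accounts that rarely log in, so that they are
    unlikely to be answered from the winbind cache.
    min_groups and expected_procs are assumptions to tune per site.
    """
    short = [u for u in probe_users
             if looks_broken(run(["id", "-G", u]), min_groups)]
    procs = count_winbindd(run(["ps", "-e", "-o", "comm="]))
    return {
        "short_users": short,
        "winbindd_procs": procs,
        "extra_procs": procs > expected_procs,
    }
```

Calling snapshot(["probeuser1", "probeuser2"]) from cron and logging the
result would give a timeline to compare against later. `id -G` prints
numeric IDs only, which avoids parsing AD group names that contain spaces.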

> I also cranked up the log level for a while, but my untrained eye
> couldn't seem to make any correlation to a specific event before
> non-cached winbind lookups started to fail.

It might be worth checking the DCs' event logs to correlate the reboots
with the failures (or with when the log entries start appearing). We have a
separate group of people maintaining the Windows environment, so I'll ask
them for info.
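
A rough sketch of how that correlation could be done once the reboot times
are known, in Python. It pulls the timestamps of NT_STATUS_PIPE_BROKEN
entries out of a winbindd log (in the format of the log excerpt quoted
further down) and pairs them with any reboot that happened within a given
window. The function names and the 30 minute window are my own assumptions,
as is the log path in the comment, and the reboot times would have to be
supplied by hand from the Windows side, in the same timezone as the Samba
log.

```python
import re
from datetime import datetime, timedelta

# Matches either a Samba log header such as "[2012/01/23 16:58:53.162493,  5]"
# or the error string we are interested in.
TOKEN = re.compile(
    r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)?,\s*\d+\]"
    r"|(NT_STATUS_PIPE_BROKEN)"
)


def pipe_broken_times(log_text: str):
    """Return the timestamp of each log entry mentioning NT_STATUS_PIPE_BROKEN.

    The message follows its header, so each error is attributed to the most
    recent header seen before it.
    """
    times = []
    last = None
    for m in TOKEN.finditer(log_text):
        if m.group(1):
            last = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        elif last is not None:
            times.append(last)
    return times


def failures_near(failures, reboots, window=timedelta(minutes=30)):
    """Pair each reboot with the failures within `window` either side of it."""
    return [(r, f) for r in reboots for f in failures if abs(f - r) <= window]


# Example use (path is an assumption; it depends on "log file" in smb.conf):
#   with open("/var/log/samba/log.winbindd") as fh:
#       failures = pipe_broken_times(fh.read())
#   reboots = [datetime(2012, 1, 23, 16, 50)]   # from the Windows admins
#   for reboot, failure in failures_near(failures, reboots):
#       print(reboot, failure)
```

If most failures land close to a reboot, that supports the suspicion; if
they are spread evenly, the reboots are probably not the whole story.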

Thanks very much for your comments. =]

Matt

> -----Original Message-----
> From: Matthew Baker [mailto:matt.baker at bristol.ac.uk]
> Sent: Sunday, January 29, 2012 6:21 PM
> To: Jay Sullivan; samba at lists.samba.org
> Subject: Re: winbind craps out, NT_STATUS_PIPE_BROKEN
>
> Hi Jay,
>
> thanks for your comments on your workaround. I too come from an
> environment where there are 1000s of users to pick from who are
> unlikely to log in. I found that using the command "getent passwd
> username" just came back empty when the aforementioned error shows in
> the log. I don't suppose you've noticed a point in time when the pipe
> "breaks"? I would be interested to find what causes the break, a
> change in AD or the server running winbind? If we could detect the
> break then we might be closer to the root cause.
>
> Many thanks,
>
> Matt
>
>
> On 26/01/2012 17:17, Jay Sullivan wrote:
>> I'm not going to show you my code because everyone will make fun
>> of me.  But here is the 10 second version:
>>
>> I'm checking on the results of the `id` command from an array of
>> usernames that don't frequently connect to my samba box.  Most
>> users in our AD are members of dozens or hundreds of groups, so I
>> simply check on the length of the output from `id` and decide on
>> whether or not to restart winbind.  The output will typically be
>> empty when winbind is down, but it'll occasionally report just a
>> few groups instead of the usual hundreds.  Why an array of
>> infrequent users? I've found that once I do `id username1`, that
>> user will be stuck in the winbind cache for a while and won't help
>> me figure out if winbind is broken.  Since I have the luxury(?) of
>> thousands of users in our AD that will (probably) never connect to
>> my samba box, I picked a sample and ran with it.  It works _most_
>> of the time, but it's not a solution.  I'm good at band aids, but
>> suck at surgery.  =(
>>
>> Please forward this to the samba mailing list for me.  I just got
>> a bounce from my mail server and it'll take some time to sort out:
>> "Your e-mail service was detected by mx.selfip.biz (NiX Spam) as
>> spamming".  Blacklisting is a necessary evil, I suppose...
>>
>> ~Jay
>>
>> -----Original Message-----
>> From: Matthew Baker [mailto:matt.baker at bristol.ac.uk]
>> Sent: Thursday, January 26, 2012 11:41 AM
>> To: Jay Sullivan
>> Cc: samba at lists.samba.org
>> Subject: Re: winbind craps out, NT_STATUS_PIPE_BROKEN
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi Jay,
>>
>> many thanks for your response.
>>
>> I have a similar set of scripts; currently they only run wbinfo -t
>> and check that net ads testjoin is sane. They don't catch
>> this. I was thinking about processing the log with something like
>> swatch but it's a kludge. I would be interested in seeing your
>> sanity checks if you don't mind?
>>
>> Cheers,
>>
>> Matt
>>
>> On 26/01/12 16:32, Jay Sullivan wrote:
>>> I am still experiencing this problem.  I've scripted out some
>>> winbind sanity checks that catch when it poops out and restart
>>> winbind automagically.
>>>
>>> I recently migrated our biggest samba host from Debian 5 to RHEL
>>> 6. The problem persists, albeit slightly less frequently (not
>>> very scientific, I know...).
>>>
>>> I typically only have problems with winbind when there are > 200
>>> users connected _or_ > 500 open files as reported by
>>> smbstatus. Unfortunately for me, these conditions describe a
>>> typical samba load during off-peak hours.  =(
>>>
>>> ~Jay
>>>
>>> -- Jay Sullivan Rochester Institute of Technology College of
>>> Imaging Arts and Sciences jay.sullivan at rit.edu
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Matthew Baker [mailto:matt.baker at bristol.ac.uk]
>>> Sent: Tuesday, January 24, 2012 3:34 AM
>>> To: Jay Sullivan; samba at lists.samba.org
>>> Subject: Re: winbind craps out, NT_STATUS_PIPE_BROKEN
>>>
>>> Hi Jay/Samba peeps,
>>>
>>> Emailing in reference to
>>> http://lists.samba.org/archive/samba/2011-April/162277.html
>>>
>>> I have seen a very similar issue with a similar setup.
>>>
>>> Users fail to be verified with:
>>>
>>> getent passwd username
>>>
>>> Entry in the log at same time is:
>>>
>>> [2012/01/23 16:58:53.159761,  3] winbindd/winbindd_misc.c:352(winbindd_interface_version)
>>>   [18510]: request interface version
>>> [2012/01/23 16:58:53.159966,  3] winbindd/winbindd_misc.c:385(winbindd_priv_pipe_dir)
>>>   [18510]: request location of privileged pipe
>>> [2012/01/23 16:58:53.160214,  3] winbindd/winbindd_getpwnam.c:55(winbindd_getpwnam_send)
>>>   getpwnam username
>>> [2012/01/23 16:58:53.162493,  5] winbindd/winbindd_getpwnam.c:138(winbindd_getpwnam_recv)
>>>   Could not convert sid S-1-5-21-1117850145-1682116191-196506527-126617: NT_STATUS_PIPE_BROKEN
>>>
>>> Restarting winbindd solves the problem temporarily.
>>>
>>> I've attached a copy of the smb.conf.
>>>
>>> OS:      Debian Squeeze 6.0.3
>>> Kernel:  2.6.32-5-686-bigmem
>>> samba    2:3.5.6~dfsg-3squeeze5
>>> winbind  2:3.5.6~dfsg-3squeeze5
>>>
>>> Jay did you find a solution to your problem? Has anyone else on
>>> the list seen similar issues or have any ideas of what might be
>>> happening?
>>>
>>> Any advice or pointers would be very much appreciated.
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>
>>
>> - --
>>
>> Matthew Baker :: Senior Systems Administrator :: University of Bristol
>> +----------------------------------------------------------------------+
>> | Infrastructure, Systems and Operations  it-sysops at bristol.ac.uk      |
>> | T: Berkeley Square:  +44(0)117 3314325  (Mon, Thur & Fri)            |
>> | T: Computer Centre:  +44(0)117 3317467  (Tue, Wed)                   |
>> | A: Uni of Bristol, Computer Centre, Tyndall Ave, Bristol. BS81UD     |
>> +----------------------------------------------------------------------+
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.11 (GNU/Linux)
>> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
>>
>> iEYEARECAAYFAk8hggMACgkQLvm7pB/aicMZyACfYGhlYW/Xd2ULgMPdp4K5oL7b
>> 8noAnAz4VjjvHEb/cuhbOj+97Rxc9bJ2
>> =uAtp
>> -----END PGP SIGNATURE-----
>
>


-- 

  Matthew Baker :: Senior Systems Administrator :: University of Bristol
+----------------------------------------------------------------------+
| Infrastructure, Systems and Operations  it-sysops at bristol.ac.uk      |
| T: Berkeley Square:  +44(0)117 3314325  (Mon, Thur & Fri)            |
| T: Computer Centre:  +44(0)117 3317467  (Tue, Wed)                   |
| A: Uni of Bristol, Computer Centre, Tyndall Ave, Bristol. BS81UD     |
+----------------------------------------------------------------------+

