[Samba] Homes shares randomly disappear on AD DCs

Achim Gottinger achim at ag-web.biz
Wed Jul 23 03:24:22 MDT 2014


Am 23.07.2014 11:22, schrieb Achim Gottinger:
> Am 23.07.2014 10:46, schrieb Achim Gottinger:
>> Am 15.07.2014 09:18, schrieb Achim Gottinger:
>>> Am 10.07.2014 12:13, schrieb Achim Gottinger:
>>>> Am 09.07.2014 12:58, schrieb Achim Gottinger:
>>>>> Am 09.07.2014 11:29, schrieb Achim Gottinger:
>>>>>> Am 09.07.2014 11:08, schrieb Jonathan Buzzard:
>>>>>>> On Wed, 2014-07-09 at 10:42 +0200, Achim Gottinger wrote:
>>>>>>>
>>>>>>> [SNIP]
>>>>>>>
>>>>>>>>   I use unscd for caching; I restarted it, but that did not help.
>>>>>>> I take it that you missed the big warnings not to use nscd in
>>>>>>> combination with winbind? You are aware that winbind does its own
>>>>>>> caching?
>>>>>>>
>>>>>>> I would suggest your first port of call is to disable unscd and
>>>>>>> see if the problem goes away.
>>>>>>>
>>>>>>> JAB.
>>>>>>>
>>>>>> Thank you for the tip; I disabled it at all four locations. I also
>>>>>> used unscd on the main site, which has always run rock solid.
>>>>>>
>>>>>> Restarting samba on the branches with winbind/nss issues fixed the
>>>>>> wbinfo/getent passwd tests for a few minutes, but now they fail to
>>>>>> resolve again. I will keep watching it with unscd disabled.
>>>>>> I am thinking about downgrading to 4.1.4, which also had the issues,
>>>>>> but there they appeared only once a week and not every few hours.
>>>>>>
>>>>>> achim~
>>>>>>
>>>>> I have had to restart samba a few more times since. I was able to
>>>>> make it fail by running wbinfo -u a few times. The servers in the
>>>>> branches are all VMs with 1 GB, so I increased the memory to 3 GB;
>>>>> since then I have not been able to make samba fail with wbinfo -u.
>>>>> I hope that did the trick.
>>>>>
>>>> So far there have been no more [homes] dropouts with 3 GB of memory
>>>> assigned. wbinfo -u and getent passwd also work flawlessly. I skimmed
>>>> through the saved log files from yesterday looking for anything
>>>> memory related but could not find anything. There are also no signs
>>>> such as OOM kills in syslog for that timeframe.
>>>> The VMs had 4 GB of swap space assigned, which showed usage in the
>>>> range of a few MB.
>>>> I would have expected a slowdown due to swapping if a server runs
>>>> out of memory, not shares being dropped silently.
>>>>
>>>> achim
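
One way to double-check a syslog file for OOM activity is a small script
along these lines. It is only a sketch: the function names are made up for
illustration, the log path in the usage note is an assumption, and it relies
on the kernel's usual "invoked oom-killer" and "Out of memory: Kill..."
messages ending up in the file being scanned.

```python
import re

# Substrings the Linux kernel logs when the OOM killer runs.
# "Out of memory: Kill" also covers the newer "Killed process" wording.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory: Kill")


def find_oom_lines(lines):
    """Return the log lines that mention an OOM-killer event."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_file(path):
    """Scan one log file (hypothetical helper) and return matching lines."""
    with open(path, errors="replace") as log:
        return find_oom_lines(log)
```

For example, `scan_file("/var/log/syslog")` returns an empty list when the
file holds no such messages, which would match what is described above.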
>>> After it worked on Friday, Saturday and Monday, this morning the
>>> shares disappeared at our main site for the first time. This VM has
>>> 6 GB of memory and 4 CPU cores assigned, and it is the first time the
>>> [homes] share stopped working there. Even after restarting samba,
>>> wbinfo -u and wbinfo -g sometimes take up to 30 seconds to enumerate
>>> users/groups.
>>>
>>> achim~
>>>
>> So far the issue has reappeared on our main site last Friday at around
>> 9am and again multiple times today since 9:15am. It has not appeared
>> on the branches since I increased memory to 3 GB.
>> People start calling to say that their home directories are no longer
>> accessible. Not all accounts seem to be affected, and others can
>> continue to work for a while.
>>
>> wbinfo -u reports "Error looking up domain users".
>>
>> Reloading the samba services does not help; I have to restart them.
>> It is difficult to track down the issue because the server is in
>> production and must get back into a working state asap.
>>
>> I also noticed that wbinfo -u sometimes takes a long time to report
>> results. This is a snippet of an strace, showing a few timeouts
>> trying to access /var/run/samba/winbindd/pipe.
>>
>> Any suggestions on how I can track the issue down are welcome.
>>
>> Thanks in advance,
>> achim~
>>
>> connect(3, {sa_family=AF_FILE, path="/var/run/samba/winbindd/pipe"}, 110) = 0
>> poll([{fd=3, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
>> write(3, "0\10\0\0\0\0\0\0\0\0\0\0\306|\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2096) = 2096
>> poll([{fd=3, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
>> read(3, "\250\r\0\0\2\0\0\0\33\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3496) = 3496
>> poll([{fd=3, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
>> write(3, "0\10\0\0/\0\0\0\0\0\0\0\306|\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2096) = 2096
>> poll([{fd=3, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
>> read(3, "\313\r\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3496) = 3496
>> poll([{fd=3, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
>> read(3, "/var/lib/samba/winbindd_privileg"..., 35) = 35
>> lstat("/var/lib/samba/winbindd_privileged", {st_mode=S_IFDIR|0750, st_size=4096, ...}) = 0
>> lstat("/var/lib/samba/winbindd_privileged/pipe", {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>> socket(PF_FILE, SOCK_STREAM, 0)         = 4
>> fcntl(4, F_GETFL)                       = 0x2 (flags O_RDWR)
>> fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
>> fcntl(4, F_GETFD)                       = 0
>> fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
>> connect(4, {sa_family=AF_FILE, path="/var/lib/samba/winbindd_privileged/pipe"}, 110) = 0
>> close(3)                                = 0
>> poll([{fd=4, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
>> write(4, "0\10\0\0\22\0\0\0\0\0\0\0\306|\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2096) = 2096
>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
>> read(4, "\24\20\0\0\2\0\0\0\236\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3496) = 3496
>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
>>
>>
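
In the trace above, five consecutive poll() calls on fd 4 time out at
5000 ms each. That is about 25 seconds of waiting, roughly in line with the
delays of up to 30 seconds seen with wbinfo -u. To total such waits over a
longer trace, something like the following sketch might help. The function
name and the regular expression are illustrative and only cover the poll()
line shape shown in this excerpt.

```python
import re

# Matches e.g.: poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
# and captures the timeout argument in milliseconds.
POLL_TIMEOUT = re.compile(r"poll\(.*,\s*(\d+)\)\s*=\s*0 \(Timeout\)")


def summarize_poll_timeouts(lines):
    """Return (count, total_ms) for poll() calls that ended in a timeout."""
    count = 0
    total_ms = 0
    for line in lines:
        # Drop mail quote markers such as ">> " before matching.
        text = line.lstrip("> ").strip()
        match = POLL_TIMEOUT.search(text)
        if match:
            count += 1
            total_ms += int(match.group(1))
    return count, total_ms
```

A trace could be captured with "strace -o wbinfo.trace wbinfo -u" (the file
name is arbitrary) and passed in as
summarize_poll_timeouts(open("wbinfo.trace")).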
> May I ask list members to do a quick test and post the results of
> time "wbinfo -u"
Correction: that should be
time wbinfo -u
without the quotes.
>
> On this network it often takes up to 30 seconds; a short time (<5 secs)
> later it takes 1-2 seconds, but soon afterwards it is 30 seconds again.
> This domain has around 200 user accounts and around 50 clients.
>
> On another network with 50 users and 30 clients there is no delay
> when calling "wbinfo -u".
>
> achim~
>
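
Since the delay comes and goes within seconds, a single timing is easy to
misread. A rough sketch for collecting several timings in a row follows. The
helper names, the number of runs and the pause between runs are arbitrary
choices, not anything samba provides.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=5, pause=1.0):
    """Run cmd `runs` times and return a list of (seconds, returncode)."""
    results = []
    for i in range(runs):
        start = time.monotonic()
        # Discard the output; only duration and exit status are of interest.
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        results.append((time.monotonic() - start, proc.returncode))
        if i < runs - 1:
            time.sleep(pause)
    return results


def report(results):
    """Print one line per run with elapsed time and exit status."""
    for n, (seconds, code) in enumerate(results, 1):
        print(f"run {n}: {seconds:6.2f}s exit={code}")


def main(argv=None):
    # Defaults to the command from this thread; pass another one to override.
    cmd = (argv if argv is not None else sys.argv[1:]) or ["wbinfo", "-u"]
    report(time_command(cmd))
```

Calling main() on a host with winbind running prints one line per run, which
makes it easier to compare the slow and fast cases between sites.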


