[Samba] Homes shares randomly disappear on AD DCs

Achim Gottinger achim at ag-web.biz
Wed Jul 23 03:22:11 MDT 2014


On 23.07.2014 10:46, Achim Gottinger wrote:
> On 15.07.2014 09:18, Achim Gottinger wrote:
>> On 10.07.2014 12:13, Achim Gottinger wrote:
>>> On 09.07.2014 12:58, Achim Gottinger wrote:
>>>> On 09.07.2014 11:29, Achim Gottinger wrote:
>>>>> On 09.07.2014 11:08, Jonathan Buzzard wrote:
>>>>>> On Wed, 2014-07-09 at 10:42 +0200, Achim Gottinger wrote:
>>>>>>
>>>>>> [SNIP]
>>>>>>
>>>>>>>   I use unscd for caching, restarted it but it did not help.
>>>>>> I take it that you missed the big warnings not to use nscd in
>>>>>> combination with winbind? You are aware that winbind does its own
>>>>>> caching?
>>>>>>
>>>>>> I would suggest your first port of call is to disable unscd and
>>>>>> see if the problem goes away.
>>>>>>
>>>>>> JAB.
>>>>>>
>>>>> Thank you for the tip, I disabled it at all four locations. I also
>>>>> used unscd on the main site, which always ran rock solid.
>>>>>
>>>>> Restarting samba on the branches with winbind/nss issues fixed the
>>>>> wbinfo/getent passwd tests for a few minutes, but now they do not
>>>>> resolve again. I will have to watch it with unscd disabled now.
>>>>> I am thinking about downgrading to 4.1.4, which also had the issues,
>>>>> but there they appeared only once a week and not every few hours.
>>>>>
>>>>> achim~
>>>>>
>>>> Meanwhile I had to restart samba a few more times. I was able to make
>>>> it fail by running wbinfo -u a few times. Since the servers in the
>>>> branches are all VMs with 1GB, I increased the memory to 3GB, and
>>>> since then I have not been able to make samba fail with wbinfo -u.
>>>> I hope that did the trick.
>>>>
>>> So far no more [homes] dropouts with 3GB memory assigned. Also
>>> wbinfo -u and getent passwd work flawlessly. I skimmed through the saved
>>> log files from yesterday looking for anything memory related, but I
>>> cannot find anything. There are also no signs such as OOM kills in
>>> syslog in that timeframe.
>>> The VMs had 4GB swap space assigned, which showed usage in the range
>>> of a few MB.
>>> If a server runs out of memory, I would have expected slowdowns due
>>> to swapping, but not silent dropping of shares.
>>>
>>> achim
>> After it worked on Friday, Saturday and Monday, this morning the shares
>> disappeared at our main site for the first time. This VM has 6GB memory
>> and 4 CPU cores assigned, and it is the first time the [homes] share
>> stopped working there. Even after restarting samba, wbinfo -u and
>> wbinfo -g sometimes take up to 30 seconds to enumerate users/groups.
>>
>> achim~
>>
> So far the issue reappeared on our main site last Friday at around 9am
> and again multiple times today since 9:15am. It has not appeared on the
> branches since I increased memory to 3GB.
> People start calling to say that their home directories are no longer
> accessible. Not all accounts seem to be affected, and others can
> continue to work for a while.
>
> wbinfo -u reports "Error looking up domain users".
>
> Reloading the samba services does not help; I have to restart them. It
> is difficult to track down the issue because the server is in production
> and must get back into a working state ASAP.
>
> I also noticed that wbinfo -u sometimes takes a long time to report
> results. This is a snippet of an strace, showing a few timeouts
> trying to access /var/run/samba/winbindd/pipe.
>
> Any suggestions on how I can track the issue down are welcome.
>
> Thanks in advance,
> achim~
>
> connect(3, {sa_family=AF_FILE, path="/var/run/samba/winbindd/pipe"}, 110) = 0
> poll([{fd=3, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
> write(3, "0\10\0\0\0\0\0\0\0\0\0\0\306|\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2096) = 2096
> poll([{fd=3, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
> read(3, "\250\r\0\0\2\0\0\0\33\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3496) = 3496
> poll([{fd=3, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
> write(3, "0\10\0\0/\0\0\0\0\0\0\0\306|\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2096) = 2096
> poll([{fd=3, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
> read(3, "\313\r\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3496) = 3496
> poll([{fd=3, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
> read(3, "/var/lib/samba/winbindd_privileg"..., 35) = 35
> lstat("/var/lib/samba/winbindd_privileged", {st_mode=S_IFDIR|0750, st_size=4096, ...}) = 0
> lstat("/var/lib/samba/winbindd_privileged/pipe", {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> socket(PF_FILE, SOCK_STREAM, 0)         = 4
> fcntl(4, F_GETFL)                       = 0x2 (flags O_RDWR)
> fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
> fcntl(4, F_GETFD)                       = 0
> fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
> connect(4, {sa_family=AF_FILE, path="/var/lib/samba/winbindd_privileged/pipe"}, 110) = 0
> close(3)                                = 0
> poll([{fd=4, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
> write(4, "0\10\0\0\22\0\0\0\0\0\0\0\306|\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2096) = 2096
> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
> read(4, "\24\20\0\0\2\0\0\0\236\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3496) = 3496
> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
>
>
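
A side note on the quoted trace: the five consecutive poll() calls that
time out are all on fd 4, the socket connected to
/var/lib/samba/winbindd_privileged/pipe, and each waits 5000 ms, so they
would account for roughly 25 seconds of the delay. Below is a minimal
Python sketch for totalling such timeouts from a saved strace log. The
function name timed_out_ms_per_fd and the regular expression are only
illustrative, and the sketch assumes each poll() call sits on a single
line, as in unwrapped strace output.

```python
import re

# Matches strace lines of the form seen in the trace, for example:
#   poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
# Assumption: one fd per poll() call and no line wrapping inside the call.
POLL_TIMEOUT = re.compile(
    r"poll\(\[\{fd=(\d+),[^}]*\}\], \d+, (\d+)\) = 0 \(Timeout\)"
)


def timed_out_ms_per_fd(trace_text):
    """Return {fd: total milliseconds spent in poll() calls that timed out}."""
    totals = {}
    for match in POLL_TIMEOUT.finditer(trace_text):
        fd = int(match.group(1))
        timeout_ms = int(match.group(2))
        totals[fd] = totals.get(fd, 0) + timeout_ms
    return totals


# The timed-out section of the quoted trace, followed by the successful poll.
SAMPLE = "\n".join(
    ["> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)"] * 5
    + ["> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])"]
)

print(timed_out_ms_per_fd(SAMPLE))  # {4: 25000}
```

Polls that return data are ignored on purpose, so the totals only show
time spent waiting without any answer from winbindd.
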
May I ask list members to do a quick test and post the results of
"time wbinfo -u"?

On this network it often takes up to 30 seconds; a short time (<5 secs)
later it takes 1-2 seconds, but soon afterwards it is 30 seconds again.
This domain has around 200 user accounts and around 50 clients.

On another network with 50 users and 30 clients there is no delay
when calling "wbinfo -u".
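
Because the delay comes and goes within seconds, a single timing can be
misleading. A rough sketch for timing several runs in a row is shown
below. The helper name time_command and its runs/pause parameters are
made up for illustration; the only assumption about Samba is that
wbinfo is on the PATH when you call it with ["wbinfo", "-u"].

```python
import subprocess
import time


def time_command(argv, runs=5, pause=1.0):
    """Run argv `runs` times and return a list of (exit_code, seconds)."""
    results = []
    for i in range(runs):
        start = time.monotonic()
        # Output is discarded; only the exit code and wall-clock time matter.
        proc = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        results.append((proc.returncode, time.monotonic() - start))
        if i < runs - 1:
            time.sleep(pause)  # short gap between runs
    return results


def format_results(results):
    """Render one 'exit=<code> <seconds>s' line per run."""
    return "\n".join(f"exit={code} {secs:.2f}s" for code, secs in results)
```

For example, print(format_results(time_command(["wbinfo", "-u"]))) would
give one line per run, which makes the alternation between fast and slow
calls easy to see and to post here.
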

achim~


