[Samba] Homes shares randomly disappear on AD DCs

Achim Gottinger achim at ag-web.biz
Wed Jul 23 09:42:18 MDT 2014


On 23.07.2014 11:24, Achim Gottinger wrote:
> On 23.07.2014 11:22, Achim Gottinger wrote:
>> On 23.07.2014 10:46, Achim Gottinger wrote:
>>> On 15.07.2014 09:18, Achim Gottinger wrote:
>>>> On 10.07.2014 12:13, Achim Gottinger wrote:
>>>>> On 09.07.2014 12:58, Achim Gottinger wrote:
>>>>>> On 09.07.2014 11:29, Achim Gottinger wrote:
>>>>>>> On 09.07.2014 11:08, Jonathan Buzzard wrote:
>>>>>>>> On Wed, 2014-07-09 at 10:42 +0200, Achim Gottinger wrote:
>>>>>>>>
>>>>>>>> [SNIP]
>>>>>>>>
>>>>>>>>>   I use unscd for caching, restarted it but it did not help.
>>>>>>>> I take it that you missed the big warnings not to use nscd in
>>>>>>>> combination with winbind? You are aware that winbind does its own
>>>>>>>> caching?
>>>>>>>>
>>>>>>>> I would suggest your first port of call is to disable unscd and
>>>>>>>> see if the problem goes away.
>>>>>>>>
>>>>>>>> JAB.
>>>>>>>>
>>>>>>> Thank you for the tip, I disabled it at all four locations. I also
>>>>>>> used unscd on the main site, which always ran rock solid.
>>>>>>>
>>>>>>> Restarting samba on the branches with winbind/nss issues fixed the
>>>>>>> wbinfo/getent passwd tests for a few minutes, but now they do not
>>>>>>> resolve again. I will have to watch it with unscd disabled now.
>>>>>>> I am thinking about downgrading to 4.1.4, which also had the issues,
>>>>>>> but they appeared only once a week and not every few hours.
>>>>>>>
>>>>>>> achim~
>>>>>>>
>>>>>> I had to restart samba a few more times in the meantime. I was able
>>>>>> to make it fail by running wbinfo -u a few times. Since the servers
>>>>>> in the branches are all VMs with 1GB, I increased the memory to 3GB,
>>>>>> and since then I have not been able to make samba fail with
>>>>>> wbinfo -u. Hope that did the trick.
>>>>>>
>>>>> So far no more [homes] dropouts with 3GB memory assigned. Also
>>>>> wbinfo -u and getent passwd work flawlessly. I skimmed through the
>>>>> saved log files from yesterday trying to find anything memory
>>>>> related, but I cannot find anything. There are also no signs such as
>>>>> OOM kills in syslog in that timeframe.
>>>>> The VMs had 4GB swap space assigned, which showed usage in the range
>>>>> of a few MB.
>>>>> I would have expected slowdowns due to swapping, not silent dropping
>>>>> of shares, if a server runs out of memory.
>>>>>
>>>>> achim
>>>> After it worked on Friday, Saturday and Monday, this morning the
>>>> shares disappeared at our main site for the first time. This VM has
>>>> 6GB memory and 4 CPU cores assigned, and it is the first time the
>>>> [homes] share stopped working there. Even after restarting samba,
>>>> wbinfo -u and wbinfo -g sometimes take up to 30 seconds to enumerate
>>>> users/groups.
>>>>
>>>> achim~
>>>>
>>> So far the issue reappeared on our main site last Friday at around
>>> 9am and again multiple times today since 9:15am. It has not appeared
>>> on the branches since I increased memory to 3GB.
>>> People start calling to say that their home directories are no longer
>>> accessible. Not all accounts seem to be affected, and others can
>>> continue to work for a while.
>>>
>>> wbinfo -u reports "Error looking up domain users".
>>>
>>> Reloading the samba services does not help; I have to restart them.
>>> It is difficult to track down the issue because the server is in
>>> production and must get back into a working state asap.
>>>
>>> I also noticed that wbinfo -u sometimes takes a long time to report
>>> results. This is a snippet of an strace, showing a few timeouts
>>> trying to access /var/run/samba/winbindd/pipe.
>>>
>>> Any suggestions how i can track the issues down are welcome.
>>>
>>> Thanks in advance,
>>> achim~
>>>
>>> connect(3, {sa_family=AF_FILE, path="/var/run/samba/winbindd/pipe"}, 
>>> 110) = 0
>>>
>>> poll([{fd=3, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=3, 
>>> revents=POLLOUT}])
>>>
>>> write(3, 
>>> "0\10\0\0\0\0\0\0\0\0\0\0\306|\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
>>> 2096) = 2096
>>>
>>> poll([{fd=3, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=3, 
>>> revents=POLLIN}])
>>>
>>> read(3, 
>>> "\250\r\0\0\2\0\0\0\33\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
>>> 3496) = 3496
>>>
>>> poll([{fd=3, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=3, 
>>> revents=POLLOUT}])
>>>
>>> write(3, 
>>> "0\10\0\0/\0\0\0\0\0\0\0\306|\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
>>> 2096) = 2096
>>>
>>> poll([{fd=3, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=3, 
>>> revents=POLLIN}])
>>>
>>> read(3, 
>>> "\313\r\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
>>> 3496) = 3496
>>>
>>> poll([{fd=3, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=3, 
>>> revents=POLLIN}])
>>>
>>> read(3, "/var/lib/samba/winbindd_privileg"..., 35) = 35
>>>
>>> lstat("/var/lib/samba/winbindd_privileged", {st_mode=S_IFDIR|0750, 
>>> st_size=4096, ...}) = 0
>>>
>>> lstat("/var/lib/samba/winbindd_privileged/pipe", 
>>> {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>>>
>>> socket(PF_FILE, SOCK_STREAM, 0)         = 4
>>>
>>> fcntl(4, F_GETFL)                       = 0x2 (flags O_RDWR)
>>>
>>> fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
>>>
>>> fcntl(4, F_GETFD)                       = 0
>>>
>>> fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
>>>
>>> connect(4, {sa_family=AF_FILE, 
>>> path="/var/lib/samba/winbindd_privileged/pipe"}, 110) = 0
>>>
>>> close(3)                                = 0
>>>
>>> poll([{fd=4, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=4, 
>>> revents=POLLOUT}])
>>>
>>> write(4, 
>>> "0\10\0\0\22\0\0\0\0\0\0\0\306|\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
>>> 2096) = 2096
>>>
>>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
>>>
>>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
>>>
>>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
>>>
>>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
>>>
>>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 0 (Timeout)
>>>
>>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=4, 
>>> revents=POLLIN}])
>>>
>>> read(4, 
>>> "\24\20\0\0\2\0\0\0\236\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
>>> 3496) = 3496
>>>
>>> poll([{fd=4, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=4, 
>>> revents=POLLIN}])
>>>
>>>
>> May I ask list members to do a quick test and post the results of
>> time "wbinfo -u"
> It's
> time wbinfo -u
> without the quotes
>>
>> On this network it often takes up to 30 seconds; a short time
>> (<5 secs) later it takes 1-2 seconds, but soon afterwards it is 30
>> seconds again. This domain has around 200 user accounts and around 50
>> clients.
>>
>> On another network with 50 users and 30 clients there is no delay 
>> calling "wbinfo -u"
>>
>> achim~
>>
>
It is really odd: now, in the late afternoon, "time wbinfo -u" usually
takes 1.1-1.2s, without a long delay on the first run. In roughly 1 out
of 10 runs there are still spikes of up to 10-20s.
The number of users is in the same range as it was in the morning.
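
To put numbers on the "roughly 1 out of 10" observation, the timing can
be repeated in a loop instead of by hand. The following is only a
minimal sketch: the function names, the number of runs, the pause
between runs and the 5 second "slow" threshold are assumptions chosen
for illustration, not anything Samba provides.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=20, pause=1.0):
    """Run cmd `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for i in range(runs):
        start = time.monotonic()
        # The command's output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.monotonic() - start)
        if i < runs - 1:
            time.sleep(pause)
    return durations


def summarize(durations, slow_threshold=5.0):
    """Return run count, median, maximum and the share of runs above slow_threshold."""
    slow = [d for d in durations if d > slow_threshold]
    return {
        "runs": len(durations),
        "median": statistics.median(durations),
        "max": max(durations),
        "slow_fraction": len(slow) / len(durations),
    }


# Example call on a host where wbinfo is installed (assumed, not run here):
# print(summarize(time_command(["wbinfo", "-u"])))
```

Running this once in the morning and once in the afternoon would give
comparable median and slow_fraction values for both periods.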

I inspected the level 3 log from this morning for the user who called
me. The first appearance of DOMAIN\username is:

[2014/07/23 09:12:09.658813,  3] ../source3/smbd/password.c:138(register_homes_share)
   No home directory defined for user 'DOMAIN\username'

Searching the log for that error message, I found that it first appeared here:

[2014/07/23 09:08:48.927639,  3] ../source3/smbd/password.c:138(register_homes_share)
   No home directory defined for user 'DOMAIN\ACRIBA$'
[2014/07/23 09:08:48.929571,  3] ../source3/smbd/password.c:138(register_homes_share)
   No home directory defined for user 'DOMAIN\WIN7-Z-EMPFANG2$'
[2014/07/23 09:08:48.930933,  3] ../source3/smbd/password.c:138(register_homes_share)
   No home directory defined for user 'DOMAIN\TERMINALSERVER$'
[2014/07/23 09:08:48.931084,  3] ../source3/smbd/process.c:1802(process_smb)
   Transaction 2 of length 88 (0 toread)
[2014/07/23 09:08:48.931374,  3] ../source3/smbd/process.c:1405(switch_message)
   switch message SMBtconX (pid 22456) conn 0x0
[2014/07/23 09:08:48.932020,  2] ../source3/smbd/process.c:2672(deadtime_fn)
   Closing idle connection
[2014/07/23 09:08:48.932510,  3] ../source3/lib/access.c:338(allow_access)
   Allowed connection from 192.168.1.104 (192.168.1.104)
[2014/07/23 09:08:48.932706,  3] ../source3/smbd/server.c:159(msg_exit_server)
   got a SHUTDOWN message

But the "No home directory defined for user 'DOMAIN\username'" error
message also appeared in the afternoon, while everything was working.
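
Since the message shows up both when things work and when they do not,
its frequency over time may say more than its mere presence. Below is a
rough sketch that extracts these entries from a log in the format shown
above and counts them per minute. The function names are hypothetical,
and the default of skipping machine accounts (names ending in "$") is an
assumption made for illustration.

```python
import re
from collections import Counter

# Header of a log entry, e.g. "[2014/07/23 09:08:48.927639,  3] ..."
HEADER_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.\d+,\s*\d+\]")
MESSAGE_RE = re.compile(r"No home directory defined for user '([^']+)'")


def missing_home_events(lines):
    """Yield (timestamp, account) for each 'No home directory defined' message."""
    timestamp = None
    for line in lines:
        header = HEADER_RE.match(line)
        if header:
            # Remember the most recent header; the message follows on a later line.
            timestamp = header.group(1)
        message = MESSAGE_RE.search(line)
        if message:
            yield timestamp, message.group(1)


def count_per_minute(events, include_machine_accounts=False):
    """Count events per minute; machine accounts are skipped unless requested."""
    counts = Counter()
    for timestamp, account in events:
        if account.endswith("$") and not include_machine_accounts:
            continue
        # Keep "YYYY/MM/DD HH:MM"; messages seen before any header count as "unknown".
        counts[(timestamp or "unknown")[:16]] += 1
    return counts
```

Feeding it the morning and afternoon parts of the same log (opened with
open() from wherever the level 3 log is written) would show whether the
failures line up with a burst of these messages or not.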

On the other network, with half the user base and a much faster hard
disk backend, "time wbinfo -u" takes ~0.075-0.1s.

.....

