[Samba] samba getting stuck, highwatermark replication issue?

mj lists at merit.unu.edu
Sat Oct 28 09:22:58 UTC 2017


Hi all,

For the archives, I'd like to update this thread with our latest 
findings: the fix for both the high CPU usage and the highwatermark errors!

Some time ago we had been testing Microsoft Azure Connect to import 
our Active Directory accounts/groups/passwords into the Azure cloud, in 
order to test Microsoft Office 365 functionality.

The required tool "Microsoft Azure AD Sync" is what caused our problems! 
We disabled it, and poof: no more high CPU usage, no more highwatermark 
errors.

Hope this info helps someone else, someday :-)

MJ

On 10/09/2017 07:28 PM, mj via samba wrote:
> Hi all,
> 
> We would appreciate some input here. Not sure where to look...
> 
> We have three AD DCs, all running samba 4.5.10, and for the past few 
> days the samba DCs have been getting stuck regularly, at random times. 
> It happens to all three of them, currently up to a few times per day! 
> There must be some common cause.
> 
> Otherwise the systems appear fine: enough disk space, nothing special 
> in syslog, etc.
> 
> We usually detect that a DC has become stuck because LDAP auth no 
> longer works on that DC. Checking with "service sernet-samba-ad status" 
> will still report "Running".
> 
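Since the service status keeps reporting "Running" while LDAP is dead, an external probe may catch a stuck DC sooner. Below is a minimal sketch, assuming Python 3 is available on a monitoring host; the function name is hypothetical, and it only checks that the LDAP port accepts a TCP connection. A stuck samba process might still accept connections, so a real check would probably need an actual LDAP bind on top of this.

```python
import socket


def ldap_port_answers(host: str, port: int = 389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is only a first-level probe: it does not perform an LDAP bind,
    so it cannot prove that authentication itself still works.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

One could call this from cron for each DC and raise an alert when it returns False.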
> After shutting down samba ("service sernet-samba-ad stop") one process 
> usually is still running, and prevents a restart from succeeding, always 
> because:
> 
>> Failed to listen on 0.0.0.0:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
> 
> ps aux tells me that the process is: "samba -D"
> 
> Killing that process makes samba startup succeed, replication work 
> again, and samba function, until the next time this happens.
> 
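To confirm that something is still holding port 135 after the stop, the kernel's socket table can be inspected. The following is a rough sketch, assuming a Linux host and the usual /proc/net/tcp column layout (IPv4 only); the function name is hypothetical. It returns the socket inodes of listeners on a given port, which can then be matched against the "socket:[inode]" links under /proc/<pid>/fd to find the owning process.

```python
from typing import List


def listening_inodes(proc_net_tcp: str, port: int) -> List[str]:
    """Return socket inodes of IPv4 sockets in LISTEN state on the given port.

    proc_net_tcp is the text content of /proc/net/tcp.
    """
    inodes = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 10:
            continue  # ignore blank or truncated rows
        # local_address looks like "00000000:0087": hex IP, colon, hex port.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        # State "0A" is TCP_LISTEN in the kernel's socket state numbering.
        if fields[3] == "0A" and local_port == port:
            inodes.append(fields[9])
    return inodes
```

On a DC this would be fed with the contents of /proc/net/tcp, for example `listening_inodes(open("/proc/net/tcp").read(), 135)`.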
> But WHY is samba getting stuck in the first place?
> 
> We are getting the following unusual messages in the logs on all three DCs:
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=a_username,CN=Users,DC=samba,DC=company,DC=com)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> and the last line keeps repeating 2 - 3 times per second, completely 
> filling up the logs. The starting username differs per DC, but on each 
> DC it usually remains the same. (I have seen 5 or 6 different usernames 
> in total.)
> 
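With the log filling up this quickly, a tally per last_dn makes it easier to see which objects the replication keeps looping on. This is a minimal sketch, assuming the messages look like the ones quoted above; the regular expression and the function name are my own and not part of samba.

```python
import re
from collections import Counter

# Matches the DN reported in "older highwatermark (last_dn ...)" messages.
# \s+ also tolerates a line break between "highwatermark" and "(last_dn".
LAST_DN = re.compile(r"older highwatermark\s+\(last_dn\s+([^)]*)\)")


def count_last_dns(log_text: str) -> Counter:
    """Count 'older highwatermark' messages per last_dn value."""
    return Counter(dn.strip() for dn in LAST_DN.findall(log_text))
```

Running `count_last_dns(open(path).read()).most_common(5)` on a samba log file (path being wherever the DC writes its log) would list the five DNs that show up most often.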
> samba-tool dbcheck --cross-ncs looks similar on all three DCs, with 
> *many* errors about unsorted attributes, which I believe I was told in 
> the past are harmless:
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0002000d
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020002
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020001
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0000000d
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000003
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000000
>> ERROR: unsorted attributeID values in replPropertyMetaData on 
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com
>>
>> Not fixing replPropertyMetaData on 
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com
>>
>> Please use --fix to fix these errors
>> Checked 4948 objects (4193 errors)
> 
> Almost all of the 4193 errors are about unsorted attributeID values, 
> with the following exception: there still appear to be some references 
> to an old DC (removed many YEARS ago):
>> ERROR: no target object found for GUID component for 
>> msDS-NC-Replica-Locations in object 
>> CN=84bea0a7-82dd-4237-9296-030573700698,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com 
>> - 
>> <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187541>;<RMD_ORIGINATING_USN=3630>;<RMD_VERSION=0>;CN=NTDS 
>> Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com 
>>
>> ERROR: no target object found for GUID component for 
>> msDS-NC-Replica-Locations in object 
>> CN=d9d76e21-8cae-457d-b212-6cb192612739,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com 
>> - 
>> <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187515>;<RMD_ORIGINATING_USN=3631>;<RMD_VERSION=0>;CN=NTDS 
>> Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com 
>>
> 
> That's about all the info I can gather.
> 
> The very basic smb.conf on the DCs:
> 
>> [global]
>>     workgroup = WRKGRP
>>     realm = samba.company.com
>>     netbios name = DC4
>>     server role = active directory domain controller
>>     log level = 3
>>     dns forwarder = 192.x.x.x
>>     server signing = mandatory
>>     ntlm auth = yes
>>     ldap server require strong auth = no
>>     idmap_ldb:use rfc2307 = yes
>>
>> [netlogon]
>>     path = /var/lib/samba/sysvol/samba.company.com/scripts
>>     read only = No
>>
>> [sysvol]
>>     path = /var/lib/samba/sysvol
>>     read only = No
>>     acl_xattr:ignore system acls = yes
> 
> We have been running 4.5.10 since May 2017, and this issue started this 
> week.
> 
> Anyone with an idea?
> 


