[Samba] samba getting stuck, highwatermark replication issue?

mj lists at merit.unu.edu
Mon Oct 9 17:28:24 UTC 2017


Hi all,

We would appreciate some input here. Not sure where to look...

We have three AD DCs, all running samba 4.5.10, and for the past few 
days the samba DCs have been getting stuck regularly, at random times. 
It happens to all three of them, currently up to a few times per day. 
There must be some common cause.

Otherwise the systems appear fine: enough disk space, nothing special 
in syslog, etc.

We usually detect that a DC has become stuck because LDAP auth no 
longer works on that DC. Checking with "service sernet-samba-ad status" 
will still report "Running".
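
Since the service status keeps saying "Running", an outside probe of the 
LDAP port may notice a problem sooner. Below is a minimal sketch, assuming 
plain LDAP on the default port 389; the function name is made up for 
illustration. Note that a hung process can still accept a TCP connection, 
so this only catches the case where nothing answers at all. A real LDAP 
bind would be a stronger test.

```python
import socket


def ldap_port_answers(host, port=389, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves that something accepts connections on the port.
    It does not prove that LDAP authentication works.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

It could be called from cron as ldap_port_answers("dc4.samba.company.com"), 
where the hostname is only a placeholder for one of the DCs.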

After shutting down samba ("service sernet-samba-ad stop") one process 
is usually still running and prevents a restart from succeeding, always 
with this error:

> Failed to listen on 0.0.0.0:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED

ps aux tells me that the process is: "samba -D"

Killing that process makes samba startup succeed, replication work 
again, and samba function, until the next time this happens.

But WHY is samba getting stuck in the first place?

We are getting the following unusual messages in the logs on all three DCs:
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=a_username,CN=Users,DC=samba,DC=company,DC=com)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
and the last line keeps repeating 2 - 3 times per second, completely 
filling up the logs. The username in the first line differs per DC, but 
on each DC it usually remains the same. (I have seen 5 or 6 different 
usernames in total.)
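
To see which last_dn values repeat and how often, the log can be 
tallied with a small script. This is a rough sketch that assumes the 
message format is exactly as in the excerpt above; the function name and 
the sample list are only for illustration.

```python
import re
from collections import Counter

# Matches the "older highwatermark" message as it appears in the excerpt.
PATTERN = re.compile(
    r"DsGetNCChanges 2nd replication on DN (?P<nc>.+?) "
    r"older highwatermark \(last_dn (?P<last_dn>.+?)\)\s*$"
)


def count_last_dns(lines):
    """Count how often each last_dn occurs in the given log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("last_dn")] += 1
    return counts


# The three lines quoted above, used as sample input.
prefix = ("../source4/rpc_server/drsuapi/getncchanges.c:1961: "
          "DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com "
          "older highwatermark (last_dn ")
sample = [
    prefix + "CN=a_username,CN=Users,DC=samba,DC=company,DC=com)",
    prefix + "CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)",
    prefix + "CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)",
]

for dn, n in count_last_dns(sample).most_common():
    print(n, dn)
```

For a real log, pass an open file object to count_last_dns() instead of 
the sample list.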

samba-tool dbcheck --cross-ncs looks similar on all three DCs, with 
*many* errors about unsorted attributes, that I think I've been told in 
the past are harmless:
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0002000d
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020002
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020001
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0000000d
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000003
> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000000
> ERROR: unsorted attributeID values in replPropertyMetaData on CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com
> 
> Not fixing replPropertyMetaData on CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com
> 
> Please use --fix to fix these errors
> Checked 4948 objects (4193 errors)

Nearly all of the 4193 errors are about unsorted attributeID values. The 
exception: there still appear to be some references to an old DC that 
was removed many YEARS ago:
> ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=84bea0a7-82dd-4237-9296-030573700698,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187541>;<RMD_ORIGINATING_USN=3630>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com
> ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=d9d76e21-8cae-457d-b212-6cb192612739,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187515>;<RMD_ORIGINATING_USN=3631>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com
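
To separate the few non-attributeID errors from the thousands of 
unsorted ones, the dbcheck output can be grouped by error type. A 
minimal sketch, assuming the output of "samba-tool dbcheck --cross-ncs" 
was saved to a text file and that the messages look like the ones quoted 
above; the category labels and function names are my own.

```python
import re
from collections import Counter

ERROR_RE = re.compile(r"^ERROR: (?P<msg>.*)$")
SUMMARY_RE = re.compile(r"^Checked (\d+) objects \((\d+) errors\)")


def classify(msg):
    """Map a dbcheck error message to a coarse category."""
    if msg.startswith("unsorted attributeID values in replPropertyMetaData"):
        return "unsorted replPropertyMetaData"
    if msg.startswith("no target object found for GUID component"):
        return "link without target object"
    return "other"


def summarize(lines):
    """Return (Counter of categories, (objects, errors) or None)."""
    kinds = Counter()
    totals = None
    for line in lines:
        line = line.strip()
        match = ERROR_RE.match(line)
        if match:
            kinds[classify(match.group("msg"))] += 1
            continue
        match = SUMMARY_RE.match(line)
        if match:
            totals = (int(match.group(1)), int(match.group(2)))
    return kinds, totals


# Two lines taken from the output quoted above, as sample input.
sample = [
    "ERROR: unsorted attributeID values in replPropertyMetaData on "
    "CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com",
    "Checked 4948 objects (4193 errors)",
]
kinds, totals = summarize(sample)
print(dict(kinds), totals)
```

Anything that ends up under "other" or "link without target object" is 
then a short list that can be read by hand.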

That's about all info I can gather.

The very basic smb.conf on the DCs:

> [global]
> 	workgroup = WRKGRP
> 	realm = samba.company.com
> 	netbios name = DC4
> 	server role = active directory domain controller
> 	log level = 3
> 	dns forwarder = 192.x.x.x
> 	server signing = mandatory
> 	ntlm auth = yes
> 	ldap server require strong auth = no
> 	idmap_ldb:use rfc2307 = yes
> 
> [netlogon]
> 	path = /var/lib/samba/sysvol/samba.company.com/scripts
> 	read only = No
> 
> [sysvol]
> 	path = /var/lib/samba/sysvol
> 	read only = No
> 	acl_xattr:ignore system acls = yes

We have been running 4.5.10 since May 2017, and this issue started this 
week.

Anyone with an idea?


