[Samba] samba getting stuck, highwatermark replication issue?

lingpanda101 lingpanda101 at gmail.com
Mon Oct 9 18:52:46 UTC 2017


On 10/9/2017 1:28 PM, mj via samba wrote:
> Hi all,
>
> We would appreciate some input here. Not sure where to look...
>
> We have three AD DCs, all running samba 4.5.10, and for the past few
> days the samba DCs have been getting stuck regularly, at random times.
> It happens to all three of them, currently up to a few times per day,
> so there must be some common cause.
>
> For the rest, the systems appear fine, enough diskspace, nothing 
> special in syslog, etc.
>
> We usually detect that a DC has become stuck because LDAP auth no
> longer works against that DC. Checking with "service sernet-samba-ad
> status" still reports "Running".
>
> After shutting down samba ("service sernet-samba-ad stop") one process 
> usually is still running, and prevents a restart from succeeding, 
> always because:
>
>> Failed to listen on 0.0.0.0:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
>
> ps aux tells me that the process is: "samba -D"
>
> Killing that process makes samba startup succeed, replication work
> again, and samba function, until the next time this happens.
>
> But WHY is samba getting stuck in the first place?
>
> We are getting the following unusual messages in the logs on all three DCs:
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd 
>> replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=a_username,CN=Users,DC=samba,DC=company,DC=com)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> and the last line keeps repeating 2 - 3 times per second, completely
> filling up the logs. The username in the first message differs per DC,
> but on each DC it usually remains the same. (I have seen 5 or 6
> different usernames in total.)
>
> samba-tool dbcheck --cross-ncs looks similar on all three DCs, with 
> *many* errors about unsorted attributes, that I think I've been told 
> in the past are harmless:
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0002000d
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020002
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020001
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0000000d
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000003
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000000
>> ERROR: unsorted attributeID values in replPropertyMetaData on 
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com
>>
>> Not fixing replPropertyMetaData on 
>> CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com
>>
>> Please use --fix to fix these errors
>> Checked 4948 objects (4193 errors)
>
> All 4193 errors are about unsorted attributeID values, with the
> following exception: there still appear to be some references to an
> old DC (removed many YEARS ago):
>> ERROR: no target object found for GUID component for 
>> msDS-NC-Replica-Locations in object 
>> CN=84bea0a7-82dd-4237-9296-030573700698,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com 
>> - 
>> <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187541>;<RMD_ORIGINATING_USN=3630>;<RMD_VERSION=0>;CN=NTDS 
>> Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com 
>>
>> ERROR: no target object found for GUID component for 
>> msDS-NC-Replica-Locations in object 
>> CN=d9d76e21-8cae-457d-b212-6cb192612739,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com 
>> - 
>> <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187515>;<RMD_ORIGINATING_USN=3631>;<RMD_VERSION=0>;CN=NTDS 
>> Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com 
>>
>
> That's about all info I can gather.
>
> The very basic smb.conf on the DCs:
>
>> [global]
>>     workgroup = WRKGRP
>>     realm = samba.company.com
>>     netbios name = DC4
>>     server role = active directory domain controller
>>     log level = 3
>>     dns forwarder = 192.x.x.x
>>     server signing = mandatory
>>     ntlm auth = yes
>>     ldap server require strong auth = no
>>     idmap_ldb:use rfc2307 = yes
>>
>> [netlogon]
>>     path = /var/lib/samba/sysvol/samba.company.com/scripts
>>     read only = No
>>
>> [sysvol]
>>     path = /var/lib/samba/sysvol
>>     read only = No
>>     acl_xattr:ignore system acls = yes
>
> We have been running 4.5.10 since May 2017, and this issue started
> this week.
>
> Anyone with an idea?
>
You should be able to fix the 'replPropertyMetaData' errors with:

samba-tool dbcheck --cross-ncs --fix --yes 'fix_replmetadata_unsorted_attid'
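Before running with --fix, it may help to see how many errors of each 
kind dbcheck reports. A rough sketch, assuming the report-only output 
was saved to a file first. The function name and the file name 
dbcheck.out are made up for illustration, and it assumes each ERROR 
message sits on a single line in the real output (the lines quoted above 
were wrapped by the mail client):

```shell
# Hypothetical helper: count dbcheck ERROR lines per kind of error.
# Suggested use on a DC (not run here):
#   samba-tool dbcheck --cross-ncs > dbcheck.out 2>&1
#   summarize_dbcheck_errors dbcheck.out
summarize_dbcheck_errors() {
    # Keep only ERROR lines and cut each to its first four words, so the
    # per-object details (DNs, GUIDs) collapse into one bucket per kind.
    grep '^ERROR:' "$1" | cut -d' ' -f1-4 | sort | uniq -c | sort -rn
}
```

If anything other than the unsorted attributeID and 
msDS-NC-Replica-Locations errors shows up, I would look at that before 
letting dbcheck change anything.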

The highwatermark message doesn't necessarily indicate a problem. The 
highwatermark is part of how the destination DC keeps track of which 
changes it has already received from the source DC. Can you verify that 
the time and date are correct on all DCs?
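To see whether the repeating message always names the same object, 
something like the following could tally the last_dn values in the log. 
This is only a sketch: the function name is hypothetical, the log path 
in the usage comment is a guess for your install, and it assumes each 
message is on one line in the actual log file.

```shell
# Hypothetical helper: count "older highwatermark" messages per last_dn.
# Suggested use on a DC (path is an assumption, adjust as needed):
#   tally_highwatermark /var/log/samba/log.samba
tally_highwatermark() {
    # Select the highwatermark messages, extract "last_dn <DN>" up to the
    # closing parenthesis, then count identical DNs, most frequent first.
    grep 'older highwatermark' "$1" \
        | grep -o 'last_dn [^)]*' \
        | sort | uniq -c | sort -rn
}
```

Running it on each DC and comparing the top entries would show whether 
all three are stuck on the same objects.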

The GUID errors seem related to your old, offline DC, whose NTDS 
connections are still lingering. Open Active Directory Sites and 
Services and remove the ones no longer needed.



-- 
James



