[Samba] samba getting stuck, highwatermark replication issue?

mj lists at merit.unu.edu
Thu Oct 12 19:01:20 UTC 2017


Hi James, list

We really appreciate your input on this, thanks!

On 10/12/2017 04:12 PM, lingpanda101 via samba wrote:
> MJ,
> 
>      A dev or someone else may to assist but your replication isn't 
> syncing correctly among each other.  Those dangling links should have 
> purged by now if it's in reference to a DC removed several years ago.

This is rather worrying :-|

Especially since I have all kinds of scripts in place that continuously 
check replication: "samba-tool drs showrepl" hourly, plus 
"samba-tool ldapcmp" every other hour.

So one can have problems even when all built-in checks succeed. :-(

Currently DC2 has high CPU usage, and grepping log.samba for 
"Succeeded" gives this kind of result:

>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 3 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com

All zero, with some exceptions...

I imagine this looks better; a sample from the non-high-CPU DCs:
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
>   Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 4 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 4 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
>   Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com

Some zeros, but many indications that it is actually replicating data.
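
To compare the DCs with numbers instead of by eye, the "Replicated N 
objects" lines can be tallied per partition. Below is a minimal Python 
sketch that assumes only the message format shown in the samples above. 
The function name summarize_replication is made up for illustration and 
is not part of Samba or samba-tool.

```python
import re
from collections import defaultdict

# Message format as it appears in the log.samba samples above.
REPL_RE = re.compile(
    r"Replicated (\d+) objects \((\d+) linked attributes\) for (\S+)"
)


def summarize_replication(lines):
    """Tally replication passes per partition DN.

    Returns {dn: [passes, passes_with_zero_objects, total_objects]}.
    Lines that do not match the message format are ignored.
    """
    stats = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        match = REPL_RE.search(line)
        if match is None:
            continue
        objects = int(match.group(1))
        entry = stats[match.group(3)]
        entry[0] += 1            # one more replication pass for this DN
        if objects == 0:
            entry[1] += 1        # pass that brought in nothing
        entry[2] += objects      # running total of replicated objects
    return dict(stats)
```

Feeding it an open log.samba file from each DC gives one row per 
partition, so a DC that only ever reports empty passes for a partition 
stands out. It says nothing about why the passes are empty.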

> Did you do a full replication from a known good DC to the other two? 
Well, at this point I have no idea which DC I can consider "a good DC".

> This doesn't always fix the issue but is a good start. You didn't by 
> chance restore a DC recently from backup or had one offline and recently 
> powered on?
No. These three DCs have been online for many years, ever since DC1 
was removed. (We never demoted it, since it had crashed, so we manually 
removed DC1 from the database; that's perhaps why there are some 
remains.)

The fact that there are still two 'dangling forward links', identical on 
all DCs, makes me think that we simply missed those when we manually 
removed all DC1 references. This happened back in the Samba 4.1 days.

> The highwatermark value tells the source DC what objects the destination 
> DC is requesting to update. The high CPU usage seems due to the DC doing 
> a full partition replication. The fact you stated this issue can happen 
> on all 3 makes it ever tougher to help. I would normally advise to just 
> demote the affected DC and join again.

Perhaps I should see whether I can find a combination of two DCs that 
works (check replication, verify with ldapcmp, make sure there is no 
high CPU, etc.), and then trust those two and demote the third.

Any input here would be very welcome... Here's a bit of the logs, leading 
up to the "Replicated 0 objects" on the current high-CPU DC; hopefully 
that reveals something..?

>   Not authoritative for '_kerberos.com', forwarding
> [2017/10/12 06:00:16.744615,  2] ../source4/dns_server/dns_query.c:1019(dns_server_process_query_send)
>   Not authoritative for '_kerberos.com', forwarding
> [2017/10/12 06:00:16.745393,  2] ../source4/dns_server/dns_query.c:1019(dns_server_process_query_send)
>   Not authoritative for '_kerberos.com', forwarding
> [2017/10/12 06:00:16.745731,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: AS-REQ authtime: 2017-10-12T06:00:16 starttime: unset endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.745830,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: Client supported enctypes: aes256-cts-hmac-sha1-96, aes128-cts-hmac-sha1-96, des3-cbc-sha1, des3-cbc-md5, arcfour-hmac-md5, using arcfour-hmac-md5/arcfour-hmac-md5
> [2017/10/12 06:00:16.745975,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: Requested flags: forwardable
> [2017/10/12 06:00:16.748679,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ MEMBERSERVER$@SAMBA.COMPANY.COM from ipv4:192.168.89.2:40725 for ldap/dc2.SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [canonicalize]
> [2017/10/12 06:00:16.754551,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.755962,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41634 for ldap/DC2.SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [canonicalize]
> [2017/10/12 06:00:16.762012,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.762249,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.762249,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.762320,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.762967,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ MEMBERSERVER$@SAMBA.COMPANY.COM from ipv4:192.168.89.2:40726 for krbtgt/SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [forwarded, forwardable]
> [2017/10/12 06:00:16.765363,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.765585,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.765679,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.766324,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41635 for krbtgt/SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [forwarded, forwardable]
> [2017/10/12 06:00:16.768612,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.768836,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.768907,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.769475,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.769542,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.799101,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41637 for ldap/dc2.SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [canonicalize]
> [2017/10/12 06:00:16.808786,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.809681,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.809767,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.817237,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41638 for krbtgt/SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [forwarded, forwardable]
> [2017/10/12 06:00:16.819573,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.820289,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.820368,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.843259,  2] ../source4/dsdb/repl/replicated_objects.c:1016(dsdb_replicated_objects_commit)
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com

Lots of NT_STATUS_CONNECTION_DISCONNECTED. Ideas, anyone..?
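
To see how many of these there are and which code path logs them, the 
NT_STATUS codes can be counted per logging function. This is a rough 
Python sketch that assumes the two-line layout seen in the excerpt above 
(a bracketed header ending in "file.c:line(function)", followed by the 
message line). The helper name count_status_by_function is hypothetical.

```python
import re
from collections import Counter

# Header line, e.g.
# [2017/10/12 06:00:16.762249,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
HEADER_RE = re.compile(
    r"\[\d{4}/\d{2}/\d{2} [\d:.]+,\s*\d+\] \S+\((\w+)\)"
)
STATUS_RE = re.compile(r"NT_STATUS_\w+")


def count_status_by_function(lines):
    """Count NT_STATUS_* codes, keyed by (logging function, status code)."""
    counts = Counter()
    function = None  # function named by the most recent header line
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            function = header.group(1)
            continue
        for status in STATUS_RE.findall(line):
            counts[(function, status)] += 1
    return counts
```

Running this over the same time window on a busy and a quiet DC would 
show whether the disconnects are specific to the high-CPU DC. In the 
excerpt above they all mention kdc_tcp_call_loop and are logged at debug 
level 3.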

MJ


