[Samba] samba getting stuck, highwatermark replication issue?

lingpanda101 lingpanda101 at gmail.com
Thu Oct 12 14:12:05 UTC 2017


On 10/12/2017 3:17 AM, mj wrote:
> Hi all, James,
>
> After following James' suggestions for fixing the several dbcheck errors, 
> and having observed things for a few days, I'd like to update this 
> issue, and hope for some new input again. :-)
>
> Summary: three DCs, all three running version 
> 4.5.10-SerNet-Debian-16.wheezy, samba-tool dbcheck --cross-ncs reports 
> no errors, except for two (supposedly innocent) dangling forward links 
> that I'm ignoring for now. Time is synced. Very basic smb.conf, posted 
> earlier, can post again if needed.
>
> samba-tool ldapcmp dcX dcY --filter=whenChanged shows that they are in 
> sync, and also samba-tool drs showrepl shows that replication seems to 
> be stable.
>
> The "getting stuck" from the subject line has not occurred for a few 
> days, perhaps the dbcheck fixes have solved that, or perhaps we've 
> just been lucky.
>
> All in all this appears pretty healthy, but there is a remaining problem:
>
> At ANY given time, ONE RANDOM DC shows high CPU usage in one 
> samba process, and on that DC (it can be any of the three) the logs 
> fill up with this:
>
>> [2017/10/12 08:38:57.956586,  3] 
>> ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>>   Terminating connection - 'ldapsrv_accept_tls_loop: 
>> tstream_tls_accept_recv() - 104:Connection reset by peer'
>> [2017/10/12 08:38:57.956638,  3] 
>> ../source4/smbd/process_single.c:114(single_terminate)
>>   single_terminate: reason[ldapsrv_accept_tls_loop: 
>> tstream_tls_accept_recv() - 104:Connection reset by peer]
>> [2017/10/12 08:38:57.956823,  3] 
>> ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>>   Terminating connection - 'ldapsrv_accept_tls_loop: 
>> tstream_tls_accept_recv() - 104:Connection reset by peer'
>> [2017/10/12 08:38:57.956869,  3] 
>> ../source4/smbd/process_single.c:114(single_terminate)
>>   single_terminate: reason[ldapsrv_accept_tls_loop: 
>> tstream_tls_accept_recv() - 104:Connection reset by peer]
>> [2017/10/12 08:38:57.956990,  3] 
>> ../source4/auth/ntlm/auth.c:271(auth_check_password_send)
>>   auth_check_password_send: Checking password for unmapped user 
>> []\[]@[(null)]
>>   auth_check_password_send: mapped user is: []\[]@[(null)]
>> [2017/10/12 08:38:57.958675,  3] 
>> ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>>   Terminating connection - 'ldapsrv_call_loop: 
>> tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
>> [2017/10/12 08:38:57.958728,  3] 
>> ../source4/smbd/process_single.c:114(single_terminate)
>>   single_terminate: reason[ldapsrv_call_loop: 
>> tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
>> [2017/10/12 08:38:57.958948,  3] 
>> ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>>   Terminating connection - 'ldapsrv_call_loop: 
>> tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
>> [2017/10/12 08:38:57.958994,  3] 
>> ../source4/smbd/process_single.c:114(single_terminate)
>>   single_terminate: reason[ldapsrv_call_loop: 
>> tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
>> [2017/10/12 08:38:57.969111,  0] 
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>> [2017/10/12 08:38:57.969762,  2] 
>> ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on 
>> DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
>> [2017/10/12 08:38:58.378265,  0] 
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>> [2017/10/12 08:38:58.379160,  2] 
>> ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on 
>> DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
>> [2017/10/12 08:38:58.810202,  0] 
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>> [2017/10/12 08:38:58.810868,  2] 
>> ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on 
>> DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
>> [2017/10/12 08:38:59.251863,  0] 
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>> [2017/10/12 08:38:59.252418,  2] 
>> ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on 
>> DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
>> [2017/10/12 08:38:59.692247,  0] 
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark 
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
>
> I've seen "last_dn" be various things, system groups like above, but 
> also regular users, computers, and groups that we created. We have 
> even had (very few) cases where it was:
>
>> ./log.samba.3.gz: ../source4/rpc_server/drsuapi/getncchanges.c:1961: 
>> DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older 
>> highwatermark (last_dn DC=samba,DC=company,DC=com)
>
> Can anyone explain what is happening here, or help me understand this?
>
> I have read that highwatermark errors are not necessarily bad, but the 
> fact that they cause continuous high CPU usage on a DC (80-90%) until 
> the behaviour "transfers" to the next DC makes me feel that in this 
> case it is not normal and indicates some kind of problem.
>
> Thanks for input!
>
> MJ

MJ,

     A developer or someone else may be able to assist, but your DCs are 
not replicating correctly with each other. Those dangling links should 
have been purged by now if they refer to a DC removed several years ago.
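To see how often the replication cycle restarts and which objects it stops on (the varying last_dn values described above), the "older highwatermark" lines can be tallied from the logs. This is a hypothetical helper, not a Samba tool. It assumes each DEBUG message sits on one line in log.samba (the list archive wrapped them) and that rotated logs such as log.samba.3.gz are gzip-compressed.

```python
import gzip
import re
import sys
from collections import Counter
from typing import Iterable

# Matches the DEBUG message quoted in this thread; assumes the whole
# message is on one line in the log file.
RESTART_RE = re.compile(r"older highwatermark \(last_dn (.+)\)\s*$")


def count_restarts(lines: Iterable[str]) -> Counter:
    """Tally 'older highwatermark' restarts by the last_dn they report."""
    counts: Counter = Counter()
    for line in lines:
        m = RESTART_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def read_log(path: str) -> Iterable[str]:
    # Rotated logs (e.g. log.samba.3.gz) are gzip files; others are plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        yield from fh


if __name__ == "__main__":
    total: Counter = Counter()
    for path in sys.argv[1:]:
        total.update(count_restarts(read_log(path)))
    # Most frequent last_dn first.
    for dn, n in total.most_common():
        print(f"{n:6d}  {dn}")
```

Pass the log files as arguments; a count spread over many unrelated objects points more towards repeated restarts than towards one broken object.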

Did you do a full replication from a known good DC to the other two? 
This doesn't always fix the issue, but it is a good start. Did you by 
chance recently restore a DC from backup, or bring one back online after 
it had been powered off for a while?

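The highwatermark value tells the source DC how far the destination DC 
has already got in the replication cycle, so the source only has to send 
objects changed after that point. The high CPU usage seems to be due to 
the DC doing a full partition replication; the (uSNChanged>=1) filter in 
your logs means it is collecting every object in the partition. The fact 
that you say this can happen on all three DCs makes it even tougher to 
help. I would normally advise simply demoting the affected DC and joining 
it again.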
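A deliberately simplified Python model of the highwatermark idea, based loosely on the log messages in this thread rather than on Samba's source. The destination's mark becomes a (uSNChanged>=N+1) filter, and a follow-up request carrying an older mark than the one the source last handed out makes the cycle start over. The single highest_usn field and the restart-on-mismatch rule are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HighWaterMark:
    """Toy model: the highest USN the destination DC says it has applied."""
    highest_usn: int = 0


def usn_filter(requested: HighWaterMark) -> str:
    """Only objects changed after the destination's mark need to be sent."""
    return f"(uSNChanged>={requested.highest_usn + 1})"


def compare_to_last(requested: HighWaterMark, last_sent: Optional[HighWaterMark]) -> str:
    """Classify a follow-up request against the mark the source last handed out."""
    if last_sent is None:
        return "new cycle"
    if requested.highest_usn < last_sent.highest_usn:
        # Destination went backwards: in this model the cycle starts over.
        return "older"
    if requested.highest_usn > last_sent.highest_usn:
        # Also a mismatch: in this model the cycle starts over as well.
        return "newer"
    # Destination continues exactly where the previous chunk ended.
    return "continue"


if __name__ == "__main__":
    start = HighWaterMark(0)
    # A destination starting from nothing asks for every object.
    print(usn_filter(start))  # prints "(uSNChanged>=1)"
    # Source handed out 5000 after a chunk, destination asks again from 0.
    print(compare_to_last(start, HighWaterMark(5000)))  # prints "older"
```

In this model, a destination that keeps returning with the old mark turns every request into another full pass over the partition. That would fit both the repeating log lines and the sustained CPU load.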


-- 
James



