[Samba] samba getting stuck, highwatermark replication issue?

mj lists at merit.unu.edu
Thu Oct 12 07:17:39 UTC 2017


Hi all, James,

After following James' suggestions and fixing several dbcheck errors,
and having observed things for a few days, I'd like to update this
issue and hope for some new input. :-)

Summary: three DCs, all running version
4.5.10-SerNet-Debian-16.wheezy. samba-tool dbcheck --cross-ncs reports
no errors, except for two (supposedly innocent) dangling forward links
that I'm ignoring for now. Time is synced. The smb.conf is very basic;
I posted it earlier and can post it again if needed.

samba-tool ldapcmp dcX dcY --filter=whenChanged shows that they are in 
sync, and also samba-tool drs showrepl shows that replication seems to 
be stable.

The "getting stuck" from the subject line has not occurred for a few
days; perhaps the dbcheck fixes have solved that, or perhaps we've just
been lucky.

All in all this appears pretty healthy, but there is a remaining problem:

At ANY given time, ONE RANDOM DC shows high CPU usage on a single
samba process. And on that DC (it can be any of the three) the logs
fill up with this:

> [2017/10/12 08:38:57.956586,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'
> [2017/10/12 08:38:57.956638,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer]
> [2017/10/12 08:38:57.956823,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'
> [2017/10/12 08:38:57.956869,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer]
> [2017/10/12 08:38:57.956990,  3] ../source4/auth/ntlm/auth.c:271(auth_check_password_send)
>   auth_check_password_send: Checking password for unmapped user []\[]@[(null)]
>   auth_check_password_send: mapped user is: []\[]@[(null)]
> [2017/10/12 08:38:57.958675,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
> [2017/10/12 08:38:57.958728,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
> [2017/10/12 08:38:57.958948,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
> [2017/10/12 08:38:57.958994,  3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
> [2017/10/12 08:38:57.969111,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:57.969762,  2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:58.378265,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:58.379160,  2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:58.810202,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:58.810868,  2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:59.251863,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:59.252418,  2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:59.692247,  0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)

I've seen "last_dn" be various things: system groups like the one above,
but also regular users, computers, and groups that we created. We have
even had (very few) cases where it was:

> ./log.samba.3.gz:  ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn DC=samba,DC=company,DC=com)
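
To see which last_dn values turn up and how often, the messages can be
tallied with a short script. This is only a minimal Python sketch: it
assumes the message wording shown in the excerpts above, and the function
name count_last_dn and the regular expression are illustrative, not
anything that ships with Samba.

```python
import re
import sys
from collections import Counter

# Assumption: the message looks exactly like the excerpts quoted above.
HWM_RE = re.compile(
    r"DsGetNCChanges 2nd replication on DN (?P<nc>.+?) older highwatermark "
    r"\(last_dn (?P<last_dn>.+)\)\s*$"
)


def count_last_dn(lines):
    """Count how often each last_dn appears in 'older highwatermark' lines."""
    counts = Counter()
    for line in lines:
        match = HWM_RE.search(line)
        if match:
            counts[match.group("last_dn")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_last_dn.py /path/to/log.samba
    with open(sys.argv[1], errors="replace") as fh:
        for dn, n in count_last_dn(fh).most_common():
            print(f"{n:8d}  {dn}")
```

Rotated logs such as log.samba.3.gz would need to be decompressed first
(or opened with Python's gzip module) before being passed in.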

Can anyone explain what is happening here, or help me understand this?

I have read that highwatermark errors are not necessarily bad. But they
cause continuous high CPU usage on a DC (80-90%) until the behaviour
"transfers" to the next DC, which makes me think that in this case this
is not normal and indicates some kind of problem.
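
To pin down when the behaviour moves from one DC to the next, the
messages can be counted per minute on each DC and the results compared.
Again a rough Python sketch only: it assumes the two-line log format
quoted above (a timestamp header line followed by the message line), and
the function name highwatermark_per_minute is made up for illustration.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Assumption: header lines look like "[2017/10/12 08:38:57.969111,  0] ..."
TS_RE = re.compile(r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.\d+,\s*\d+\]")


def highwatermark_per_minute(lines):
    """Count 'older highwatermark' messages per minute of log time."""
    per_minute = Counter()
    current = None  # timestamp of the most recent header line seen
    for line in lines:
        match = TS_RE.search(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        if "older highwatermark" in line and current is not None:
            # Truncate to the minute so bursts are grouped together.
            per_minute[current.replace(second=0)] += 1
    return per_minute


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python hwm_per_minute.py /path/to/log.samba
    with open(sys.argv[1], errors="replace") as fh:
        for minute, n in sorted(highwatermark_per_minute(fh).items()):
            print(f"{minute:%Y-%m-%d %H:%M}  {n}")
```

Running this against the logs of all three DCs should show the minutes
in which the flood stops on one DC and starts on another.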

Thanks for input!

MJ



