[Samba] samba getting stuck, highwatermark replication issue?
mj
lists at merit.unu.edu
Thu Oct 12 07:17:39 UTC 2017
Hi all, James,
After following James' suggestions fixing the several dbcheck errors,
and having observed things for a few days, I'd like to update this
issue, and hope for some new input again. :-)
Summary: three DCs, all three running Version
4.5.10-SerNet-Debian-16.wheezy, samba-tool dbcheck --cross-ncs reports
no errors, except for two (supposedly innocent) dangling forward links
that I'm ignoring for now. Time is synced. Very basic smb.conf, posted
earlier, can post again if needed.
samba-tool ldapcmp dcX dcY --filter=whenChanged shows that they are in
sync, and also samba-tool drs showrepl shows that replication seems to
be stable.
The "getting stuck" from the subject line has not occurred for a few
days, perhaps the dbcheck fixes have solved that, or perhaps we've just
been lucky.
All in all this appears pretty healthy, but there is a remaining problem:
At ANY given time, ONE RANDOM single DC shows high CPU usage on one
samba process. And on that DC (can be any of the three DCs) the logs
fill up with this:
> [2017/10/12 08:38:57.956586, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
> Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'
> [2017/10/12 08:38:57.956638, 3] ../source4/smbd/process_single.c:114(single_terminate)
> single_terminate: reason[ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer]
> [2017/10/12 08:38:57.956823, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
> Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'
> [2017/10/12 08:38:57.956869, 3] ../source4/smbd/process_single.c:114(single_terminate)
> single_terminate: reason[ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer]
> [2017/10/12 08:38:57.956990, 3] ../source4/auth/ntlm/auth.c:271(auth_check_password_send)
> auth_check_password_send: Checking password for unmapped user []\[]@[(null)]
> auth_check_password_send: mapped user is: []\[]@[(null)]
> [2017/10/12 08:38:57.958675, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
> Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
> [2017/10/12 08:38:57.958728, 3] ../source4/smbd/process_single.c:114(single_terminate)
> single_terminate: reason[ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
> [2017/10/12 08:38:57.958948, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
> Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
> [2017/10/12 08:38:57.958994, 3] ../source4/smbd/process_single.c:114(single_terminate)
> single_terminate: reason[ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
> [2017/10/12 08:38:57.969111, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
> ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:57.969762, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
> ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:58.378265, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
> ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:58.379160, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
> ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:58.810202, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
> ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:58.810868, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
> ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:59.251863, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
> ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:59.252418, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
> ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:59.692247, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
> ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
I've seen "last_dn" be various things, system groups like above, but
also regular users, computers, and groups that we created. We have even
had (very few) cases where it was:
> ./log.samba.3.gz: ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn DC=samba,DC=company,DC=com)
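To see which objects show up as "last_dn" and how often, the log can be tallied with a few lines of Python. This is only a minimal sketch: it assumes the message text is exactly as quoted above, and the function name count_last_dn is made up for illustration.

```python
import re
from collections import Counter

# Capture the naming context and the last_dn from the "older highwatermark"
# message. The greedy match anchored at end of line keeps DNs that
# themselves contain parentheses intact.
HIGHWATERMARK = re.compile(
    r"DsGetNCChanges 2nd replication on DN (?P<nc>.+?) "
    r"older highwatermark \(last_dn (?P<last_dn>.+)\)\s*$"
)


def count_last_dn(lines):
    """Return a Counter mapping each last_dn to its number of occurrences."""
    counts = Counter()
    for line in lines:
        match = HIGHWATERMARK.search(line)
        if match:
            counts[match.group("last_dn")] += 1
    return counts


# Two lines copied from the excerpts above, used as sample input.
sample = [
    "../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd "
    "replication on DN DC=samba,DC=company,DC=com older highwatermark "
    "(last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)",
    "../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd "
    "replication on DN DC=samba,DC=company,DC=com older highwatermark "
    "(last_dn DC=samba,DC=company,DC=com)",
]

for dn, n in count_last_dn(sample).most_common():
    print(f"{n:6d}  {dn}")
```

Any iterable of lines works as input, so an open log file can be passed directly; rotated logs such as log.samba.3.gz would need to be opened with gzip.open(path, "rt") first.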
Can anyone explain what is happening here, or help me understand this?
I have read that highwatermark errors are not necessarily bad, but the
fact that they cause continuous high CPU usage on a DC (80, 90%), until
the point where this behaviour "transfers" to a next DC makes me feel
that in this case, this is not normal, and indicates some kind of problem.
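To put a number on "continuous", the time between consecutive highwatermark messages can be computed from the log timestamps. Again only a sketch: it assumes the level-0 header line format shown in the excerpt above and identifies the message by the function name dcesrv_drsuapi_DsGetNCChanges, so any other level-0 message from that function would be counted too. The name highwatermark_gaps is made up for illustration.

```python
import re
from datetime import datetime

# Level-0 header line written by dcesrv_drsuapi_DsGetNCChanges, e.g.
# [2017/10/12 08:38:57.969111, 0] ../source4/.../getncchanges.c:1961(...)
HEADER = re.compile(
    r"\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}),\s*0\]\s+"
    r"\S*getncchanges\.c:\d+\(dcesrv_drsuapi_DsGetNCChanges\)"
)


def highwatermark_gaps(lines):
    """Return the gaps in seconds between consecutive matching header lines."""
    stamps = [
        datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S.%f")
        for m in map(HEADER.search, lines)
        if m
    ]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Header lines copied from the log excerpt above.
sample = [
    "[2017/10/12 08:38:57.969111, 0] ../source4/rpc_server/drsuapi/"
    "getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)",
    "[2017/10/12 08:38:58.378265, 0] ../source4/rpc_server/drsuapi/"
    "getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)",
    "[2017/10/12 08:38:58.810202, 0] ../source4/rpc_server/drsuapi/"
    "getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)",
]

print([round(gap, 3) for gap in highwatermark_gaps(sample)])
```

In the excerpt above the messages arrive a bit more than 0.4 seconds apart, each one followed by a getncchanges run with the filter (uSNChanged>=1).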
Thanks for input!
MJ