[Samba] How to diagnose a busy LDAP server process in the Samba AD DC

Elias Pereira empbilly at gmail.com
Fri Apr 12 00:41:38 UTC 2024


>
> Have you looked into the 1.5 second queries, what is sending them and why?


We use GCDS (Google Cloud Directory Sync)
<https://support.google.com/a/answer/106368?hl=pt> to synchronize AD users
with our Google Workspace account.

The log below is possibly from a GCDS run. The ALUNOS OU contains 2521
users.

root@dc1:/var/log/samba# cat tempos_ldap.txt
1.11s, SearchRequest by S-1-5-21-2137976744-3574706186-1594704298-16424
from ipv4:xxx.xxx.xxx.138:57558 filter:
[(&(objectCategory=person)(objectClass=user)(mail=*)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))]
basedn:
[ou=ALUNOS,ou=USUARIOS,ou=CAMPUS,dc=campus,dc=sertao,dc=ifrs,dc=edu,dc=br]
scope: [SUB] result: Success
0.61s, SearchRequest by S-1-5-21-2137976744-3574706186-1594704298-16424
from ipv4:xxx.xxx.xxx.138:41912 filter:
[(&(objectCategory=person)(objectClass=user)(mail=*)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))]
basedn:
[ou=ALUNOS,ou=USUARIOS,ou=CAMPUS,dc=campus,dc=sertao,dc=ifrs,dc=edu,dc=br]
scope: [SUB] result: Success
0.60s, SearchRequest by S-1-5-21-2137976744-3574706186-1594704298-16424
from ipv4:xxx.xxx.xxx.138:50272 filter:
[(&(objectCategory=person)(objectClass=user)(mail=*)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))]
basedn:
[ou=ALUNOS,ou=USUARIOS,ou=CAMPUS,dc=campus,dc=sertao,dc=ifrs,dc=edu,dc=br]
scope: [SUB] result: Success
0.60s, SearchRequest by S-1-5-21-2137976744-3574706186-1594704298-16424
from ipv4:xxx.xxx.xxx.138:42496 filter:
[(&(objectCategory=person)(objectClass=user)(mail=*)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))]
basedn:
[ou=ALUNOS,ou=USUARIOS,ou=CAMPUS,dc=campus,dc=sertao,dc=ifrs,dc=edu,dc=br]
scope: [SUB] result: Success
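
To see which client and filter account for most of the time, entries like the ones above can be grouped and totalled. This is only a minimal sketch: it assumes the layout shown in tempos_ldap.txt (a file I extracted myself, not Samba's native log format), and the names ENTRY_RE, parse_entries and summarize are made up for illustration.

```python
import re
from collections import defaultdict

# One timing entry, tolerant of the line wrapping seen in this mail.
# Assumption: the layout matches the tempos_ldap.txt excerpt above.
ENTRY_RE = re.compile(
    r"(?P<secs>\d+(?:\.\d+)?)s,\s+SearchRequest\s+by\s+(?P<sid>\S+)\s+"
    r"from\s+(?P<client>\S+)\s+filter:\s+\[(?P<filter>.*?)\]\s+"
    r"basedn:\s+\[(?P<basedn>.*?)\]\s+"
    r"scope:\s+\[(?P<scope>\w+)\]\s+result:\s+(?P<result>\w+)",
    re.DOTALL,
)

SAMPLE = """\
1.11s, SearchRequest by S-1-5-21-2137976744-3574706186-1594704298-16424
from ipv4:xxx.xxx.xxx.138:57558 filter:
[(&(objectCategory=person)(objectClass=user)(mail=*)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))]
basedn:
[ou=ALUNOS,ou=USUARIOS,ou=CAMPUS,dc=campus,dc=sertao,dc=ifrs,dc=edu,dc=br]
scope: [SUB] result: Success
0.61s, SearchRequest by S-1-5-21-2137976744-3574706186-1594704298-16424
from ipv4:xxx.xxx.xxx.138:41912 filter:
[(&(objectCategory=person)(objectClass=user)(mail=*)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))]
basedn:
[ou=ALUNOS,ou=USUARIOS,ou=CAMPUS,dc=campus,dc=sertao,dc=ifrs,dc=edu,dc=br]
scope: [SUB] result: Success
"""


def parse_entries(text):
    """Return one dict per timing entry found in text."""
    entries = []
    for m in ENTRY_RE.finditer(text):
        # Drop the ephemeral source port so all connections from one
        # host are grouped together.
        host, _, _port = m.group("client").rpartition(":")
        entries.append({
            "seconds": float(m.group("secs")),
            "sid": m.group("sid"),
            "host": host,
            "filter": m.group("filter"),
            "basedn": m.group("basedn"),
            "scope": m.group("scope"),
            "result": m.group("result"),
        })
    return entries


def summarize(entries):
    """Group by (host, basedn, filter); return rows sorted by total time."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])  # count, total, max
    for e in entries:
        s = stats[(e["host"], e["basedn"], e["filter"])]
        s[0] += 1
        s[1] += e["seconds"]
        s[2] = max(s[2], e["seconds"])
    rows = [(key, c, total, mx) for key, (c, total, mx) in stats.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for (host, basedn, flt), count, total, longest in summarize(parse_entries(SAMPLE)):
        print(f"{host}: {count} searches, {total:.2f}s total, {longest:.2f}s max")
        print(f"  basedn: {basedn}")
        print(f"  filter: {flt}")
```

A single host that dominates the totals with the same filter and base DN would point at one client repeating one search, which is the pattern Andrew suggested looking for.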

I'll do some tests by leaving GCDS disabled for a while. I'll also review
the settings.
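
For reference while reviewing those settings: the userAccountControl clause in the filter above uses the AD bitwise AND matching rule (OID 1.2.840.113556.1.4.803), and the value 2 is the "account disabled" bit, so the search asks for every enabled user under ALUNOS that has a mail attribute. A rough sketch of that logic follows; the dict-based entries and the function names are hypothetical, and objectCategory is simplified to a plain string.

```python
UF_ACCOUNTDISABLE = 0x0002  # "account disabled" bit of userAccountControl


def bit_and_match(value, mask):
    """Mimic the 1.2.840.113556.1.4.803 rule: true when every mask bit is set."""
    return (value & mask) == mask


def filter_matches(entry):
    """Approximate the logged filter against an in-memory entry.

    Filter: (&(objectCategory=person)(objectClass=user)(mail=*)
              (!(userAccountControl:1.2.840.113556.1.4.803:=2)))
    The entry layout is a made-up dict, not what any LDAP library returns.
    """
    classes = {c.lower() for c in entry.get("objectClass", [])}
    uac = int(entry.get("userAccountControl", 0))
    return (
        entry.get("objectCategory", "").lower() == "person"
        and "user" in classes
        and bool(entry.get("mail"))                      # mail=* (attribute present)
        and not bit_and_match(uac, UF_ACCOUNTDISABLE)    # account not disabled
    )


# Hypothetical sample entries: 512 is a normal account, 514 is 512 plus the
# disabled bit.
enabled = {"objectCategory": "person", "objectClass": ["top", "user"],
           "mail": "a@example.org", "userAccountControl": 512}
disabled = dict(enabled, userAccountControl=514)
no_mail = dict(enabled, mail="")

if __name__ == "__main__":
    for name, entry in (("enabled", enabled), ("disabled", disabled), ("no_mail", no_mail)):
        print(name, filter_matches(entry))
```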

On Thu, Apr 11, 2024 at 3:35 PM Andrew Bartlett <abartlet at samba.org> wrote:

> On Thu, 2024-04-11 at 14:21 -0300, Elias Pereira wrote:
>
> Hello Andrew,
>
> 1. What is the explanation for the fact that when the log level is set to
> 5 or 7, the NT_STATUS_IO_TIMEOUT error does not appear, but when it is at
> the default log level, it does?
>
>
> I don't have an explanation for this, sorry.  Have you looked into the 1.5
> second queries, what is sending them and why?
>
> Have you done work in wireshark to see what clients are much more chatty
> than others? (the statistics module should help).
>
> Another point I've noticed before is that when I run the command
> "samba-tool dbcheck --cross-ncs --reset-well-known-acls --fix --yes" (*Checked
> 15337 objects (0 errors)*), and in another terminal analyze the log, some
> errors always occur:
>
> *source4/dsdb/kcc/kcc_periodic.c:790: Failed samba_kcc -
> NT_STATUS_IO_TIMEOUT*
> and
> *IRPC callback failed for DsReplicaSync - NT_STATUS_IO_TIMEOUT*
>
>
> DBcheck will be holding a lock over the database preventing all other
> operations, which will make some things timeout.
>
> 2. Any discrepancies between the objects? Knowing that when running the
> command "samba-tool ldapcmp...", there are no differences between DCs.
>
> On Tue, Apr 2, 2024 at 4:28 PM Andrew Bartlett <abartlet at samba.org> wrote:
>
> 1.5 seconds is pretty long, I would look into what those queries are.
>
> I would also look into repeated queries, sometimes these things are
> clients stuck in a loop where they don't complete because they expect some
> termination condition.
>
> Andrew Bartlett
>
> On Tue, 2024-04-02 at 09:25 -0300, Elias Pereira via samba wrote:
>
> The saga continues...
>
> I've spent a whole day with log level 5 and 7 and no error. All I have to
> do is return the log to the default and the error reappears.
>
> I monitored the "LDAP Query: Duration", but I didn't notice any crashes in
> the queries.
>
> I don't know if it's a long time, but some queries took 1.5s.
>
> Is there anything else I can do?
>
> On Mon, Mar 25, 2024 at 1:30 PM Elias Pereira <empbilly at gmail.com> wrote:
>
> Hello Andrew,
>
> What's the explanation for when the log level is set to 5, the error
> NT_STATUS_IO_TIMEOUT doesn't appear, but when it's at the default log
> level, it does?
>
> On Mon, Mar 18, 2024 at 10:33 AM Elias Pereira <empbilly at gmail.com> wrote:
>
> hi Andrew, thanks for the help!!!
>
> It seems to me the LDAP process being busy would be the root cause here.
> Working out what is going on here is a detective task - I always
> start with a wireshark trace.  The client making all the noise/traffic will
> be the one causing the trouble.
>
> In the wireshark analysis, should I filter only by the ldap protocol or
> leave everything? Should I look at something specific in the client logs?
>
> On Sun, Mar 10, 2024 at 9:31 PM Andrew Bartlett <abartlet at samba.org> wrote:
>
> Thanks for getting back to me.
>
> It seems to me the LDAP process being busy would be the root cause
> here.  Working out what is going on here is a detective task - I
> always start with a wireshark trace.  The client making all the
> noise/traffic will be the one causing the trouble.
>
> If it isn't clear from that, then look into the DB audit logging for
> perhaps busy writes
>
> https://wiki.samba.org/index.php/Setting_up_Audit_Logging#Enabling_AD_DC_Database_Audit_Logging
>
> Finally, set 'log level = 5' and look for logs like: LDAP Query:
> Duration was
>
> This will tell you about how long each query is taking, potentially
> showing a particularly slow query that needs to be stopped.
>
> Andrew Bartlett
>
> On Sun, 2024-03-10 at 19:46 -0300, Elias Pereira wrote:
>
> Is the drepl local processes very busy doing inbound replication?
>
> How can I check this?
>
> My instinct is either the server is very busy (and this should show up
> in CPU use) or a transaction is being held open excessively.
>
> I use VMs on Proxmox. In DC1, I installed the Proxmox agent, and CPU
> usage via the dashboard is very low. However, when I checked using 'top,'
> the LDAP process is consuming around 94/96% of the CPU. Very strange.
>
> It is probably 94% of a single CPU, but you might have 8 CPUs in the VM,
> so overall use is low.
>
> The VM has 4 CPUs and 6GB of memory.
>
> On Sun, Mar 10, 2024 at 5:55 PM Andrew Bartlett <abartlet at samba.org> wrote:
>
> Either the local server is busy, or possibly (but it would not explain
> the samba_kcc) Samba's drepl process is stuck talking to a remote server.
>
> Is the drepl local processes very busy doing inbound replication?
>
> My instinct is either the server is very busy (and this should show up
> in CPU use) or a transaction is being held open excessively.
>
> Andrew Bartlett
>
> On Sat, 2024-03-09 at 19:11 -0300, Elias Pereira via samba wrote:
>
> I've been grappling with a recurring set of errors for quite some time now:
> - UpdateRefs failed with NT_STATUS_IO_TIMEOUT
> - Failed samba_kcc - NT_STATUS_IO_TIMEOUT
> - IRPC callback failed for DsReplicaSync - NT_STATUS_IO_TIMEOUT
>
> Despite cranking up the log level to 10, the returned information remains
> frustratingly cryptic and hard to decipher.
>
> This error, being overly generic, continues to elude identification even
> with the heightened log verbosity. The challenge lies in tracing its origin.
>
> Running samba-tool dbcheck doesn't reveal any problems, yet executing the
> command while monitoring the Samba log with "tail -f" exposes errors
> identical to those described above.
>
> Interestingly, samba-tool drs showrepl doesn't report any errors.
>
> So, what additional steps can be taken to unearth the root cause
> of these persistent NT_STATUS_IO_TIMEOUT errors?
>
> On Fri, Mar 1, 2024 at 10:32 PM Elias Pereira <empbilly at gmail.com> wrote:
>
> There is probably nothing wrong with your log, but Firefox doesn't
> like it, it thinks it contains a virus.
>
> I just saw now that your response ended up in spam, probably because of
> the link with the log. O.o
>
> I still receive the error in the logs:
> source4/dsdb/kcc/kcc_periodic.c:790: Failed samba_kcc -
> NT_STATUS_IO_TIMEOUT
>
> The strangest thing is that it occurs when the command is executed:
> samba-tool dbcheck --cross-ncs --fix --yes
>
> Could it be some object causing this error?
>
> On Mon, Feb 12, 2024 at 4:40 PM Rowland Penny via samba
> <samba at lists.samba.org> wrote:
>
> On Mon, 12 Feb 2024 16:20:27 -0300
> Elias Pereira via samba <samba at lists.samba.org> wrote:
>
> hi,
>
> My saga continues...
>
> I've configured the audit log for drs_repl in smb.conf, and below is
> the log generated.
> https://transfer.sh/7fen4qCNIQ/drs_repl.log
>
> The log level was 5.
> drs_repl:5@/var/log/samba/drs_repl.log
>
> Could someone take a look and help me understand the log?
>
> There is probably nothing wrong with your log, but Firefox doesn't
> like it, it thinks it contains a virus.
>
> Rowland
>
> --
> Elias Pereira
>
> --
> Andrew Bartlett (he/him)       https://samba.org/~abartlet/
> Samba Team Member (since 2001) https://samba.org
> Samba Team Lead                https://catalyst.net.nz/services/samba
> Catalyst.Net Ltd
>
> Proudly developing Samba for Catalyst.Net Ltd - a Catalyst IT group company
>
> Samba Development and Support: https://catalyst.net.nz/services/samba
>
> Catalyst IT - Expert Open Source Solutions
>
> --
>
> Andrew Bartlett (he/him)        https://samba.org/~abartlet/
> Samba Team Member (since 2001)  https://samba.org
> Samba Developer, Catalyst IT    https://catalyst.net.nz/services/samba
>
>

-- 
Elias Pereira

