[Samba] How to diagnose a busy LDAP server process in the Samba AD DC

Andrew Bartlett abartlet at samba.org
Thu Apr 11 18:35:21 UTC 2024


On Thu, 2024-04-11 at 14:21 -0300, Elias Pereira wrote:
> Hello Andrew,
> 
> 1. What is the explanation for the fact that when the log level is
> set to 5 or 7, the NT_STATUS_IO_TIMEOUT error does not appear, but
> when it is at the default log level, it does?

I don't have an explanation for this, sorry.  Have you looked into the
1.5-second queries: what is sending them, and why?
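
As a rough illustration of how the slow queries could be picked out of a
'log level = 5' log, here is a minimal Python sketch. It assumes that a
decimal number of seconds directly follows the text "LDAP Query: Duration
was" on the same line; that layout is an assumption, so check it against
your own log first. The function name and the 1.0 second threshold are
hypothetical choices for the example.

```python
import re

# Assumption: the duration in seconds appears right after this marker text.
DURATION_RE = re.compile(r"LDAP Query: Duration was\s+([0-9]+(?:\.[0-9]+)?)")


def slow_queries(lines, threshold=1.0):
    """Return (duration, line) pairs at or above threshold, slowest first."""
    hits = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match is None:
            continue  # not an LDAP query duration line
        duration = float(match.group(1))
        if duration >= threshold:
            hits.append((duration, line.rstrip("\n")))
    # Slowest queries first, so the worst offenders are at the top.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits
```

Passing an open log file to slow_queries() and printing the result should
show which searches are slow, and the rest of each line should help work
out which client sent them.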

Have you used wireshark to see which clients are much chattier than
others? (The statistics module should help.)
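
If the capture is exported to text, the same per-client count can be
sketched in a few lines of Python. This assumes one source address per
line as input, for example from something like
"tshark -r capture.pcap -Y ldap -T fields -e ip.src", where capture.pcap
is a hypothetical file name. The function name is likewise made up for
the example.

```python
from collections import Counter


def top_talkers(addresses, limit=10):
    """Count packets per source address, busiest clients first."""
    # Blank lines (packets without an IP source field) are skipped.
    counts = Counter(addr.strip() for addr in addresses if addr.strip())
    return counts.most_common(limit)
```

A client that appears far more often than the rest is the first one to
look at.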

> Another point I've noticed before is that when I run the command
> "samba-tool dbcheck --cross-ncs --reset-well-known-acls --fix --yes"
> (Checked 15337 objects (0 errors)), and in another terminal analyze
> the log, some errors always occur:
> 
> source4/dsdb/kcc/kcc_periodic.c:790: Failed samba_kcc -
> NT_STATUS_IO_TIMEOUT
> and
> IRPC callback failed for DsReplicaSync - NT_STATUS_IO_TIMEOUT

dbcheck will be holding a lock over the database, preventing all other
operations, which will make some things time out.

> 2. Are there any discrepancies between the objects? Note that when I run
> the command "samba-tool ldapcmp...", it reports no differences between
> DCs.
> 
> On Tue, Apr 2, 2024 at 4:28 PM Andrew Bartlett <abartlet at samba.org>
> wrote:
> > 1.5 seconds is pretty long, I would look into what those queries
> > are.  
> > 
> > I would also look into repeated queries, sometimes these things are
> > clients stuck in a loop where they don't complete because they
> > expect some termination condition. 
> > 
> > Andrew Bartlett
> > 
> > On Tue, 2024-04-02 at 09:25 -0300, Elias Pereira via samba wrote:
> > > The saga continues...
> > > 
> > > I've spent a whole day with log level 5 and 7 and no error. All I have to
> > > do is return the log to the default and the error reappears.
> > > 
> > > I monitored the "LDAP Query: Duration", but I didn't notice any crashes in
> > > the queries.
> > > 
> > > I don't know if it's a long time, but some queries took 1.5s.
> > > 
> > > Is there anything else I can do?
> > > 
> > > > On Mon, Mar 25, 2024 at 1:30 PM Elias Pereira <empbilly at gmail.com> wrote:
> > > 
> > > > Hello Andrew,
> > > > 
> > > > What's the explanation for when the log level is set to 5, the error
> > > > NT_STATUS_IO_TIMEOUT doesn't appear, but when it's at the default log
> > > > level, it does?
> > > > 
> > > > > On Mon, Mar 18, 2024 at 10:33 AM Elias Pereira <empbilly at gmail.com> wrote:
> > > > 
> > > > > hi Andrew, thanks for the help!!!
> > > > > 
> > > > > > > It seems to me the LDAP process being busy would be the root cause here.
> > > > > > > Working out what is going on here is a detective task - I always
> > > > > > > start with a wireshark trace.  The client making all the noise/traffic will
> > > > > > > be the one causing the trouble.
> > > > > 
> > > > > 
> > > > > In the wireshark analysis, should I filter only by the ldap protocol or
> > > > > leave everything? Should I look at something specific in the client logs?
> > > > > 
> > > > > > On Sun, Mar 10, 2024 at 9:31 PM Andrew Bartlett <abartlet at samba.org> wrote:
> > > > > 
> > > > > > Thanks for getting back to me.
> > > > > > 
> > > > > > > It seems to me the LDAP process being busy would be the root cause
> > > > > > > here.  Working out what is going on here is a detective task - I
> > > > > > > always start with a wireshark trace.  The client making all the
> > > > > > > noise/traffic will be the one causing the trouble.
> > > > > > 
> > > > > > > If it isn't clear from that, then look into the DB audit logging for
> > > > > > > signs of heavy write activity:
> > > > > > 
> > > > > > 
> > > > > > > https://wiki.samba.org/index.php/Setting_up_Audit_Logging#Enabling_AD_DC_Database_Audit_Logging
> > > > > > 
> > > > > > 
> > > > > > Finally, set 'log level = 5' and look for logs like: LDAP Query:
> > > > > > Duration was
> > > > > > 
> > > > > > This will tell you about how long each query is taking, potentially
> > > > > > showing a particularly slow query that needs to be stopped.
> > > > > > 
> > > > > > Andrew Bartlett
> > > > > > 
> > > > > > On Sun, 2024-03-10 at 19:46 -0300, Elias Pereira wrote:
> > > > > > 
> > > > > > Is the drepl local processes very busy doing inbound replication?
> > > > > > 
> > > > > > 
> > > > > > How can I check this?
> > > > > > 
> > > > > > My instinct is either the server is very busy (and this should show up
> > > > > > in CPU use) or a transaction is being held open excessively.
> > > > > > 
> > > > > > 
> > > > > > I use VMs on Proxmox. In DC1, I installed the Proxmox agent, and CPU
> > > > > > usage via the dashboard is very low. However, when I checked using 'top,'
> > > > > > the LDAP process is consuming around 94/96% of the CPU. Very strange.
> > > > > > 
> > > > > > 
> > > > > > It is probably 94% of a single CPU, but you might have 8 CPUs in the VM,
> > > > > > so overall use is low.
> > > > > > 
> > > > > > The VM has 4 CPUs and 6GB of memory.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > On Sun, Mar 10, 2024 at 5:55 PM Andrew Bartlett <abartlet at samba.org> wrote:
> > > > > > 
> > > > > > Either the local server is busy, or possibly (but it would not explain
> > > > > > the samba_kcc) Samba's drepl process is stuck talking to a remote server.
> > > > > > 
> > > > > > Is the drepl local processes very busy doing inbound replication?
> > > > > > 
> > > > > > My instinct is either the server is very busy (and this should show up
> > > > > > in CPU use) or a transaction is being held open excessively.
> > > > > > 
> > > > > > Andrew Bartlett
> > > > > > 
> > > > > > On Sat, 2024-03-09 at 19:11 -0300, Elias Pereira via samba wrote:
> > > > > > 
> > > > > > > I've been grappling with a recurring set of errors for quite some time now:
> > > > > > > 
> > > > > > > - UpdateRefs failed with NT_STATUS_IO_TIMEOUT
> > > > > > > - Failed samba_kcc - NT_STATUS_IO_TIMEOUT
> > > > > > > - IRPC callback failed for DsReplicaSync - NT_STATUS_IO_TIMEOUT
> > > > > > > 
> > > > > > > Despite cranking up the log level to 10, the returned information remains
> > > > > > > frustratingly cryptic and hard to decipher.
> > > > > > > 
> > > > > > > This error, being overly generic, continues to elude identification even
> > > > > > > with the heightened log verbosity. The challenge lies in tracing its origin.
> > > > > > > 
> > > > > > > Running samba-tool dbcheck doesn't reveal any problems, yet executing the
> > > > > > > command while monitoring the Samba log with "tail -f" exposes errors
> > > > > > > identical to those described above.
> > > > > > > 
> > > > > > > Interestingly, samba-tool drs showrepl doesn't report any errors.
> > > > > > > 
> > > > > > > So, what additional steps can be taken to unearth the root cause
> > > > > > > of these persistent NT_STATUS_IO_TIMEOUT errors?
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > On Fri, Mar 1, 2024 at 10:32 PM Elias Pereira <empbilly at gmail.com> wrote:
> > > > > > > 
> > > > > > > > There is probably nothing wrong with your log, but Firefox doesn't
> > > > > > > > like it, it thinks it contains a virus.
> > > > > > > 
> > > > > > > I just saw now that your response ended up in spam, probably because of
> > > > > > > the link with the log. O.o
> > > > > > > 
> > > > > > > I still receive the error in the logs:
> > > > > > > 
> > > > > > > source4/dsdb/kcc/kcc_periodic.c:790: Failed samba_kcc -
> > > > > > > NT_STATUS_IO_TIMEOUT
> > > > > > > 
> > > > > > > The strangest thing is that it occurs when the command is executed:
> > > > > > > 
> > > > > > > samba-tool dbcheck --cross-ncs --fix --yes
> > > > > > > 
> > > > > > > Could it be some object causing this error?
> > > > > > 
> > > > > > 
> > > > > > > On Mon, Feb 12, 2024 at 4:40 PM Rowland Penny via samba <samba at lists.samba.org> wrote:
> > > > > > > 
> > > > > > > On Mon, 12 Feb 2024 16:20:27 -0300
> > > > > > > Elias Pereira via samba <samba at lists.samba.org> wrote:
> > > > > > > 
> > > > > > > > hi,
> > > > > > > > 
> > > > > > > > My saga continues...
> > > > > > > > 
> > > > > > > > I've configured the audit log for drs_repl in smb.conf, and below is
> > > > > > > > the log generated.
> > > > > > > > https://transfer.sh/7fen4qCNIQ/drs_repl.log
> > > > > > > > 
> > > > > > > > The log level was 5.
> > > > > > > > drs_repl:5@/var/log/samba/drs_repl.log
> > > > > > > > 
> > > > > > > > Could someone take a look and help me understand the log?
> > > > > > > 
> > > > > > > There is probably nothing wrong with your log, but Firefox doesn't
> > > > > > > like it, it thinks it contains a virus.
> > > > > > > 
> > > > > > > Rowland
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > --
> > > > > > 
> > > > > > To unsubscribe from this list go to the following URL and read the
> > > > > > 
> > > > > > instructions:
> > > > > > 
> > > > > > https://lists.samba.org/mailman/options/samba
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > --
> > > > > > 
> > > > > > Elias Pereira
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > --
> > > > > > 
> > > > > > Elias Pereira
> > > > > > 
> > > > > > --
> > > > > > 
> > > > > > 
> > > > > > Andrew Bartlett (he/him)       
> > > > > > https://samba.org/~abartlet/
> > > > > > 
> > > > > > Samba Team Member (since 2001) 
> > > > > > https://samba.org
> > > > > > 
> > > > > > Samba Team Lead                
> > > > > > https://catalyst.net.nz/services/samba
> > > > > > 
> > > > > > Catalyst.Net Ltd
> > > > > > 
> > > > > > Proudly developing Samba for Catalyst.Net Ltd - a Catalyst IT group
> > > > > > company
> > > > > > 
> > > > > > Samba Development and Support: 
> > > > > > https://catalyst.net.nz/services/samba
> > > > > > 
> > > > > > 
> > > > > > Catalyst IT - Expert Open Source Solutions
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > --
> > > > > > Elias Pereira
> > > > > > 
> > > > > > --
> > > > > > 
> > > > > > Andrew Bartlett (he/him)       
> > > > > > https://samba.org/~abartlet/
> > > > > > 
> > > > > > Samba Team Member (since 2001) 
> > > > > > https://samba.org
> > > > > > 
> > > > > > Samba Team Lead                
> > > > > > https://catalyst.net.nz/services/samba
> > > > > > 
> > > > > > Catalyst.Net Ltd
> > > > > > 
> > > > > > Proudly developing Samba for Catalyst.Net Ltd - a Catalyst IT group
> > > > > > company
> > > > > > 
> > > > > > Samba Development and Support: 
> > > > > > https://catalyst.net.nz/services/samba
> > > > > > 
> > > > > > 
> > > > > > Catalyst IT - Expert Open Source Solutions
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > > --
> > > > > Elias Pereira
> > > > > 
> > > > 
> > > > 
> > > > --
> > > > Elias Pereira
> > > > 
> > > 
> > > 
> > > -- 
> > > Elias Pereira
> > 
> > -- 
> > Andrew Bartlett (he/him)       https://samba.org/~abartlet/
> > Samba Team Member (since 2001) https://samba.org
> > Samba Team Lead                https://catalyst.net.nz/services/samba
> > Catalyst.Net Ltd
> > 
> > Proudly developing Samba for Catalyst.Net Ltd - a Catalyst IT group
> > company
> > 
> > Samba Development and Support:
> > https://catalyst.net.nz/services/samba
> > 
> > Catalyst IT - Expert Open Source Solutions
> 
> 
> -- 
> Elias Pereira

-- 
Andrew Bartlett (he/him) https://samba.org/~abartlet/
Samba Team Member (since 2001) https://samba.org
Samba Developer, Catalyst IT https://catalyst.net.nz/services/samba


