CTDB Logging: File or syslog?

Wed Jan 28 18:35:13 MST 2015

Hi José,

On Wed, 28 Jan 2015 12:26:29 -0600, José A. Rivera <jarrpa at samba.org>
wrote:

> I've heard that for production scenarios it is sensible to use syslog for
> CTDB logging because CTDB doesn't handle log rotation very well (it doesn't
> use SIGHUP for that?). However I was just pointed to the following bit of
> doc in master:
> 
>                  Under heavy loads syslog(3) can block if the syslog
>                  daemon processes messages too slowly.  This can
>                  cause CTDB to block when logging.
> 
>                  If METHOD is specified then it specifies an
>                  extension that causes logging to be done in a
>                  non-blocking mode.  Note that <emphasis>this may
>                  cause messages to be dropped</emphasis>.
> 
> Are there distinct use cases where one logging mechanism is preferred over
> the other? Is there a general recommendation for high-activity use cases?

Yeah, syslog is sensible...

CTDB doesn't log much during regular use (at, say, NOTICE level) so this
shouldn't usually be an issue.  There are a couple of cases where lots
of log messages can be produced in a short amount of time:

* DEBUG level logging

  This is incredibly busy.  If you enable DEBUG level logging on a busy
  cluster then you might hit the above situation.  However, if you're
  not a CTDB developer then DEBUG level logging probably won't generate
  anything meaningful.  I have used it very rarely.

* Killing connections during failover

  This generates a message for every connection that is killed, so can
  cause a flood.

rsyslogd defaults to throttling messages (on some platforms?) so that
should mitigate things somewhat.  However, the default throttling limit
seems a bit low and we sometimes miss useful messages.
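
As a rough illustration of why throttling can lose useful messages,
here is a minimal Python sketch of interval/burst rate limiting.  It
is not rsyslogd's code; the class name, the parameters and the
injectable clock are all hypothetical and exist only for this example.

```python
import time


class IntervalBurstLimiter:
    """Allow at most `burst` messages per `interval` seconds; drop the rest.

    Hypothetical illustration only, not taken from rsyslogd or CTDB.
    """

    def __init__(self, interval, burst, clock=time.monotonic):
        self.interval = interval
        self.burst = burst
        self.clock = clock              # injectable so it can be tested
        self.window_start = clock()
        self.count = 0                  # messages accepted in this window
        self.dropped = 0                # messages discarded overall

    def allow(self):
        """Return True if a message may be logged now, False if dropped."""
        now = self.clock()
        if now - self.window_start >= self.interval:
            # A new window begins: reset the per-window counter.
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        self.dropped += 1
        return False
```

With a small burst limit, a flood such as one message per killed
connection uses up the allowance quickly, and anything else logged
in the same interval is discarded along with the rest of the flood.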

The history is that a few years ago, when trying to debug an issue on
a production cluster with DEBUG level logging enabled, so much debug
output was produced that syslogd (rsyslogd?) couldn't keep up, so
syslog(3) blocked.  This made ctdbd block as well.  In response, a form of
non-blocking, potentially lossy logging was implemented and it became
the default when logging to syslog.  The assumption here is that it is
more important for CTDB to continue operating than it is for all debug
messages to be logged, since you can't always have both.

Perhaps in the above situation there were other factors?  Perhaps lots
of other logging and other activity on the disk that the logs were
being written to?  Not sure...
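
The idea of non-blocking, potentially lossy logging can be sketched
in Python as follows.  This is only an illustration under
assumptions, not CTDB's implementation: the function name and the
counters are hypothetical, and the socket is assumed to be a
connected datagram socket.

```python
import errno
import socket


def make_lossy_logger(sock):
    """Return (log, stats) for a connected datagram socket.

    log(message) never blocks: if the receiver is not keeping up and
    the socket buffer is full, the message is dropped and counted.
    Hypothetical sketch, not CTDB code.
    """
    sock.setblocking(False)
    stats = {"sent": 0, "dropped": 0}

    def log(message):
        try:
            sock.send(message.encode("utf-8"))
        except OSError as exc:
            # A full buffer shows up as EAGAIN/EWOULDBLOCK (or ENOBUFS
            # on some platforms).  Anything else is a real error.
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK,
                                 errno.ENOBUFS):
                raise
            stats["dropped"] += 1
            return False
        stats["sent"] += 1
        return True

    return log, stats
```

To try it, create a pair with socket.socketpair(socket.AF_UNIX,
socket.SOCK_DGRAM), pass one end to make_lossy_logger() and never
read from the other end.  The caller keeps running and the "dropped"
counter starts climbing once the buffer fills, which is the trade-off
described above.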

Anyway, last year we re-implemented the non-blocking logging and also
reverted the default for syslog logging so that it just uses
syslog(3).  That should be fine for most uses, especially with some
level of throttling in rsyslogd.  However, there are various
non-blocking options available for clusters where a lot of logging
output is generated.  Which one is best?  Probably work your way down
the list and find one that produces usable output.  The difference is
mostly in what log message formats are supported by the logging daemon
and whether it does something strange with certain formats.

peace & happiness,
martin