directory change notification patch

Sat Mar 26 13:46:07 GMT 2005

Derrell.Lipman at UnwiredUniverse.com wrote:
> Mark Weaver <mark-clist at npsl.co.uk> writes:
> 
> 
>>- smbd code assumes that (change notify) signals are delivered via EINTR
>>when calling select.  Since notification polling is disabled in the present
>>code if kernel change notification is active, the assumption would seem to
>>be that the notification is reliable.
> 
> 
> and
> 
> 
>>In summary (and I probably should have said this at the start), the change is
>>simply a fix to what appears to be the intended mechanism of signal delivery,
>>which is currently broken.  You can test this quite easily for change notify
>>by having a script change a directory repeatedly and observing that "kernel
>>change notify on" is missing from the logs on occasion.  Since change notify
>>is one-shot, this is somewhat of a disaster.
> 
> 
> These two statements may be at odds.  If there is supposed to be a one-to-one
> correspondence between a change notification and a signal, EINTR does not
> guarantee that.  Multiple signals can be merged into a single signal if they
> occur before being caught by the application.  This is why Linux added "real
> time signals" which may be queued, and the number of generated signals is
> guaranteed to be what the application receives.  Is a one-to-one
> correspondence between a change notification and a signal necessary?
> 
Yes and no :)  We have a few kinds of 'signal' knocking about now, so 
I'll try and clarify those first:

- POSIX signals, which work in the normal way
- POSIX RT signals, which are used for kernel oplocks + change notifications

Samba consistently deals with signals by stuffing them down an internal 
pipe (created on the first call to sys_select).  Each signal handler 
records information as appropriate (global variable or array) and then 
calls sys_select_signal() to indicate that "some signal" has been 
processed.  The means of processing that signal is for the caller of 
sys_select to receive EINTR, then proceed with asynchronous processing 
and retry the select.  I will call these 'samba signals' from now on to 
be clear.
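
To make that concrete, here is a minimal sketch of the pipe mechanism as 
I've described it.  This is not the actual Samba code: the names 
(sig_pipe, select_signal, select_with_signals, on_sighup) are made up for 
illustration, and it only handles the read fd_set.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

static int sig_pipe[2] = { -1, -1 };     /* hypothetical name for the internal pipe */
static volatile sig_atomic_t hup_seen;   /* per-signal state recorded by the handler */

/* Create the pipe on first use; both ends are non-blocking so that
 * neither the handler nor the reader can stall. */
static int sig_pipe_init(void)
{
    if (sig_pipe[0] != -1)
        return 0;
    if (pipe(sig_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(sig_pipe[i], F_GETFL);
        if (fl == -1 || fcntl(sig_pipe[i], F_SETFL, fl | O_NONBLOCK) == -1)
            return -1;
    }
    return 0;
}

/* Called from signal handlers: note that "some signal" was seen. */
static void select_signal(void)
{
    int saved_errno = errno;
    char c = 1;
    if (sig_pipe[1] != -1 && write(sig_pipe[1], &c, 1) != 1) {
        /* pipe full (or error): this wakeup byte is dropped */
    }
    errno = saved_errno;
}

static void on_sighup(int signo)
{
    (void)signo;
    hup_seen = 1;       /* record what happened ... */
    select_signal();    /* ... then poke the pipe */
}

/* select() wrapper: a byte waiting in the pipe is reported as EINTR. */
static int select_with_signals(int maxfd, fd_set *rfds, struct timeval *tv)
{
    fd_set local;
    int ret;

    if (sig_pipe_init() != 0)
        return -1;
    if (rfds == NULL) {
        FD_ZERO(&local);
        rfds = &local;
    }
    FD_SET(sig_pipe[0], rfds);
    if (sig_pipe[0] + 1 > maxfd)
        maxfd = sig_pipe[0] + 1;

    ret = select(maxfd, rfds, NULL, NULL, tv);
    if (ret > 0 && FD_ISSET(sig_pipe[0], rfds)) {
        char c;
        if (read(sig_pipe[0], &c, 1) == 1) {
            errno = EINTR;      /* caller does its async processing, then retries */
            return -1;
        }
        FD_CLR(sig_pipe[0], rfds);  /* byte already gone: hide the pipe fd */
        ret--;
    }
    return ret;
}
```

The point to notice is that the byte in the pipe carries no information 
itself; the handler's own variables say what happened, and the pipe only 
says "go and look".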

Since kernel change notification uses RT signals by definition, they will 
not be lost as long as there aren't too many of them queued (the limit is 
/proc/sys/kernel/rtsig-max).  Hence we can assume that we get one RT 
signal per change notify.  (If we do not, there is nothing that can be 
done other than to increase the limit by mucking with rtsig-max.  If the 
change notification signals are not delivered by the kernel, that's 
that.)  So in that sense, there must be a 1-1 correspondence between 
signals and kernel change notifications.
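
The queued versus merged distinction is easy to see in isolation.  The 
following is a small sketch (count_queued is a made-up helper, nothing to 
do with Samba) that raises a signal several times while it is blocked and 
then counts how many instances are actually pending.  It assumes a 
single-threaded process and that the RT signal queue limit is not hit.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>

/* Raise signo n times while it is blocked, then count how many
 * instances the kernel kept pending.  Returns -1 on error. */
static int count_queued(int signo, int n)
{
    sigset_t set, old;
    struct timespec zero = { 0, 0 };   /* poll, do not wait */
    int count = 0;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    for (int i = 0; i < n; i++)
        raise(signo);

    /* Consume pending instances one at a time. */
    while (sigtimedwait(&set, NULL, &zero) == signo)
        count++;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return count;
}
```

Called with SIGRTMIN and n = 3 this should report three pending 
instances; with an ordinary signal such as SIGUSR1 the three are merged 
and only one is reported.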

When the 'samba signal' is 'delivered' via sys_select returning EINTR, 
we have a slightly different case.  Here, all that is actually assumed 
is that some kind of asynchronous processing is required.  smbd (which 
is all I've looked at so far) deals with this by running the 
async_processing function (process.c).  This has to check for the 
following cases:

- Kernel oplock message waiting
- Change notify message waiting
- SIGHUP seen

All of those are signalled through the pipe.  Now this (nearly, see a 
bit later) only happens if sys_select returns EINTR; that is, at least 
one signal of any kind has been received.  In this sense, only one 
'samba signal' is actually required for change notifications to be 
processed, so signals lost due to the limited pipe size do not matter -- 
we only require one signal.
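
As a sketch of why one wakeup is enough (again with made-up names, and 
with counters standing in for the real handling code): the handlers only 
set flags, and a single pass of the async processing checks every flag, 
however many bytes did or did not make it into the pipe.

```c
#include <signal.h>

/* Set by the (hypothetical) signal handlers. */
static volatile sig_atomic_t oplock_pending;
static volatile sig_atomic_t notify_pending;
static volatile sig_atomic_t hup_pending;

/* Stand-ins for the real work, so the effect can be observed. */
static int handled_oplocks, handled_notifies, handled_hups;

/* Run once per EINTR from the select wrapper.  Each flag is cleared
 * before the work is done, so a signal arriving during processing
 * sets it again and is picked up on the next pass. */
static void async_processing_sketch(void)
{
    if (oplock_pending) {
        oplock_pending = 0;
        handled_oplocks++;
    }
    if (notify_pending) {
        notify_pending = 0;
        handled_notifies++;
    }
    if (hup_pending) {
        hup_pending = 0;
        handled_hups++;
    }
}
```

The flip side is that if the wakeup never arrives, none of the flags is 
looked at, no matter how many are set.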

Assume kernel oplocks are turned off for the rest of this.  (This is a 
legitimate configuration; when they are on, they are another source of 
'samba signals' that would 'correct' the problem.)

The code for processing kernel change notifications is called from 
async_processing or from timeout processing.  Now the kernel change 
notification code (as it currently is) ignores the call from timeout 
processing.  As async_processing is only run when EINTR is received, 
then in this configuration, if we lose EINTR, the kernel change 
notification will not be processed until another signal comes along.

Another signal can come from another directory change (requires someone 
to have posted one), or a SIGHUP.  But those can be lost as well.  It's 
very easy to reproduce having just one directory monitored -- it would 
be harder to see with multiple directories as it is less likely that all 
of the changes would be lost.  As you can see, this is not reliable.

And what I meant before by the change notification being one-shot is 
that only one change in the directory is reported -- you then have to 
register for monitoring again.  Thus the impact in the case of one 
directory, with one missed change, is that the notification will never 
be processed -- not good.

My assumption therefore from the way that the code is structured is that 
'samba signals' ought to be delivered as reliably as possible.  This is 
what my change is intended to do.

> 
>>- sys_select_intr is a wrapper function for sys_select that eats EINTR. This
>>also loses signals.  I changed this function to simply call select()
>>directly and copied other wrapper code from sys_select.  The rationale was
>>that there is no point waiting on the signal pipe if the result is
>>going to be tossed anyway.
> 
> 
> This seems to be *changing* the meaning of sys_select_intr().  Instead of
> ignoring EINTR, you are leaving them queued (in the pipe).  The next up-to 256
> calls to sys_select() will return with EINTR (assuming the previous bug is
> fixed).  That sounds like something to be cautious of.
> 
This is inefficient for the way things currently work, but is OK.  It is 
better to have too many 'samba signals' than none at all. 
sys_select_intr is used, for example, when doing a read with a timeout. 
  The code that does this is not prepared to deal with 'samba signals' 
so they are better left queued up for later.

> (This does raise the question as to why 256 queued signals are required.  I
> suspect there's a good reason for queuing a bunch of them, I'm just not seeing
> it right now.  If they don't get lost, one should be enough.  Was it an
> attempt to change the semantics of signals, and try to ensure a one-to-one
> correspondence between generation of a signal and delivery of same?  It still
> doesn't guarantee that...???)
> 
One 'samba signal' would certainly be sufficient in the smbd case.