directory change notification patch
Mark Weaver
mark-clist at npsl.co.uk
Sat Mar 26 13:46:07 GMT 2005
Derrell.Lipman at UnwiredUniverse.com wrote:
> Mark Weaver <mark-clist at npsl.co.uk> writes:
>
>
>>- smbd code assumes that (change notify) signals are delivered via EINTR
>>when calling select. Since notification polling is disabled in the present
>>code if kernel change notification is active, the assumption would seem to
>>be that the notification is reliable.
>
>
> and
>
>
>>In summary (and I probably should have said this at the start), the change is
>>simply a fix to what appears to be the intended mechanism of signal delivery,
>>which is currently broken. You can test this quite easily for change notify
>>by having a script change a directory repeatedly and observing that "kernel
>>change notify on" is missing from the logs on occasion. Since change notify
>>is one-shot, this is somewhat of a disaster.
>
>
> These two statements may be at odds. If there is supposed to be a one-to-one
> correspondence between a change notification and a signal, EINTR does not
> guarantee that. Multiple signals can be merged into a single signal if they
> occur before being caught by the application. This is why Linux added "real
> time signals" which may be queued, and the number of generated signals is
> guaranteed to be what the application receives. Is a one-to-one
> correspondence between a change notification and a signal necessary?
>
Yes and no :) We have a few kinds of 'signal' knocking about now, so
I'll try and clarify those first:
- POSIX signals, which work in the normal way
- POSIX RT signals, which are used for kernel oplocks + change notifications
Samba consistently deals with signals by stuffing them down an internal
pipe (created on the first call to sys_select). Each signal handler
records information as appropriate (global variable or array) and then
calls sys_select_signal() to indicate that "some signal" has been
processed. The means of processing that signal is for the caller of
sys_select to receive EINTR, then proceed with asynchronous processing
and retry the select. I will call these 'samba signals' from now on to
be clear.
Since kernel change notification uses RT signals by definition (if there
aren't too many of them (/proc/sys/kernel/rtsig-max), they will not be
lost. Hence we can assume that we get one RT signal per change notify.
(If we do not, there is nothing that can be done other than to
increase the limit by mucking with rtsig-max. If the change
notification signals are not delivered by the kernel, that's that). So
in that sense, there must be a 1-1 correspondence between signals and
kernel change notifications.
When the 'samba signal' is 'delivered' via sys_select returning EINTR,
we have a slightly different case. Here, all that is actually assumed
is that some kind of asynchronous processing is required. smbd (which
is all I've looked at so far) deals with this by running the
async_processing function (process.c). This is required to check cases:
- Kernel oplock message waiting
- Change notify message waiting
- SIGHUP seen
All of those are through the pipe. Now this (nearly, see a bit later)
only happens if sys_select returns EINTR; that is at least 1 signal of
any kind has been received. Thus in this sense, only 1 'samba signal'
is actually required for change notifications to be processed. Thus in
fact signals lost due to the limited pipe size do not matter -- we only
require one signal.
Assume kernel oplocks are turned off for the rest of this (this is a
legitimate configuration, but they are a source of 'samba signals' that
would 'correct' the problem).
The code for processing kernel change notifications is called from
async_processing or from timeout processing. Now the kernel change
notification code (as currently it currently is) ignores the call from
timeout processing. As async_processing is only run when EINTR is
received, then in this configuration, if we lose EINTR, the kernel
change notification will not be processed until another signal comes along.
Another signal can come from another directory change (requires someone
to have posted one), or a SIGHUP. But those can be lost as well. It's
very easy to reproduce having just one directory monitored -- it would
be harder to see with multiple directories as it is less likely that all
of the changes would be lost. As you can see, this is not reliable.
And what I meant before by the change notification being a one shot is
that only one change in the directory is reported -- you then have to
register for monitoring again. Thus the impact in the case of one
directory, with one missed changed, is that the notification will never
be processed -- not good.
My assumption therefore from the way that the code is structured is that
'samba signals' ought to be delivered as reliably as possible. This is
what my change is intended to do.
>
>>- sys_select_intr is a wrapper function for sys_select that eats EINTR. This
>>also looses signals. I changed this function to simply call select()
>>directly and copied other wrapper code from sys_select. The rational was
>>that there there is no point waiting on the signal pipe if the result is
>>going to be tossed anyway.
>
>
> This seems to be *changing* the meaning of sys_select_intr(). Instead of
> ignoring EINTR, you are leaving them queued (in the pipe). The next up-to 256
> calls to sys_select() will return with EINTR (assuming the previous bug is
> fixed). That sounds like something to be cautious of.
>
This is inefficient for the way things currently work, but is OK. It is
better to have too many 'samba signals' than none at all.
sys_select_intr is used, for example, when doing a read with a timeout.
The code that does this is not prepared to deal with 'samba signals'
so they are better left queued up for later.
> (This does raise the question as to why 256 queued signals are required. I
> suspect there's a good reason for queuing a bunch of them, I'm just not seeing
> it right now. If they don't get lost, one should be enough. Was it an
> attempt to change the semantics of signals, and try to ensure a one-to-one
> correspondence between generation of a signal and delivery of same? It still
> doesn't guarantee that...???)
>
One 'samba signal' would certainly be sufficient in the smbd case.
More information about the samba-technical
mailing list