Any change of rsync using threads instead of fork?
jamie at shareable.org
Fri Dec 9 19:30:13 GMT 2005
Nelson H. F. Beebe wrote:
> List traffic today asks about changing rsync to use lightweight
> threads instead of heavyweight fork.
> Before rushing into building a threads version of rsync, please READ
> this recent article
You didn't post a link directly to the article, just to the gateway page.
The ACM requires a fee to read the article (or a pre-paid account). I
considered paying the fee, if it's not much, but it requires user
registration before it will even say what the fee is; I don't purchase
from sites which won't state the price up front.
From the abstract:
> We provide specific arguments that a pure library approach, in which
> the compiler is designed independently of threading issues, cannot
> guarantee correctness of the resulting code.
That's correct. Any implementor of a sound thread library, and of
compilers used with such libraries, knows that there are memory
aliasing optimisations which defeat essential synchronisation barriers
in functions like `pthread_mutex_lock'. That is a cause of unsafe
code. This is not new knowledge, but perhaps needed a paper all the same.
_Real_ systems which implement the POSIX threads specification
(pthreads) are not implemented in that way. They all require, in some
way, something special of their compiler. Even if that's just a
guarantee that calling (at least some marked) external functions may
read and write all program data which can be reached from multiple
threads.
Programs which call longjmp() and setcontext() have similar issues.
Therefore unix compilers, and their optimisations, have to take into
account those sorts of things.
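To make the dependency on the compiler concrete, here is a minimal sketch of the ordinary pthreads pattern in question. The names `counter`, `worker`, `run_counter_demo` and `ITERATIONS` are made up for illustration. The shared variable is plain, non-volatile data, so the code is only correct if the compiler treats the calls to `pthread_mutex_lock' and `pthread_mutex_unlock' as possibly reading and writing `counter', and does not keep it cached in a register across them.

```c
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 100000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;            /* shared, plain (non-volatile) data */

/* Each thread increments the shared counter under the mutex. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);
        /* The compiler must reload and store `counter` here; it may not
           hoist the load above the lock or sink the store below the unlock. */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final counter value. */
long run_counter_demo(void)
{
    pthread_t a, b;
    int rc;

    counter = 0;
    rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

On a conforming implementation, calling run_counter_demo() from main() should always give twice ITERATIONS. Without the guarantee described above, nothing in the C source itself would stop an optimiser from breaking it.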
If a system claims it supports "POSIX threads" (and if it really
does), then you can rely on this. That doesn't mean there aren't
implementations with bugs, but the ones which really do conform (and
generally vendors do put the effort in), are quite safe to use.
A few auxiliary points:
1. The request is not to use "lightweight" threads for no reason, or
for some vague notion of efficiency. It is because some C environments
_cannot do_ fork. The only way to implement rsync on those is
either using threads or state machines.
2. A thread-safe version of rsync does not mean that you have to use
threads on all platforms, only that it's an option. Indeed, if
rsync is to remain portable, it must continue to be compilable
without threads too. Maybe it should default to using fork, except
when fork is not available, and when testing.
3. I have been involved in the design of pthreads libraries on GNU/Linux.
I can assure you the various synchronisation primitives do what they
say they do, at least on a distro where the appropriate C compiler
is used with the appropriate library, and that it is possible to
build correct, safe code on real platforms using those primitives.
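As a rough illustration of point 2, here is a sketch of how "fork by default, threads where fork is missing" could look behind one small interface. This is not rsync's code: the `HAVE_FORK` macro, `worker_fn`, `run_worker` and `example_worker` are all hypothetical names, and a real port would also have to deal with the fact that threads share memory where forked processes do not.

```c
#include <assert.h>
#include <stddef.h>
#ifdef HAVE_FORK                /* hypothetical configure-time macro */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#else
#include <pthread.h>
#endif

typedef int (*worker_fn)(void *arg);

#ifndef HAVE_FORK
struct thread_start {
    worker_fn fn;
    void *arg;
    int status;
};

/* Adapts worker_fn to the signature pthread_create expects. */
static void *thread_trampoline(void *p)
{
    struct thread_start *s = p;
    s->status = s->fn(s->arg);
    return NULL;
}
#endif

/* Runs fn(arg) in a child process or a thread, waits for it, and
   returns its status (0..255), or -1 on failure. */
int run_worker(worker_fn fn, void *arg)
{
    assert(fn != NULL);
#ifdef HAVE_FORK
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fn(arg) & 0xff);  /* child: run the worker, then exit */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
#else
    struct thread_start s = { fn, arg, -1 };
    pthread_t t;
    if (pthread_create(&t, NULL, thread_trampoline, &s) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return s.status;
#endif
}

/* Trivial worker used only to exercise run_worker. */
static int example_worker(void *arg)
{
    return *(int *)arg + 1;
}
```

Built with or without -DHAVE_FORK, run_worker(example_worker, &x) with x set to 41 returns 42. The caller does not need to know which mechanism was used, which is what lets the thread path be an option rather than a requirement.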
> (its author is also the co-author of the
> well-known, and widely used, gc (garbage collecting version of C
> malloc and C++ new) library
Yes, it is widely used. Did you know Boehm's gc library is
compiler-unsafe to a far greater degree than most thread libraries?
Still, it is widely used anyway :)