atomic transaction set option to rsync

Mon Jan 3 22:48:40 GMT 2005

On Mon, 3 Jan 2005 17:39:19 +0100 (CET), Dag Wieers dag-at-wieers.com wrote:

> Apparently a change of behaviour from rsync 2.5 to rsync 2.6 affected the
> way I worked. I provide RPM repositories that I mirror using rsync. It is
> important to have the repository meta-data in sync with the data otherwise
> people have errors using Yum or Apt.

> In the old days (with older rsyncs) I was able to influence the order in
> which my transaction set was processed by changing the order of
> directories I wanted rsync to mirror, so that the metadata was uploaded
> after most of the data and deletes were done at the end of the
> transaction.

Dag:

I love your RPM collection; it's unfortunate that a minor change in
rsync behavior has created problems for you.  The design of UNIX file
operations makes it hard to create an atomic transaction the size of a
normal rsync transfer.

> There are other ways to work around it, either by uploading in different
> steps (which is impractical in my scenario) or by using a staging area
> (which is impossible and impractical for large mirror sites).

Rsync does a good job of ensuring file-level coherence by writing to a
temporary file during the transfer and quickly renaming it over the
original at the end.  Unfortunately for you, this only works for a
single file.  If this were done on a larger scale, it would serve as
an atomic transaction, but then rsync would just be using a staging
area of its own creation.  The same thing could be accomplished by
manually creating the staging area and using rsync only as the data
transport.  (Which is really what it's designed for.)
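
To illustrate the rename trick, here is a minimal Python sketch (not
rsync's actual code).  atomic_write() mimics the per-file behavior:
write to a temporary file in the same directory, then rename it over
the original.  publish() scales the same idea up to a whole tree by
swapping a symlink.  The symlink layout is my own assumption for
illustration, as are all the names; it only works if the server
serves files through that link.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) via a temp file plus rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def publish(staging_dir, live_link):
    """Atomically repoint the symlink `live_link` at `staging_dir`.

    Assumption: `live_link` is a symlink (or does not exist yet),
    never a real directory.
    """
    tmp_link = live_link + ".new"
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)
    os.symlink(staging_dir, tmp_link)
    os.replace(tmp_link, live_link)  # rename acts on the link itself
```

The idea would be to rsync into a fresh staging directory and call
publish() only after the transfer has finished, so clients never see
a half-updated tree.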

I don't see how uploading in different steps would be impractical.
The most bulletproof way to do this would be to sync each RPM and its
header file in its own rsync session.  However, for your collection of
thousands of file pairs, this would indeed be impractical.  Breaking
it up into 10-20 sessions with several dozen file pairs each would be
practical and could be automated with some shell or Perl wizardry.
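
As a rough sketch of that batching (in Python rather than shell or
Perl), something like the following could split the file pairs into
sessions and build one rsync command per session.  The function names
and file-list paths are made up for illustration.  It assumes you
already have a list of (rpm, header) relative paths and an rsync new
enough to support --files-from (2.6.0 or later, to my knowledge).
Nothing here actually runs rsync.

```python
def batch_pairs(pairs, pairs_per_session):
    """Split a list of (rpm, header) tuples into per-session batches."""
    if pairs_per_session < 1:
        raise ValueError("pairs_per_session must be at least 1")
    return [pairs[i:i + pairs_per_session]
            for i in range(0, len(pairs), pairs_per_session)]


def write_file_list(batch, list_file):
    """Write one relative path per line, each RPM next to its header."""
    with open(list_file, "w") as f:
        for rpm, header in batch:
            f.write(rpm + "\n")
            f.write(header + "\n")


def rsync_command(list_file, src, dest):
    """Build (but do not run) the rsync command for one session."""
    # --files-from limits the session to the paths named in list_file.
    return ["rsync", "-a", "--files-from=" + list_file, src, dest]
```

Each batch would be written out with write_file_list() and the
resulting command passed to subprocess.run().  Note that this only
narrows the window in which data and metadata disagree; within one
session rsync still chooses its own transfer order.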

Another option you didn't mention would be to make use of LVM
snapshots to ensure that your repository is always internally
consistent even while you're in the middle of an rsync.  The
disadvantage would be some periodic unavailability while you remove
and re-create your snapshots.  (That is, the FTP server is configured
to serve files from the read-only snapshot volumes, which need to be
unmounted and re-snapped when new files are uploaded.)
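
The re-snap cycle might look roughly like the sketch below.  It only
builds the command lists so they can be reviewed before anything is
run as root.  The volume group, volume names, mount point and snapshot
size are all placeholders I made up, and the window between the
umount and the final mount is the unavailability mentioned above.

```python
def snapshot_cycle(vg, origin, snap, mount_point, size="1G"):
    """Return the commands that replace a read-only snapshot of
    vg/origin.  All arguments are hypothetical placeholders."""
    snap_lv = "%s/%s" % (vg, snap)
    return [
        # Stop serving from the old snapshot.
        ["umount", mount_point],
        # Drop the stale snapshot without prompting.
        ["lvremove", "-f", snap_lv],
        # Snapshot the freshly synced origin volume.
        ["lvcreate", "--snapshot", "--size", size,
         "--name", snap, "%s/%s" % (vg, origin)],
        # Serve the new snapshot read-only.
        ["mount", "-o", "ro", "/dev/" + snap_lv, mount_point],
    ]
```

You would run this after each rsync completes, for example by
passing each command to subprocess.run() with check=True.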

There may be some full site-replication applications out there making
use of rsync.  I suspect someone here on the list would know.  I've
always just created my own custom scripts for this.

Lastly, one of the rsync gurus here can probably comment on the
feasibility of providing an option to restore the version 2.5
behavior.

Thanks again for the RPMs.  I hope you can find a good solution to
your mirroring dilemma.

  -- Steve