atomic transaction set option to rsync

Dag Wieers dag at wieers.com
Mon Jan 3 23:35:08 GMT 2005


On Mon, 3 Jan 2005, Steve Bonds wrote:

> On Mon, 3 Jan 2005 17:39:19 +0100 (CET), Dag Wieers dag-at-wieers.com wrote:
>
> > There are other ways to work around it, either by uploading in different
> > steps (which is impractical in my scenario) or by using a staging area
> > (which is impossible and impractical for large mirror sites).
> 
> Rsync does a good job of ensuring file-level coherence by using a
> temporary file during the transfer and a quick rename to the original
> at the end.  Unfortunately for you, this is only good for a single
> file.  If this were done on a larger scale, it would serve as an
> atomic transaction-- but then rsync is just using a staging area of
> its own creation.  The same thing could be accomplished by manually
> creating the staging area and only using rsync as the data transport. 
> (Which is really what it's designed for.)

Well, that would be one solution. Upload everything under deterministic
temporary names that rsync would ignore by default when mirroring. Then,
once the transaction set is complete, hardlink each file to its real name
and finally remove all the temporary files.

(Deterministic temporary names are important so that a failed transaction
can be continued later, similar to how rsync already works.)
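The proposal above amounts to a two-phase commit, which can be sketched in
Python. This is a minimal illustration, not rsync code: the `.~txn~` prefix
and the `stage`/`commit` helpers are hypothetical names. It assumes staging
happens in the same directory as the final file (hard links cannot cross
filesystems) and a POSIX filesystem where rename() replaces a file
atomically. Mirrors pulling with rsync would still need to skip the staging
names, for instance with an `--exclude` rule.

```python
import os
import shutil
from pathlib import Path

# Hypothetical naming scheme: a fixed prefix makes the staging name
# deterministic, so a rerun after a failure finds the same file again.
STAGE_PREFIX = ".~txn~"


def staged_path(final: Path) -> Path:
    """Deterministic staging name for a final path, in the same directory."""
    return final.with_name(STAGE_PREFIX + final.name)


def stage(src: Path, final: Path) -> None:
    """Phase 1: copy src to its staging name, reusing finished copies."""
    tmp = staged_path(final)
    st = src.stat()
    if tmp.exists():
        t = tmp.stat()
        # Resume: a staged copy with matching size and mtime is reused
        # (similar in spirit to rsync's size+mtime quick check).
        if t.st_size == st.st_size and int(t.st_mtime) == int(st.st_mtime):
            return
    final.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, tmp)  # copy2 also preserves the mtime


def commit(finals: list[Path]) -> None:
    """Phase 2: hardlink every staged file to its real name, then clean up."""
    for final in finals:
        tmp = staged_path(final)
        if not tmp.exists():
            continue  # committed and cleaned up by an earlier run
        if final.exists() and os.path.samefile(tmp, final):
            continue  # already linked by an earlier, interrupted commit
        # Link to a scratch name, then rename it over the real name, so an
        # existing file is replaced in one rename() instead of being
        # unlinked first.
        scratch = final.with_name(STAGE_PREFIX + "link~" + final.name)
        if scratch.exists():
            scratch.unlink()
        os.link(tmp, scratch)
        os.replace(scratch, final)
    # Staging names are removed only after every real name has new data.
    for final in finals:
        staged_path(final).unlink(missing_ok=True)
```

The set as a whole is still not strictly atomic: a mirror that syncs while
the link loop runs can see some new and some old files. The window,
however, shrinks from the length of the whole upload to the duration of
that loop, and both phases can simply be rerun after an interruption.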

I think it could work, should not cause much overhead and could save a lot
of people headaches in situations like mine. (I do understand this covers
only a small fraction of the many uses of rsync, though.)

 
> I don't see how uploading in different steps would be impractical. 
> The most bulletproof way to do this would be to sync each rpm and
> header file in one rsync session.  However, for your collection of
> thousands of file pairs, this would indeed be impractical.  Breaking
> it up into 10-20 sessions with several dozen file pairs each would be
> practical and could be automated with some shell or Perl wizardry.

Well, it is impractical for two reasons.

 1. I don't control my mirrors, so unless rsync offers this behaviour in a
    form that mirrors can enable easily, it's going to be very hard to get
    them to do something special for me. I'm pretty sure I don't have that
    authority, unless all they have to do is enable a switch.

 2. Breaking it up into several sessions is hard because I only use
    passphrases. I don't allow password-less connections, nor do I sign
    packages automatically, because I think that is important
    security-wise. Every extra session means typing my passphrase again,
    which would slow down my ability to work flexibly (I'm already
    restricted by some other tools and processes).


> Another option you didn't mention would be to make use of LVM
> snapshots to ensure that your repository is always internally
> consistent even while you're in the middle of an rsync.  The
> disadvantage would be some periodic unavailability while you removed
> and re-created your snapshots.  (i.e. the FTP server is configured to
> serve files from the read-only snap volumes, which need to be
> unmounted and re-snapped when new files are uploaded.)

Impossible for the same reason. I only manage a private server that my
main mirror has access to. I don't even have control over that main
mirror. So even if I had a complex solution for myself, it would still be
impossible for others. Rsync, imho, is the best place for this
functionality, as it already has all the information needed to make the
correct decisions.


> There may be some full site-replication applications out there making
> use of rsync.  I suspect someone here on the list would know.  I've
> always just created my own custom scripts for this.

Well, any solution I can think of either duplicates functionality rsync
already provides (for which I can only think of sub-optimal
implementations) or runs rsync a second time (which is slower and requires
me to type a password again).


> Thanks again for the RPMs.  I hope you can find a good solution to
> your mirroring dilemma.

Thanks.

--   dag wieers,  dag at wieers.com,  http://dag.wieers.com/   --
[all I want is a warm bed and a kind word and unlimited power]

