atomic transaction set option to rsync
Dag Wieers
dag at wieers.com
Mon Jan 3 23:35:08 GMT 2005
On Mon, 3 Jan 2005, Steve Bonds wrote:
> On Mon, 3 Jan 2005 17:39:19 +0100 (CET), Dag Wieers dag-at-wieers.com wrote:
>
> > There are other ways to work around it, either by uploading in different
> > steps (which is impractical in my scenario) or by using a staging area
> > (which is impossible and impractical for large mirror sites).
>
> Rsync does a good job of ensuring file-level coherence by using a
> temporary file during the transfer and a quick rename to the original
> at the end. Unfortunately for you, this is only good for a single
> file. If this were done on a larger scale, it would serve as an
> atomic transaction-- but then rsync is just using a staging area of
> its own creation. The same thing could be accomplished by manually
> creating the staging area and only using rsync as the data transport.
> (Which is really what it's designed for.)
Well, that would be one solution. Upload everything with deterministic
temporary files that rsync by default would ignore when mirroring. Then
when the transaction set is finished, hardlink all files to the real name
and finally remove all temporary files.
(Having deterministic temporary files is important so that if a
transaction fails it can be continued similarly to how rsync is working
already)
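A minimal sketch of that commit step in Python, under assumptions of my own: the staged-file naming scheme (a leading dot plus a ".<txn_id>.part" suffix) and the helpers staged_name and commit are hypothetical, not anything rsync provides today. Each staged file is hardlinked under a scratch name and renamed over the real name, and the temporary files are removed only after every file has been published. This narrows the inconsistent window to a burst of renames; it is not a true atomic transaction across files.

```python
import os
from pathlib import Path


def staged_name(final: Path, txn_id: str) -> Path:
    """Deterministic temporary name for `final` (hypothetical scheme)."""
    return final.with_name(f".{final.name}.{txn_id}.part")


def commit(root, txn_id):
    """Publish every staged file under `root`, then remove the temporaries."""
    suffix = f".{txn_id}.part"
    staged = sorted(
        p for p in Path(root).rglob("*")
        if p.is_file()
        and p.name.startswith(".")
        and p.name.endswith(suffix)
        and len(p.name) > len(suffix) + 1
    )
    published = []
    for tmp in staged:
        # Strip the leading dot and the suffix to recover the real name.
        final = tmp.with_name(tmp.name[1:-len(suffix)])
        published.append(final)
        if final.exists() and final.samefile(tmp):
            continue  # already published by an earlier, interrupted run
        link = tmp.with_name(tmp.name + ".link")
        link.unlink(missing_ok=True)   # clear debris from a failed run
        os.link(tmp, link)             # hardlink: same data, no copy
        os.replace(link, final)        # rename over the real name
    # Remove the temporaries only once everything has been published.
    for tmp in staged:
        tmp.unlink()
    return published
```

Because the temporary names depend only on the real name and the transaction id, an interrupted upload or an interrupted commit can simply be run again. Hard links require the staged file and the real file to live on the same filesystem, which holds here since both sit in the same directory.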
I think it could work, should not cause much overhead and could save a lot
of people headaches in situations like mine. (I do understand this is only
a small number of the many uses of rsync though)
> I don't see how uploading in different steps would be impractical.
> The most bulletproof way to do this would be to sync each rpm and
> header file in one rsync session. However, for your collection of
> thousands of file pairs, this would indeed be impractical. Breaking
> it up into 10-20 sessions with several dozen file pairs each would be
> practical and could be automated with some shell or Perl wizardry.
Well, it is impractical for several reasons.
1. I don't control my mirrors, so unless rsync itself offers this
behaviour in a way that mirrors can enable easily, it's going to be very
hard to have mirrors do something special for me. I'm pretty sure I
don't have that authority, unless it's just a switch they have to
enable.
2. Breaking it up into several sessions is hard because I only use
passphrases. I don't allow password-less connections, nor do I
sign packages automatically because I think it is important
security-wise. Such a change would slow down my ability to work
flexibly (I'm already restricted by some other tools and processes)
> Another option you didn't mention would be to make use of LVM
> snapshots to ensure that your repository is always internally
> consistent even while you're in the middle of an rsync. The
> disadvantage would be some periodic unavailability while you removed
> and re-created your snapshots. (i.e. the FTP server is configured to
> serve files from the read-only snap volumes, which need to be
> unmounted and re-snapped when new files are uploaded.)
Impossible for the same reason. I only manage a private server that my
main mirror has access to. I don't even have control over that main
mirror. So even if I had a complex solution for myself, it would still
be impossible for others. Rsync, imho, is the best place to have this
functionality as it already has all the information to make the correct
decisions.
> There may be some full site-replication applications out there making
> use of rsync. I suspect someone here on the list would know. I've
> always just created my own custom scripts for this.
Well, any solution I can think of either requires the same functionality
that rsync provides (for which I can only think of sub-optimal
implementations) or uses rsync a second time (which would slow things down
and require me to type a password again).
> Thanks again for the RPMs. I hope you can find a good solution to
> your mirroring dilemma.
Thanks.
-- dag wieers, dag at wieers.com, http://dag.wieers.com/ --
[all I want is a warm bed and a kind word and unlimited power]