atomic transaction set option to rsync

Dag Wieers dag at wieers.com
Tue Jan 4 03:45:23 GMT 2005


On Mon, 3 Jan 2005, Wayne Davison wrote:

> On Tue, Jan 04, 2005 at 02:51:23AM +0100, Dag Wieers wrote:
> > In the past I could say something like:
> > 
> > 	rsync -a dir1/ dir2/ user@rsync:/remote-dir/
> > 
> > and it would process first dir1 and then dir2.
> 
> The filenames read in from dir1 and dir2 have always been sorted into a
> single list of files, so dir1's files will only be sent prior to dir2's
> files if they sort alphabetically earlier in the list.
> 
> > This way I first added the packages/ dir (which contains hardlinks of only 
> > the packages) and then the repository (packages+metadata).
> 
> Ahh, now there's a difference that is affected by the 2.6.x series:
> hard-link handling.  If the first instance of a hard-link is not found,
> rsync holds off on sending the file in the hopes that one of the other
> links for the file will match up with an existing file on the receiving
> side.  This avoids a bug where a new hard-link can cause rsync to
> re-send all the file's data just because it sorted alphabetically
> earlier in the list than the other (existing) link(s).

Ok, it doesn't really matter what caused the change in behaviour. I know I 
was using something that was not advertised as such (or was not set in 
stone), so I was not surprised it happened. The new functionality would be 
much better and safer than how I did it before, would actually work for 
other mirrors too, and is something people could rely on. 


> > I can ask mirrors to use '--atomic' or '--atomic-ts', but I can't ask
> > them to re-organise their mirror-scripts just for me.
> 
> Since you'd have to ask them to install a new rsync, maybe just ask them
> to install the attached perl script instead.  Then, they could run
> "atomic-rsync ..." instead of their current "rsync ..." command.  (The
> attached script works if they're doing a pull.)

Well, having a newer rsync will be mandatory with the next security update :) 
And if someone had added this 5 years ago, this would be no issue. So 
let's hope it is added sooner rather than later (if accepted).


> The idea of doing a massive number of renames at the end of the transfer
> is interesting, but it is not as atomic as the algorithm implemented by
> the above script.  However, if you'd prefer going that route, I'd
> imagine the implementation sharing a lot of the code that --partial-dir
> uses.  E.g., add an --atomic-dir=.atomic option that causes all finished
> files to be saved off in the .atomic dir (relative to their destination)
> and then add an ending pass that goes back through the file list and
> renames all the .atomic/FOO files.  Something like that should be pretty
> easy to whip up.
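
A minimal sketch of the staging idea quoted above, written in Python rather
than rsync's C and not taken from rsync's source. Finished files are first
saved under a `.atomic` directory next to their destination, and an ending
pass renames them into place. The helper names `stage_file` and
`commit_staged` are made up for illustration.

```python
import os

ATOMIC_DIR = ".atomic"  # staging directory name from the example option above


def stage_file(dest_path, data):
    """Save a finished file in .atomic/ next to its destination (hypothetical helper)."""
    dest_dir, name = os.path.split(dest_path)
    stage_dir = os.path.join(dest_dir, ATOMIC_DIR)
    os.makedirs(stage_dir, exist_ok=True)
    staged = os.path.join(stage_dir, name)
    with open(staged, "wb") as f:
        f.write(data)
    return staged


def commit_staged(dest_paths):
    """Ending pass: rename every .atomic/FOO over FOO, then drop empty staging dirs."""
    for dest_path in dest_paths:
        dest_dir, name = os.path.split(dest_path)
        staged = os.path.join(dest_dir, ATOMIC_DIR, name)
        if os.path.exists(staged):
            # Each rename is atomic on POSIX when both names are on one filesystem;
            # the pass as a whole is not.
            os.replace(staged, dest_path)
    for dest_dir in {os.path.dirname(p) for p in dest_paths}:
        stage_dir = os.path.join(dest_dir, ATOMIC_DIR)
        if os.path.isdir(stage_dir) and not os.listdir(stage_dir):
            os.rmdir(stage_dir)
```

With a fixed directory name like this, two transfers into the same
destination would share the staging area.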

Well, the reason why I think a new feature in rsync makes sense is 
that it does not need an extra directory. I would like to get rid of 
the optional directory, so that 2 rsyncs can run at the same time and 
so that it becomes just another flag to add to a script, instead of 
some new logic to make the optional directory unique.

I looked at the code, but it only reminded me how little developed my C 
knowledge is.

The changes I think are required are:

	Add an --atomic option as a boolean and check if the combination of 
	options makes any sense at all :) (in options.c)

	Make the recv_files() function understand it has to delay its 
	finish_transfer() call. Save the information needed to make that 
	call later (like the temporary name) in some transaction struct. 
	(in receiver.c)

	Then, at the end of recv_files() and before delete_files(), run 
	finish_transaction() on the transaction struct, which calls 
	finish_transfer() where necessary. (in receiver.c)
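
The steps above can be sketched roughly as follows. This is Python, not a
patch against rsync, and it only illustrates the delayed rename: the
`Transaction` class and its methods are hypothetical stand-ins for the
transaction struct and finish_transaction(), and permission/owner handling
is left out. Temporary names come from `tempfile.mkstemp`, so no extra
directory is needed and two runs would not pick the same name.

```python
import os
import tempfile


class Transaction:
    """Hypothetical stand-in for the 'transaction struct' described above."""

    def __init__(self):
        self.pending = []  # (temporary name, final name) pairs

    def receive(self, final_path, data):
        """Write data to a temp file beside final_path and delay the rename."""
        dest_dir = os.path.dirname(final_path) or "."
        fd, tmp_path = tempfile.mkstemp(prefix=".", dir=dest_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        self.pending.append((tmp_path, final_path))

    def finish(self):
        """Delayed step: move every temp file to its final name in one quick pass."""
        for tmp_path, final_path in self.pending:
            os.replace(tmp_path, final_path)
        self.pending.clear()

    def abort(self):
        """Discard temp files so a failed transfer leaves the old files untouched."""
        for tmp_path, _ in self.pending:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
        self.pending.clear()
```

The slow part (the transfer itself) happens in receive(); only finish()
touches the names that other clients see.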

If we want to speed up this last step, it may make sense to split the 
renaming off from the permission/owner changes?

I hope someone who has better C skills or knows the code much better 
than I do can pick this up.


PS The importance of this change is that it makes it less likely that 
someone starts an rsync between the time I start uploading the metadata 
and the time my rsync finishes.

That window goes from somewhere between 8 hours and 1 hour, based on the 
transaction size (350MB to 50MB), to something likely less than 10 secs, 
based on the number of transaction objects needing a move (avg. 100).

Bringing this 10 secs down to 1 sec or less (or making it truly atomic by 
swapping a directory) is less important, especially when it has other 
drawbacks.
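
For comparison, the directory swap mentioned above could look roughly like
this. It is only a sketch, under the assumption that clients reach the tree
through a symbolic link on a POSIX filesystem; the function name
`swap_directory` and the `.tmp` suffix are made up for illustration.

```python
import os


def swap_directory(link_path, new_target):
    """Repoint the symlink link_path at new_target with a single rename."""
    tmp_link = link_path + ".tmp"  # hypothetical temporary name
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted run
    os.symlink(new_target, tmp_link)
    # rename() acts on the link itself and replaces the old link atomically
    # on POSIX; it fails if link_path is a real directory, not a symlink.
    os.replace(tmp_link, link_path)
```

This needs the complete new tree to exist before the swap.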

--   dag wieers,  dag at wieers.com,  http://dag.wieers.com/   --
[all I want is a warm bed and a kind word and unlimited power]

