rsync in-place (was Re: rsync 1tb+ each day)
craig at atheros.com
Wed Feb 5 23:41:22 EST 2003
> > Is it possible to tell rsync to update the blocks of the target file
> > 'in-place' without creating the temp file (the 'dot file')? I can
> > guarantee that no other operations are being performed on the file at
> > the same time. The docs don't seem to indicate such an option.
> No, it's not possible, and making it possible would require a deep
> and fundamental redesign and re-implementation of rsync; the result
> wouldn't resemble the current program much.
I disagree. An --inplace option wouldn't be too hard to implement.
The trick is that when --inplace is specified, the block matching
algorithm (on the sender) would only match a block if that block's
location on the receiver is at or after the position currently being
written. No protocol change is required. The receiver can then operate
in-place, since no matching block lies earlier in the file, in the
region that has already been overwritten. This could be relaxed to
allow a fixed number of earlier blocks, based on the knowledge that the
receiver will buffer reads, but that is more risky. Caveat user: if you
specify --inplace and the source file has a single byte added to the
beginning, then every block's old location falls one byte before where
it is needed, nothing matches, and the entire file will be sent as
literal data.
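To make the matching rule concrete, here is a rough Python sketch. It is not rsync code: make_delta, apply_in_place and the ("block", n) / ("literal", bytes) tokens are names made up for illustration, and a plain dictionary lookup on block contents stands in for rsync's rolling and strong checksums.

```python
def make_delta(old: bytes, new: bytes, block_size: int, inplace: bool):
    """Sender side: describe `new` as old-block references plus literal data."""
    # Index the receiver's full blocks by content (stand-in for checksums).
    index = {}
    for j in range(len(old) // block_size):
        index.setdefault(old[j * block_size:(j + 1) * block_size], []).append(j)

    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block_size]
        candidates = index.get(chunk, []) if len(chunk) == block_size else []
        if inplace:
            # Only accept blocks located at or after the current write position.
            candidates = [j for j in candidates if j * block_size >= pos]
        if candidates:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("block", candidates[0]))
            pos += block_size
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def apply_in_place(buf: bytearray, ops, block_size: int) -> None:
    """Receiver side: rebuild the new file inside `buf`, writing front to back."""
    out = 0
    for kind, value in ops:
        if kind == "block":
            # Copy the block first; it may overlap the region being written.
            data = bytes(buf[value * block_size:(value + 1) * block_size])
        else:
            data = value
        buf[out:out + len(data)] = data
        out += len(data)
    del buf[out:]  # drop any leftover tail of the old file
```

With a 4-byte block size, turning b"AAAABBBBCCCCDDDD" into b"BBBBCCCCDDDDEEEE" still matches three blocks, because each one moves toward the front of the file. Prepending one byte to the old contents with inplace=True yields a single literal token covering the whole file, which is the caveat above.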
Of course, a major issue with --inplace is that the file will be
in an intermediate state if rsync is killed mid-transfer. Rsync
currently ensures that every file is either the original or the new
version, never a mixture.
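For comparison, the guarantee that --inplace would give up comes from the usual write-to-temp-then-rename pattern. The following is only a Python analogue of that pattern, not rsync's implementation; replace_atomically is a hypothetical helper, and it assumes the temp file lives in the same directory (and so the same filesystem) as the target.

```python
import os
import tempfile


def replace_atomically(path: str, data: bytes) -> None:
    """Write `data` to a hidden temp file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(prefix=".", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk first
        # On POSIX the rename is atomic: readers see the old or the new file.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # leave the original untouched on failure
        raise
```

If the process dies before os.replace runs, the target still holds the original contents and only a stray temp file is left behind. An in-place update has no equivalent point at which the switch happens all at once.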
Another independent optimization would be to do lazy writes. Currently,
if you specify -I (--ignore-times) the output file is written (to a tmp
file and then renamed) even if the contents are identical. Instead,
creation of the tmp file could be delayed until the output file is
known to be different. This is detected either by an out-of-sequence
block number from the sender or by any literal data. If the file contains
only in-sequence block numbers and no literal data, then there is no
need to write anything.
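The detection step could look roughly like the Python sketch below. The token format and the name first_difference are assumptions for illustration, not rsync's actual data structures. The final length check (the stream ending before all old blocks were seen) is my own addition to cover a shortened file; it is not part of the description above.

```python
def first_difference(tokens, old_block_count: int):
    """Return the index of the first token proving the file changed, or None.

    `tokens` is a sequence of ("block", n) or ("literal", bytes) pairs in the
    order the receiver would get them.
    """
    expected = 0
    for i, (kind, value) in enumerate(tokens):
        if kind == "literal" or value != expected:
            return i  # literal data or an out-of-sequence block number
        expected += 1
    if expected != old_block_count:
        # Assumption beyond the text above: fewer blocks than the old file
        # holds means the new file is shorter, so it must be rewritten.
        return len(tokens)
    return None
```

A receiver using this would hold off creating the tmp file until a non-None index appears, then copy the blocks seen so far from the existing file into the tmp file and carry on as usual. If the result is None, nothing is written at all.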