rsync in-place (was Re: rsync 1tb+ each day)
ewhiting at amis.com
Wed Feb 5 13:55:44 EST 2003
jw schultz wrote:
> I was thinking more in terms of no block relocation at all.
> Checksums only match if at the same offset. The receiver simply
> discards (or never gets) info about blocks that are
> unchanged. It would just lseek and write with a possible
> truncate at the end.
This would seem to help a lot on larger database files. Why look at a
700-byte block of data from a source file and try to find a matching
block by fully scanning block checksums at all offsets in an 8G
destination datafile, and then do it again for every 700 bytes? (I
read the rsync technical paper -- but I might be confused.)
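For a rough sense of scale, here is a small back-of-the-envelope sketch. It assumes "8G" means 8 GiB, and that a sliding-window search tries one window position per byte offset in the worst case where nothing matches; both are my assumptions for illustration, not measurements of rsync itself.

```python
BLOCK_SIZE = 700          # block size quoted above
FILE_SIZE = 8 * 2**30     # "8G", assumed here to mean 8 GiB

# Same-offset comparison: one checksum comparison per block
# (ceiling division, so a short final block is counted too).
fixed_offset_checks = (FILE_SIZE + BLOCK_SIZE - 1) // BLOCK_SIZE

# Sliding-window search, worst case (no block ever matches):
# one window position per byte offset where a full block still fits.
sliding_window_positions = FILE_SIZE - BLOCK_SIZE + 1

print(fixed_offset_checks)
print(sliding_window_positions)
```

Under those assumptions the same-offset approach does a few million comparisons, while the sliding window visits billions of positions.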
In the case of Oracle datafiles, the only place a meaningful/syncable
delta will occur is at the same offset. Yes, this is a special case --
but it has the potential to really help in rsyncing Oracle datafiles
during a hot backup or when syncing from a snapshot to nearstore storage.
This approach should be faster than the -W option for very large Oracle
datafiles (which often have only a small number of changed blocks). It
should also be faster than deleting the destination files and resending
(-W), as has been suggested.
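To make the idea concrete, here is a minimal local sketch of the "same offset only" update described above: seek, write changed blocks, truncate at the end. The function name sync_in_place and the direct byte comparison are hypothetical, for illustration only; a real implementation would compare block checksums sent over the wire rather than read both files, and this is not how rsync currently works.

```python
def sync_in_place(src_path, dst_path, block_size=700):
    """Rewrite only the blocks of dst that differ from src at the same offset.

    Hypothetical helper: dst must already exist. Returns the number of
    blocks written.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        offset = 0
        while True:
            src_block = src.read(block_size)
            if not src_block:
                break
            # Compare against the destination bytes at the SAME offset only.
            dst.seek(offset)
            dst_block = dst.read(len(src_block))
            if src_block != dst_block:
                # Block changed (or dst is too short): overwrite in place.
                dst.seek(offset)
                dst.write(src_block)
                written += 1
            offset += len(src_block)
        # Drop any leftover tail if the destination was longer than the source.
        dst.truncate(offset)
    return written
```

With a datafile where only a handful of blocks changed, only those blocks are written and nothing is relocated, which is the case that matters for Oracle datafiles.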
> > You can imagine a smarter algorithm that does non-sequential writes to
> > the output so as to avoid writing over blocks that will be needed
> > later. Alternatively, if you assume some amount of temporary storage,
> > then it might be possible to still produce output as a stream.
> I really doubt it is worthwhile doing to rsync. This
> principally applies to block oriented files such as devices
> and database files. For the most part rsync handles these
The original post still raises an interesting issue -- it should not be
faster to remove destination files before running rsync. That runs
counter to one of the main purposes of rsync -- to efficiently detect
and send only the deltas.