Rsync help needed...

Matt McCutchen hashproduct at verizon.net
Thu Feb 23 00:13:05 GMT 2006


On Wed, 2006-02-22 at 11:43 -0800, lsk wrote:
> lsk: This is oracle database the header information(timestamp..etc) on each
> datafile constantly changes which might be very small change but the data
> inside most of the datafiles are same they wont change much. New oracle
> datafiles will be added on the  source which needs to be transferred. So
> what we do is every 2 weeks to refresh target server  we remove all
> datafiles on target and rsync all datafiles again from source. My question
> is if we leave the older datafiles and rsync will it be faster ?

Yes, very much faster!  Rsync has an incremental transfer algorithm,
which is described in tech_report.tex in the source distribution.  The
receiving rsync sends checksums of the blocks of its existing file; the
sending rsync compares them against the source file, finds the regions
that match, and sends only the parts of the source file that are new or
different.  Rsync will notice that a region of one file is identical to
a region of the other even if those regions occur at different offsets
in the two files.  However, if Oracle does some kind of re-indexing or
garbage collection that touches bytes scattered throughout the on-disk
data file (say, every Nth byte, with N smaller than rsync's block
size), no block will match and rsync will end up sending essentially
the whole file.
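
To make the block matching concrete, here is a rough Python sketch of
the idea, not rsync's actual code.  The names (BLOCK, weak, strong,
signature, delta, literal_bytes) are made up for illustration, the
4-byte block size is only for the demo, the weak checksum is a plain
byte sum rather than rsync's rolling checksum, and MD5 stands in for
the strong checksum.

```python
import hashlib

BLOCK = 4  # tiny demo block size (assumption); real rsync uses far larger blocks


def weak(data: bytes) -> int:
    # Stand-in for rsync's rolling checksum: a plain byte sum,
    # recomputed at every offset instead of being rolled.
    return sum(data) & 0xFFFF


def strong(data: bytes) -> bytes:
    # Stand-in for the strong per-block checksum.
    return hashlib.md5(data).digest()


def signature(old: bytes) -> dict:
    """Receiver side: checksum every full block of the existing file."""
    sig = {}
    for i in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[i:i + BLOCK]
        sig.setdefault(weak(block), {})[strong(block)] = i // BLOCK
    return sig


def delta(new: bytes, sig: dict) -> list:
    """Sender side: look for known blocks at every byte offset of the new file."""
    ops = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        window = new[pos:pos + BLOCK]
        match = None
        if len(window) == BLOCK:
            candidates = sig.get(weak(window))
            if candidates is not None:
                match = candidates.get(strong(window))
        if match is not None:
            # Flush pending literal data, then refer to the receiver's block.
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += BLOCK
        else:
            # No known block starts here: this byte must be sent as data.
            literal.append(new[pos])
            pos += 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def literal_bytes(ops: list) -> int:
    """Count the bytes that would actually have to cross the network."""
    return sum(len(op[1]) for op in ops if op[0] == "literal")
```

With a file whose header alone changed, only the header block comes out
as literal data and every other block becomes a "copy" reference.  If
some byte in every block is changed, nothing matches and the literal
data is the whole file.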

> Currently I use "rsync -czv" c for checksum.

If each data file's first few bytes ("header information") change
between rsync transfers, then --checksum buys you nothing.  Normally
rsync will skip transferring a file if the receiver has a corresponding
file of the same size and modtime; --checksum makes both sides read
their files entirely and skips a transfer only if the MD4 checksums of
the whole files match.  However, since the header information changes,
the checksums will never match, so rsync will transfer the files anyway.
Those transfers will still be fast because so little of each file has
changed.

On the other hand, if the modtimes of the files are changing but their
data is not, using --checksum will result in a relatively small
reduction in network traffic: rsync will notice that the files match
immediately instead of after comparing a few kilobytes' worth of block
hashes.  However, rsync will read each changed file twice, once to
compute the checksum and again to transfer it, so disk I/O increases.
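
The skip decision described in the last two paragraphs can be sketched
roughly as follows.  This is an illustration under assumptions, not
rsync's code: can_skip and file_digest are hypothetical names, both
files are assumed to be readable locally, and MD5 stands in for the
whole-file MD4.

```python
import hashlib
import os


def file_digest(path: str) -> bytes:
    """Read the whole file and hash it (MD5 as a stand-in for MD4)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()


def can_skip(src: str, dst: str, checksum: bool = False) -> bool:
    """Return True if the transfer of src onto dst would be skipped."""
    if not os.path.exists(dst):
        return False
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        # Different sizes: the file is transferred in either mode.
        return False
    if checksum:
        # --checksum style: both files are read in full and compared by hash.
        return file_digest(src) == file_digest(dst)
    # Default quick check: same size and same modtime.
    return int(s.st_mtime) == int(d.st_mtime)
```

A changed header with equal size and modtime is skipped by the quick
check but not with checksum=True; identical data with a different
modtime is the reverse.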

Someone correct me if I am mistaken about how incremental transfer and
--checksum work.
-- 
Matt McCutchen
hashproduct at verizon.net
http://hashproduct.metaesthetics.net/


