Rsync help needed...

Matt McCutchen hashproduct at verizon.net
Fri Feb 24 21:54:12 GMT 2006


On Fri, 2006-02-24 at 11:08 -0800, lsk wrote:
> lsk: Thanks for the clarification, Wayne. In my case no one would be
> allowed to use the destination file until the process is complete. As
> soon as my destination server is upgraded to a newer version of rsync
> that supports the --inplace option, I am going to try --inplace and
> --no-whole-file without --checksum, since the rsync algorithm does it.

--no-whole-file (i.e. incremental transfer) is the default for remote
transfers, so you need not specify it explicitly.  Only local transfers
default to --whole-file, because there disk I/O is more of a limiting
factor than network I/O.
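
A minimal Python sketch of that default, not rsync's actual code; the
function name and its parameters are made up for illustration:

```python
from typing import Optional


def uses_whole_file(is_local_transfer: bool,
                    explicit: Optional[bool] = None) -> bool:
    """Return True if the whole file is sent, False for incremental transfer.

    `explicit` models --whole-file (True) or --no-whole-file (False);
    None means neither option was given on the command line.
    """
    if explicit is not None:
        return explicit
    # Default: incremental for remote transfers, whole-file for local ones.
    return is_local_transfer
```

For a remote transfer with no option given, `uses_whole_file(False)`
returns False, which is why --no-whole-file is redundant there.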

> Wayne, I have one more question regarding --checksum. I don't run rsync
> at the directory level; I go datafile by datafile. Is --checksum useful
> only for a directory-level rsync, to check all files in the directory
> *before* transfer and determine which files need to be transferred? If I
> work at the file level, rsync does the checksum by itself, so I don't
> need to specify it explicitly. Am I right?

Excuse the length of this explanation, but I'm hoping it will answer
others' questions about --checksum and the quick check in the future...

The checksum after a file is transferred (the one rsync does "by
itself") is completely independent of the checksum enabled with
--checksum that decides whether to send a file.  Whether you send
individual files or a whole directory worth of files, --checksum will
enable an extra checksumming pass at the beginning, in addition to the
post-transfer checksums on whatever files are actually sent.

Rsync does a "quick check" on the sender and receiver versions of a file
to decide whether to transfer the file's data.  If the quick check
determines that the data portions are probably identical, rsync
transfers the attributes you told it to preserve and moves on.  If not,
rsync transfers the sender's data portion, using incremental transfer or
whole-file transfer as appropriate, to the receiver's file or to a
temporary file on the receiver (depending on --inplace).
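
The per-file decision just described can be sketched roughly as follows.
This is only an illustration of the control flow, not rsync's
implementation; the function name and the returned strings are
hypothetical:

```python
def plan_for_file(probably_identical: bool, inplace: bool) -> str:
    """Describe what happens to one file after the quick check."""
    if probably_identical:
        # Data is skipped; only the preserved attributes are brought in line.
        return "update preserved attributes only"
    if inplace:
        # With --inplace the receiver's existing file is written directly.
        return "send data into the receiver's existing file"
    # Otherwise the new data is assembled in a temporary file.
    return "send data into a temporary file on the receiver"
```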

(Pedantic note: I don't say simply "rsync algorithm" because it seems
ambiguous.  rsync uses many algorithms, the most famous of which is its
incremental transfer algorithm.)

You have a choice of several criteria for the quick check.  By default,
two files are probably identical if their sizes and mtimes match; this
criterion is usually good enough.  With --size-only, two files are
probably identical if their sizes match.  With --time-only (provided by
the experimental filter patch), two files are probably identical if
their mtimes match.  With --checksum, two files are probably identical
if their MD4 checksums match.  Finally, with --ignore-times, the quick
check always concludes two files are probably not identical; rsync
transfers all files.
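
As a rough model of those criteria (a sketch under assumptions, not
rsync's code): the `FileInfo` class, the `mode` strings, and the
precomputed `checksum` field are all invented here for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    size: int
    mtime: int
    checksum: str  # whole-file checksum; only consulted in "checksum" mode


def probably_identical(src: FileInfo, dst: FileInfo,
                       mode: str = "default") -> bool:
    """Model of the quick check: True means the data is not sent."""
    if mode == "default":
        return src.size == dst.size and src.mtime == dst.mtime
    if mode == "size-only":
        return src.size == dst.size
    if mode == "time-only":
        return src.mtime == dst.mtime
    if mode == "checksum":
        return src.checksum == dst.checksum
    if mode == "ignore-times":
        return False  # never "probably identical": every file is sent
    raise ValueError(f"unknown mode: {mode}")
```

For example, two files with equal sizes but different mtimes fail the
default check and pass the "size-only" one.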

--checksum has two uses, which apply equally to single files and
directories full of files:

* To reduce the number of transfers when two conditions hold: disk
reading is very cheap compared to network I/O, and many files' mtimes
differ between the two sides even though their data is the same on both
sides.

* To make sure files are transferred when their data differs but their
sizes and mtimes are the same.  This doesn't usually happen, but when it
does, the default quick check would erroneously skip the files.
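
Both cases can be reproduced with a small in-memory model. This is an
illustrative sketch only: the dictionaries stand in for files, the helper
names are made up, and SHA-256 from Python's standard library is a
stand-in for the MD4 checksum mentioned above.

```python
import hashlib


def digest(data: bytes) -> str:
    # Stand-in hash; the checksum rsync uses here is MD4, not SHA-256.
    return hashlib.sha256(data).hexdigest()


def default_says_identical(a: dict, b: dict) -> bool:
    return len(a["data"]) == len(b["data"]) and a["mtime"] == b["mtime"]


def checksum_says_identical(a: dict, b: dict) -> bool:
    return digest(a["data"]) == digest(b["data"])


# Case 1: same data on both sides, but the mtimes differ.
touched_src = {"data": b"unchanged contents", "mtime": 1140800000}
touched_dst = {"data": b"unchanged contents", "mtime": 1140700000}

# Case 2: the data differs, but size and mtime happen to match.
sneaky_src = {"data": b"AAAA", "mtime": 1140800000}
sneaky_dst = {"data": b"AAAB", "mtime": 1140800000}

case1 = (default_says_identical(touched_src, touched_dst),
         checksum_says_identical(touched_src, touched_dst))
case2 = (default_says_identical(sneaky_src, sneaky_dst),
         checksum_says_identical(sneaky_src, sneaky_dst))
```

In case 1 the default check would send the file needlessly while the
checksum check skips it; in case 2 the default check skips a file that
the checksum check correctly sends.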

In your situation, --checksum buys you nothing but a lot of extra disk
reading: since the first few bytes of your data files are being modified
so frequently, the pre-transfer checksums will never match and the
transfer will always happen.  Since rsync performs a post-transfer
checksum on any file whose data it modifies, --checksum is _not_
necessary to guarantee that rsync transfers files without corruption.
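
The idea behind the post-transfer check can be sketched like this. It is
a simplified model, not rsync's implementation: the function names are
hypothetical, SHA-256 stands in for rsync's own whole-file checksum, and
what rsync does when the check fails is not covered here.

```python
import hashlib


def whole_file_digest(data: bytes) -> str:
    # Stand-in hash for illustration only.
    return hashlib.sha256(data).hexdigest()


def transfer_verified(sender_data: bytes, received_data: bytes) -> bool:
    """Compare the sender's whole-file checksum with the receiver's.

    This runs only for files whose data was actually sent, and it is
    separate from the pre-transfer pass that --checksum enables.
    """
    return whole_file_digest(sender_data) == whole_file_digest(received_data)
```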

> I have been using the following syntax..
> 
> rsync -cvz /d01/app/testfile1.dbf  tarser:/t01/app/testfile1.dbf
> 
> but I would change to the one below and test a 40 GB transfer and see the
> results...
> 
> rsync -zv --no-whole-file --stats /d01/app/testfile2.dbf 
> tarser:/t01/app/testfile2.dbf

Yes, make that change!  You don't need --no-whole-file; it's the default
because you're doing a remote transfer.  Leaving the old file on the
receiver and omitting --checksum have already brought down the transfer
time significantly in your earlier test; I bet --inplace will cut the
time another 20-40% or so.
-- 
Matt McCutchen
hashproduct at verizon.net
http://hashproduct.metaesthetics.net/


