Why does one of these work and the other doesn't

Phil Howard phil-rsync at ipal.net
Tue Dec 4 14:46:38 EST 2001


On Mon, Dec 03, 2001 at 09:55:53AM -0700, tim.conway at philips.com wrote:

| rsync already has a memory-hogging issue.  Imagine having it search your 
| entire directory tree, checksumming all files, storing and sending them 
| all, comparing both lists looking for matching date/time/checksums to 
| guess where you've moved files to.  You'd be better off to use a wrapper 
| around the tools you move files with, keeping a replayable log, and have your
| mirrors retrieve and replay that log, before doing the rsync.

I don't think so.  I would like to have that kind of smart capability be
fully integrated into a useful tool.  And rsync already has most of the
pieces such a thing would need in place.  I am NOT suggesting that it be
the default.  As you say, it would be memory hogging.  But it is already
memory hogging now, and adding a checksum for every file in the tree would
be 32 bytes more per file.
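
To put a rough number on that overhead, here is a back-of-envelope sketch.
The 32 bytes per file is the figure above; the file counts and the function
name are made up for illustration, and this says nothing about how rsync
actually lays out its file list in memory.

```python
# Back-of-envelope estimate of the extra memory for one checksum per file.
# CHECKSUM_BYTES is the per-file figure quoted in the text; the file counts
# below are hypothetical examples, not measurements.
CHECKSUM_BYTES = 32


def extra_checksum_memory(num_files, checksum_bytes=CHECKSUM_BYTES):
    """Return the additional bytes needed to hold one checksum per file."""
    return num_files * checksum_bytes


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 10_000_000):
        mib = extra_checksum_memory(n) / (1024 * 1024)
        print(f"{n:>10} files -> {mib:8.1f} MiB extra")
```

So the cost grows linearly with the number of files, which is why it should
stay an option rather than the default.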

In some cases I definitely want LESS memory hogging, such as replicating
trees of millions of files.  In other cases I do want the checksumming to
get FEWER files redundantly transferred.

What I have done in the past to accomplish it is to build a tar file of the
entire tree on both sides, then sync the tar files making sure the rsync
blocksize matches correctly.  That still takes a lot of time because rsync
is sending a LOT of checksums for small blocks.  If I could get tar to build
the tar file with the files on very large block boundaries, then I could
specify a larger blocksize to rsync and do the transfer much faster.  But
it would make just as much sense to just send a checksum per file, and, in
cases where a whole file checksum matches (though at a different name on
the destination) to copy, hardlink, or move (as appropriate) the file to
the new location.
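
A minimal sketch of the per-file checksum idea, done outside rsync, might
look like the following.  Everything here is an assumption for illustration:
the function names are invented, SHA-256 is just a convenient whole-file
checksum and not what rsync uses, and both trees are assumed to be readable
from one machine, which sidesteps the protocol work a real patch would need.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Whole-file checksum, read in chunks so large files fit in memory."""
    h = hashlib.sha256()  # assumption: any strong whole-file checksum works
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_tree(root):
    """Map relative path -> checksum for every regular file under root."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                index[os.path.relpath(full, root)] = file_checksum(full)
    return index


def plan_moves(src_index, dst_index):
    """Decide which files can be relocated locally instead of transferred.

    Returns (relocations, transfers):
      relocations: (existing_dst_path, wanted_path) pairs where the content
                   already exists on the destination under another name
      transfers:   source paths whose content is not on the destination
    """
    by_sum = {}
    for path, digest in dst_index.items():
        by_sum.setdefault(digest, []).append(path)

    relocations, transfers = [], []
    for path, digest in sorted(src_index.items()):
        if dst_index.get(path) == digest:
            continue  # same content already at the same name
        candidates = by_sum.get(digest)
        if candidates:
            relocations.append((sorted(candidates)[0], path))
        else:
            transfers.append(path)
    return relocations, transfers
```

Each relocation pair could then be carried out on the destination with a
copy, a hardlink (os.link), or a move, as appropriate, and only the paths in
the transfer list would need to go over the wire.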

Inventing a whole new tool to do this when rsync has most of the logic of
it in place is absurd.  I just don't understand the actual rsync internals
or protocol enough to accomplish such a patch myself, so my only option is
to offer the suggestion and hope someone likes it.  Again, I am not
suggesting that it be the default option, so it would not impact anyone
unless they wanted it to.

-- 
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| phil-nospam at ipal.net | Texas, USA | http://phil.ipal.org/     |
-----------------------------------------------------------------