combining --preallocate and --fuzzy

John Taylor john at fcs.uga.edu
Thu Apr 3 19:49:27 GMT 2008


Greetings,

I would like to write a patch for rsync but need some help getting
started.  Here is my situation.  I am using cwrsync to copy files from
one Windows server to another Windows server.  One file that I need
to back up is 130 GB.  The daily changes occur throughout the file,
not just at the end of the file.  File names look like this:

Db_20080402_0003_DB.BAK
Db_20080403_0003_DB.BAK

Therefore, I can use the --fuzzy option to find a basis file for rsync.
I want to take this one step further.

What I would like to accomplish is the merging of the --preallocate patch
and the --fuzzy option.  When these two options are used together (at least
on Windows platforms), I want rsync to determine which fuzzy file to use
(from the fuzzy_distance() function), but then go ahead and preallocate
the new file, not with posix_fallocate(), but with the contents of the
file matched from fuzzy_distance().  This would keep the destination
file from being severely fragmented.

What I envision is rsync doing a local file copy first, but not
block by block.  When you copy a file on Windows with the copy command,
the entire file size is preallocated so that no fragmentation occurs.
After this step, the copy command performs the actual transfer of data
from the source file to the destination file.  You could think of this
almost as a local, non-fragmenting file copy operation, after which the
rsync algorithm is used to update the new destination file.
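
To make the idea concrete, here is a rough sketch in C of the "seed the
destination from the fuzzy basis" step.  It is not rsync code: the function
name seed_from_basis() is hypothetical, and it assumes a POSIX-style
environment (such as the Cygwin layer that cwrsync runs on) where open(),
fstat() and posix_fallocate() are available.  The sketch reserves the full
size of the new file up front, treating that reservation as best effort,
and then copies the basis file into it in large sequential writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of rsync: create dest_path as a full copy
 * of basis_path, reserving the whole size before any data is written.
 * Returns 0 on success and -1 on error. */
int seed_from_basis(const char *basis_path, const char *dest_path)
{
    char buf[64 * 1024];
    struct stat st;
    ssize_t n;
    int rc = -1;

    int in = open(basis_path, O_RDONLY);
    if (in < 0)
        return -1;

    int out = open(dest_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }

    if (fstat(in, &st) != 0)
        goto done;

    /* Ask the filesystem for the full size in one request.  This is best
     * effort: if the call fails, the copy below still proceeds. */
    if (st.st_size > 0)
        (void)posix_fallocate(out, 0, st.st_size);

    /* Sequential copy of the basis contents into the reserved space. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {
            ssize_t w = write(out, p, (size_t)n);
            if (w <= 0)
                goto done;
            p += w;
            n -= w;
        }
    }
    if (n == 0)
        rc = 0;   /* read() reached end of file without error */

done:
    close(in);
    if (close(out) != 0)
        rc = -1;
    return rc;
}
```

In this sketch a caller would pass the file chosen by the fuzzy match as
basis_path and the new temporary file as dest_path, and only afterwards let
the normal delta transfer overwrite the blocks that changed.  How that would
hook into the receiver is exactly the part I need advice on.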

How hard would it be to write this patch?  I don't mind doing the coding,
but would love to hear some strategies on how to accomplish this goal.
Congratulations on getting version 3 released!

Thanks,
-John Taylor


