Patch for rsyncable zlib with new rolling checksum
kevin at trumpetinc.com
Fri Feb 18 02:03:55 GMT 2005
My test results so far indicate a pretty decent improvement in overall rsync performance when using a slightly more sophisticated checksum calculation.
The attached patch has the required changes (in hindsight, I should have compressed this using zlib with the new algorithm :-) ).
Some things to know about the patch:
First, it is against the zlib library - NOT the gzip application.
By default, rsyncable computations are turned on, and the default behavior is to use the new rolling checksum algorithm. The window and reset block sizes are set to 30 bytes and 4096 bytes respectively. I've found that these values give much better rsync performance when used with the Z_RSYNCABLE_RSSUM checksum algorithm.
If you want to play with Z_RSYNCABLE_SIMPLESUM, and you want to keep your window sizes small, be sure to run several different window sizes - you'll be amazed at how much the compression ratio and rsync performance vary for small window sizes with that algorithm. With Z_RSYNCABLE_RSSUM, the compression ratios and rsync performance are quite well behaved, even for window sizes down to 10 or 15 - but 30 seems like a safe value for the time being.
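For readers unfamiliar with rolling checksums, the following is a minimal sketch in C of the general idea, not the code from the patch. It keeps a plain sum (a) and a position-weighted sum (b) over the last 30 bytes, in the style of rsync's weak checksum, and tests a value against the 4096-byte block size. All names here (roll_state, roll_update, roll_direct, roll_feed, is_reset_point) are hypothetical, and the assumption that a reset point is chosen by "value % block size == 0" is mine, not something stated in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 30u    /* window size quoted in the post */
#define BLOCK  4096u  /* reset block size quoted in the post */

/* Illustrative state only; these names are not taken from the patch. */
typedef struct {
    uint32_t a;  /* plain sum of the bytes in the window (mod 2^16) */
    uint32_t b;  /* position-weighted sum of those bytes (mod 2^16) */
} roll_state;

/* Slide the window by one byte: `in` enters, `out` leaves.
 * While the window is still filling, pass out = 0. */
static void roll_update(roll_state *s, uint8_t in, uint8_t out)
{
    s->a = (s->a + in - out) & 0xffffu;
    s->b = (s->b + s->a - WINDOW * out) & 0xffffu;
}

/* Recompute both sums from scratch over the last WINDOW bytes of buf.
 * The newest byte has weight 1, the oldest has weight WINDOW. */
static roll_state roll_direct(const uint8_t *buf, size_t len)
{
    roll_state s = {0, 0};
    size_t start = len > WINDOW ? len - WINDOW : 0;
    assert(buf != NULL);
    for (size_t i = start; i < len; i++) {
        s.a += buf[i];
        s.b += (uint32_t)(len - i) * buf[i];
    }
    s.a &= 0xffffu;
    s.b &= 0xffffu;
    return s;
}

/* Assumed trigger: a reset point is any position where the chosen
 * checksum value is a multiple of BLOCK. */
static int is_reset_point(uint32_t value)
{
    return value % BLOCK == 0;
}

/* Roll through buf one byte at a time. Once the window is full, count
 * how often each sum would trigger a reset. Returns the final state. */
static roll_state roll_feed(const uint8_t *buf, size_t len,
                            size_t *plain_hits, size_t *weighted_hits)
{
    roll_state s = {0, 0};
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        uint8_t out = i >= WINDOW ? buf[i - WINDOW] : 0;
        roll_update(&s, buf[i], out);
        if (i + 1 >= WINDOW) {
            if (is_reset_point(s.a)) (*plain_hits)++;
            if (is_reset_point(s.b)) (*weighted_hits)++;
        }
    }
    return s;
}
```

In this sketch, the plain sum over a 30-byte window can never exceed 30 * 255 = 7650, so only the values 0 and 4096 are multiples of the block size. The weighted sum covers the full 16-bit range, which may be one reason a more sophisticated checksum is less sensitive to small window sizes.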
In my test runs, I'm seeing approximately a 20-30% reduction in the total number of changed bytes identified by the rsync algorithm, without any impact on the zlib compression ratio, as compared to the simpler rolling checksum algorithm. Your results, of course, may vary :-)
This patch includes the patch for adding rsyncable behavior, plus my changes. If you just want the basic patch without my changes, it is located at https://svn.uhulinux.hu/packages/dev/zlib/patches/02-rsync.patch
You can configure the rsyncable behavior (which checksum to use, window size and block size) dynamically, instead of adjusting the #define lines at the beginning of deflate.c, by calling the deflateSetRsyncParameters() function immediately after stream initialization and before writing anything to the stream. This is handy if you want to run parametric studies and the like.
If you set the rolling checksum algorithm to Z_RSYNCABLE_OFF, you will get exactly the same behavior as zlib without the patch. It will be a hair slower, but compared to the rest of what's going on in zlib, the overhead should be negligible.
I'd love to hear feedback/comments!
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 28163 bytes
Desc: not available
Url : http://lists.samba.org/archive/rsync/attachments/20050217/5b6afa8d/rsyncable_checksum.obj