Batch mode creates huge diffs, bug(s)?

Matt Van Mater matt.vanmater at gmail.com
Wed Mar 21 08:22:02 MDT 2012


OK, so I re-ran rdiff with several different block sizes against the two
image files described earlier to find the optimal size for my use case.
Here is a summary (sizes in bytes):
  64 byte block - signature size: 3199208028, delta size: 543335307
 256 byte block - signature size:  799802016, delta size: 267816685
 512 byte block - signature size:  399901020, delta size: 220323243
1024 byte block - signature size:  199950516, delta size: 272722422
2048 byte block - signature size:   99975264, delta size: 446384815
4096 byte block - signature size:   49987644, delta size: 830846129
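
A sweep like this can be scripted.  The following is only a sketch: it
assumes rdiff's -b (--block-size) option, and the function name and the
file names (sig-N, delta-N) are placeholders.  The exact commands used to
produce the numbers above are not shown in this thread.

```shell
# Sketch: run rdiff signature + delta for several block sizes.
# $1 = old (baseline) image, $2 = new image; both paths are placeholders.
sweep_block_sizes() {
    old=$1
    new=$2
    for bs in 64 256 512 1024 2048 4096; do
        # Signature of the old file, then delta from old to new.
        rdiff -b "$bs" signature "$old" "sig-$bs" || return 1
        rdiff -b "$bs" delta "sig-$bs" "$new" "delta-$bs" || return 1
        # $(( ... )) strips any padding that wc may emit.
        sig_size=$(( $(wc -c < "sig-$bs") ))
        delta_size=$(( $(wc -c < "delta-$bs") ))
        printf '%4d byte block - signature size: %s, delta size: %s\n' \
            "$bs" "$sig_size" "$delta_size"
    done
}
```

Usage would be something like: sweep_block_sizes image1 image2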

At the risk of stating the obvious, this test showed that the smaller the
block size specified, the longer it takes to compute the signatures and
deltas.  It also agrees with tests done in rsync that show 512 is the best
block size for my use case of transferring a Windows XP system image.
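
To make the comparison explicit, here is a small sketch that picks the
block size with the smallest delta from the numbers in the summary.  The
variable and function names are made up for illustration; the figures are
copied from the summary.

```shell
# Columns: block size, signature size, delta size (copied from the summary).
results='64 3199208028 543335307
256 799802016 267816685
512 399901020 220323243
1024 199950516 272722422
2048 99975264 446384815
4096 49987644 830846129'

# Read "block signature delta" rows on stdin and print the block size
# whose delta (third column) is smallest.
best_block_size() {
    awk 'NR == 1 || $3 + 0 < min { min = $3 + 0; best = $1 }
         END { print best }'
}

printf '%s\n' "$results" | best_block_size
```

Note that the signature size halves each time the block size doubles,
while the delta size has a minimum in between.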

Most importantly, it clearly demonstrates that the rsync algorithm as
implemented in librsync (rdiff) is effective, but that there is a bug in
rsync's batch mode (both --write-batch and --only-write-batch) when using
very large files.

What is the best way to file a bug report on this issue?  Is there any
other information needed?  Is anyone reading this thread still not
convinced there is a problem with rsync?
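
For a report, a self-checking version of the small-file test quoted below
might be useful.  This is only a sketch: the function name and the batch
file name are placeholders, and it assumes rsync writes a FILE.sh helper
script next to the batch file, as it did in the quoted run.

```shell
# Sketch: write a batch that turns $2 into $1, apply it, and verify.
# $1 = new file, $2 = old file (updated in place), $3 = block size.
batch_roundtrip() {
    new=$1
    old=$2
    bs=$3
    batch="batch-$bs"
    # Record the update in a batch file without touching the old file yet.
    rsync --only-write-batch="$batch" --block-size="$bs" "$new" "$old" \
        || return 1
    # Apply the batch via the helper script rsync generated.
    sh "./$batch.sh" || return 1
    # Succeeds only if the two files are now identical.
    cmp -s "$new" "$old"
}
```

With the big images, batch_roundtrip image2 image1 512 would be expected
to fail in the same way as the original report, assuming the bug is real.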

Thanks
Matt Van Mater

On Tue, Mar 20, 2012 at 4:26 PM, Matt Van Mater <matt.vanmater at gmail.com> wrote:

> I ran one more test on a separate VM to check and see if rsync would allow
> me to specify block size for a smaller file while using batch mode... it
> works.  To me that indicates that rsync has a problem processing very large
> batch files, especially when you specify a particular block size.  More
> signs point to an rsync bug...
>
> Here are the commands to reproduce the 'success':
> root@server:/images# echo foo > file1
> root@server:/images# echo foo > file2
> root@server:/images# echo bar >> file2
> root@server:/images# rsync --only-write-batch=batch-2to1-512 --block-size=512 file2 file1
> root@server:/images# ./batch-2to1-512.sh
> root@server:/images# md5sum file1
> f47c75614087a8dd938ba4acff252494  file1
> root@server:/images# md5sum file2
> f47c75614087a8dd938ba4acff252494  file2
>
> I then increased the RAM on the VM to 32 GB and then 64 GB and tried the
> command shown above on the big 16 GB files, and it still fails with the
> same error as I originally reported.  So even though rsync could fit the
> entire source, destination and diff file in RAM, it still failed.
>
> Still waiting on the rdiff with tiny block size to complete :)
>
> Matt
>
>
> On Tue, Mar 20, 2012 at 4:09 PM, Joachim Otahal (privat) <Jou at gmx.net> wrote:
>
>>  Matt Van Mater wrote:
>>
>> Let me restate my last email regarding rdiff:
>>
>> All of my image files are from the same Windows XP VM, created using
>> FOG/partimage.  Image1 is the "baseline", Image2 is Image1 + the WinSCP
>> binary downloaded (not even installed).
>>
>>
>> Use your virtualisation to create the difference / snapshot, transfer
>> the diff disk image your virtualisation spits out, and then let it
>> merge on the target.
>> How well that works depends on the bitchiness of your virtualisation *g*.
>>
>> Good luck!
>>
>> Joachim Otahal
>>
>>
>>
>> I am not imaging an Ubuntu machine.  I am using the Ubuntu machine as a
>> means of creating the batch file for rsync and/or rdiff.  I chose that
>> platform since it is a common distribution used by many and would be easy
>> for others to reproduce my problem.
>>
>> I agree the 400 MB still looks big, but no, the ONLY intentional
>> difference between image1 and image2 is the 2.9 MB WinSCP binary I
>> downloaded.  My guess is the difference is due 1) partially to the default
>> block size rdiff uses (512b?) AND 2) to the fact that the Windows XP VM
>> image source only had 256 MB RAM and that by default Windows XP creates a
>> pagefile of 1.5 x RAM size = 384 MB.  That is close enough to 400 MB for me.
>>
>> I am currently running rdiff with a smaller block size to test #1 above;
>> hopefully that will force the delta to get smaller (at the expense of
>> longer computation time).
>>
>> Matt
>>
>> On Tue, Mar 20, 2012 at 3:41 PM, Joachim Otahal (privat) <Jou at gmx.net> wrote:
>>
>>> Matt Van Mater wrote:
>>>
>>>
>>>> Alternate assessment - I ran a similar comparison against the two image
>>>> files using the rdiff that comes with Ubuntu 10.04.4 LTS (shows up as
>>>> librsync 0.9.7) and got a significantly smaller delta file (closer to
>>>> what I expect).
>>>>
>>>
>>>  Just plain luck. If Ubuntu wrote most of the new files close to the
>>> last used blocks and only changed a few bytes (this time literally) in
>>> the middle, then the desync happens later. The 400 MB delta still looks
>>> big, or did you install something big like LibreOffice?
>>>
>>> regards,
>>>
>>> Joachim Otahal
>>>
>>>
>>
>>
>