Batch mode creates huge diffs, bug(s)?

Matt Van Mater matt.vanmater at gmail.com
Wed Mar 21 11:02:03 MDT 2012


OK, I think I found the source of my problem...

One bug I found is due to the very large size of the files I am transferring
(16993256652 bytes for image2 --> 17062442700 bytes for image1) combined with
specifying a small-ish block size (512).  I am unsure of the cause of the
other bug.

   1. Example 1 (works, but results in a HUGE batch file size)
      1. root@ubuntu10_04_4:~# rsync --only-write-batch=img1-to-img2-defaultblock image2 image1
      2. root@ubuntu10_04_4:~# ls -la | awk '/defaultblock/ {print $5, $8}'
         1. 7315408780 img1-to-img2-defaultblock
   2. Example 2 (works, and results in a reasonable but not optimal batch
      file size)
      1. root@ubuntu10_04_4:~# rsync --block-size=1024 --only-write-batch=img1-to-img2-1024block image2 image1
      2. root@ubuntu10_04_4:~# ls -la | awk '/1024block/ {print $5, $8}'
         1. 287002289 img1-to-img2-1024block
   3. Example 3 (does NOT work, no batch file created)
      1. root@ubuntu10_04_4:~# rsync --block-size=512 --only-write-batch=img1-to-img2-512block image2 image1
         1. ERROR: out of memory in receive_sums [sender]
         2. rsync error: error allocating core memory buffers (code 22) at util.c(117) [sender=3.0.7]

I am NOT a skilled C programmer and the following is a bit of a guess, but
I think I have traced the bug back to sender.c in the function starting at
line 59: static struct sum_struct *receive_sums(int f)

Something in that function does not cope with having to track so many
blocks/signatures.  I think it may be the use of the int data type, and
perhaps a wider type such as int64 might solve my problem?
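
As a rough sanity check on how many blocks/signatures are involved, the sketch below computes the block count for the larger image at the block sizes tried above. The function name is purely illustrative and nothing here is taken from rsync's source; the only inputs are the file size and block sizes quoted in this message. The counts come out well below the 32-bit signed limit, so if an integer or allocation limit is responsible, it may apply to a derived quantity (for example, block count times the per-block record size) rather than to the count itself. That last part is a guess.

```python
# Illustrative arithmetic only; nothing here is taken from rsync's source.
INT32_MAX = 2**31 - 1
IMAGE1_SIZE = 17062442700  # bytes, the larger of the two images above


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to cover file_size (the last block may be short)."""
    return (file_size + block_size - 1) // block_size


if __name__ == "__main__":
    for block_size in (512, 1024):
        n = block_count(IMAGE1_SIZE, block_size)
        print(f"block size {block_size}: {n} blocks, fits in int32: {n <= INT32_MAX}")
```

Halving the block size doubles the number of per-block records that have to be held, which fits the observation that 1024 succeeds while 512 fails.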

So I believe I have identified two distinct, but related bugs:
1) The use of the default block size in conjunction with batch mode
results in an inexplicably large batch file
2) The use of smaller block sizes with large files in conjunction with
batch mode results in a failure, whereas increasing the block size to a
larger value results in a success

Thoughts?

Matt Van Mater


On Wed, Mar 21, 2012 at 10:22 AM, Matt Van Mater <matt.vanmater at gmail.com> wrote:

> OK so I re-ran rdiff with several different block sizes against my two
> image files described earlier to try and find the optimal size with my use
> case.  Here is a summary:
>   Block size (bytes)   Signature size (bytes)   Delta size (bytes)
>                   64               3199208028            543335307
>                  256                799802016            267816685
>                  512                399901020            220323243
>                 1024                199950516            272722422
>                 2048                 99975264            446384815
>                 4096                 49987644            830846129
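
The signature sizes reported above are consistent with a simple model: a fixed 12-byte header plus 12 bytes per block (assumed here to be a 4-byte weak checksum plus an 8-byte truncated strong checksum). That layout is an assumption about rdiff's signature format, not something stated in this thread; the sketch below only checks that it reproduces the reported numbers for the 17062442700-byte image. The names are hypothetical.

```python
# Assumed layout (not confirmed in this thread): 12-byte header, then one
# 12-byte record per block (4-byte weak sum + 8-byte truncated strong sum).
HEADER_BYTES = 12
BYTES_PER_BLOCK = 12
IMAGE1_SIZE = 17062442700  # bytes


def signature_size(file_size: int, block_size: int) -> int:
    """Predicted signature file size in bytes under the assumed layout."""
    blocks = -(-file_size // block_size)  # ceiling division
    return HEADER_BYTES + BYTES_PER_BLOCK * blocks


# Signature sizes reported above, keyed by block size in bytes.
REPORTED = {
    64: 3199208028,
    256: 799802016,
    512: 399901020,
    1024: 199950516,
    2048: 99975264,
    4096: 49987644,
}

if __name__ == "__main__":
    for block_size, reported in REPORTED.items():
        print(block_size, signature_size(IMAGE1_SIZE, block_size), reported)
```

Under this model the signature roughly doubles each time the block size is halved, while the delta size in the reported figures has its minimum at 512.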
>
> At the risk of being obvious, this test showed that the smaller the block
> size specified, the longer the process takes to compute the signatures and
> deltas.  It also validated tests done in rsync that show 512 is the best
> block size in my use case of transferring a Windows XP system image.
>
> Most importantly, it clearly demonstrates that the rsync library is
> effective but there is a bug in rsync batch mode (both --write-batch and
> --only-write-batch) when using very large files.
>
> What is the best way to file a bug report on this issue?  Is any other
> information needed?  Is there anyone reading this thread who is still
> not convinced there is a problem with rsync?
>
> Thanks
> Matt Van Mater
>
>
> On Tue, Mar 20, 2012 at 4:26 PM, Matt Van Mater <matt.vanmater at gmail.com> wrote:
>
>> I ran one more test on a separate VM to check and see if rsync would
>> allow me to specify block size for a smaller file while using batch mode...
>> it works.  To me that indicates that rsync has a problem processing very
>> large batch files, especially when you specify a particular block size.
>> More signs point to an rsync bug...
>>
>> Here are the commands to reproduce the 'success':
>> root@server:/images# echo foo > file1
>> root@server:/images# echo foo > file2
>> root@server:/images# echo bar >> file2
>> root@server:/images# rsync --only-write-batch=batch-2to1-512 --block-size=512 file2 file1
>> root@server:/images# ./batch-2to1-512.sh
>> root@server:/images# md5sum file1
>> f47c75614087a8dd938ba4acff252494  file1
>> root@server:/images# md5sum file2
>> f47c75614087a8dd938ba4acff252494  file2
>>
>> I then increased the RAM on the VM to 32 GB and then 64 GB and tried the
>> command shown above on the big 16 GB files, and it still fails with the same
>> error as I originally reported.  So even though rsync could fit the entire
>> source, destination and diff file in RAM, it still failed.
>>
>> Still waiting on the rdiff with tiny block size to complete :)
>>
>> Matt
>>
>>
>> On Tue, Mar 20, 2012 at 4:09 PM, Joachim Otahal (privat) <Jou at gmx.net> wrote:
>>
>>>  Matt Van Mater wrote:
>>>
>>> Let me restate my last email regarding rdiff:
>>>
>>> All of my image files are from the same Windows XP VM, created using
>>> FOG/partimage.  Image1 is the "baseline", Image2 is Image1 + the WinSCP
>>> binary downloaded (not even installed).
>>>
>>>
>>> Use your virtualisation to create the difference / snapshot, transfer
>>> the diff disk image your virtualisation spits out, and then let it merge
>>> on the target.
>>> How well that works depends on the bitchiness of your virtualisation *g*.
>>>
>>> Good luck!
>>>
>>> Joachim Otahal
>>>
>>>
>>>
>>> I am not imaging an Ubuntu machine.  I am using the Ubuntu machine as a
>>> means of creating the batch file for rsync and/or rdiff.  I chose that
>>> platform since it is a common distribution used by many and would be easy
>>> for others to reproduce my problem.
>>>
>>> I agree the 400 MB still looks big, but no, the ONLY intentional
>>> difference between image1 and image2 is the 2.9 MB WinSCP binary I
>>> downloaded.  My guess is the difference is due 1) partially to the default
>>> block size rdiff uses (512b?) AND 2) to the fact that the Windows XP VM
>>> image source only had 256 MB RAM and that by default Windows XP creates a
>>> pagefile of 1.5 x RAM size = 384 MB.  That is close enough to 400 MB for me.
>>>
>>> I am currently running rdiff with a smaller blocksize to test #1 above,
>>> hopefully that will force the delta to get smaller (at the expense of
>>> longer computation time).
>>>
>>> Matt
>>>
>>> On Tue, Mar 20, 2012 at 3:41 PM, Joachim Otahal (privat) <Jou at gmx.net> wrote:
>>>
>>>> Matt Van Mater wrote:
>>>>
>>>>
>>>>> Alternate assessment - I ran a similar comparison against the two
>>>>> image files using the rdiff that comes with Ubuntu 10.04.4 LTS (it shows
>>>>> up as librsync 0.9.7) and got a significantly smaller delta file (closer
>>>>> to what I expect).
>>>>>
>>>>
>>>>  Just plain luck. If Ubuntu wrote most of the new files close to the last
>>>> used blocks and only changed a few bytes (this time literally) in the
>>>> middle, then the desync happens later. The 400 MB delta still looks big, or
>>>> did you install something big like LibreOffice?
>>>>
>>>> regards,
>>>>
>>>> Joachim Otahal
>>>>
>>>>
>>>
>>>
>>
>