Batch mode creates huge diffs, bug(s)?

Matt Van Mater matt.vanmater at gmail.com
Tue Mar 20 13:25:14 MDT 2012


Thanks for your response, Eric, but I disagree with your assessment, and
here is why:

Functionally - I agree that Windows is bound to update multiple timestamps
on log files, registry entries, the pagefile, etc., every time it boots.
However, I think it is unrealistic to assume that literally half of the
capacity used on a WinXP install is rewritten by logs.  To put it a different
way, the system was only booted up for about 3 minutes (long enough to log
in, download WinSCP and shut it down).  Even if my VM wrote 7 GB of data
non-stop for that whole 3-minute period, that's a sustained rate of
7,000,000,000 bytes / 180 s ≈ 38.9 MB/s, and that is the lowest rate that
could produce it.  That is a _serious_ performance hit and would be very
apparent in performance monitoring tools... and there was no such hit.
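The rate estimate above can be reproduced with POSIX shell integer arithmetic. The inputs are the rounded figures from the paragraph (about 7 GB written in about 180 seconds), and MB is taken to mean 10^6 bytes, which is what the 38.9 figure implies.

```sh
# Rounded figures from the paragraph above: ~7 GB changed, ~3 minutes of uptime.
changed_bytes=7000000000
uptime_seconds=180

# Average write rate the guest would have had to sustain, in bytes per second.
rate=$(( changed_bytes / uptime_seconds ))

# Same rate in decimal MB/s, kept as tenths and rounded to the nearest tenth.
rate_mb_tenths=$(( (rate + 50000) / 100000 ))
printf 'required write rate: %d.%d MB/s\n' \
    $(( rate_mb_tenths / 10 )) $(( rate_mb_tenths % 10 ))
```

Any shorter burst of writing would push the required rate higher still, so this is a floor, not a ceiling.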

Alternate assessment - I ran a similar comparison against the two image
files using the rdiff that comes with Ubuntu 10.04.4 LTS (it reports
librsync 0.9.7) and got a significantly smaller delta file (closer to what
I expect).

   1. Commands:
      1. rdiff signature image1 image1-signature
      2. rdiff delta image1-signature image2 image1to2-delta
   2. Resultant files:
      1. image1-signature - 99,975,264 bytes
      2. image1to2-delta - 446,384,815 bytes


So the two tools (rsync and rdiff) are built on the same, or a very similar,
delta algorithm, yet they produce a 426 MiB delta file vs. a 6.8 GiB batch
file!  To me, that is a clear indication that either I am running the rsync
command incorrectly, or there is a bug in rsync.  Does that make sense?

Matt

On Tue, Mar 20, 2012 at 2:07 PM, <ericbambach1 at discover.com> wrote:

> Matt,
>
>        It's probably not an rsync bug. It's likely that after booting to
> create the second image, a large number of updates happened in many
> different parts of the filesystem. You may have added only a few MB of
> data, but a lot of little things are going on in an active system, like
> filesystem timestamp updates, registry updates, etc. It could also have to
> do with the internal structure of the image. If it stores metadata about
> each part of the system, the metadata could differ between runs,
> causing a large number of differences.
>
>        A 7GB diff of a 16GB file tells me about half the blocks were
> modified between runs which isn't completely unbelievable in an active,
> booted system.
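The "about half" estimate can be checked against the sizes reported in the original message. This sketch assumes nearly all of the batch file is literal data that rsync could not match against image1; the batch also carries some bookkeeping, so the result is only a rough share.

```sh
# Sizes from this thread, in bytes.
batch=7315408780        # img1toimg2_diff
image=16993256652       # image2, the file the batch reconstructs

# Share of the image carried in the batch, in tenths of a percent (rounded).
share=$(( (batch * 1000 + image / 2) / image ))
printf 'batch is %d.%d%% of the image size\n' $(( share / 10 )) $(( share % 10 ))
```

Under that assumption, a little over 40% of the image was re-sent as new data, close to the "about half" figure.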
>
> Eric Bambach | Discover
> Senior Assoc. Programmer, Warehouse Infrastructure and Tools
> 2500 Lake Cook Road, Riverwoods IL 60015
> P: 224.405.2896 ericbambach1 at discover.com
>
> From:   Matt Van Mater <matt.vanmater at gmail.com>
> To:     <rsync at lists.samba.org>
> Date:   03/20/2012 12:55 PM
> Subject:        Batch mode creates huge diffs, bug(s)?
> Sent by:        <rsync-bounces at lists.samba.org>
>
>
>
> So the short summary of my problem is that the batch file rsync creates is
> HUGE for a very small change.  The idea is to create a workstation image
> with partimage, update it with some software and send the image update
> diff over the wire to a large number of destinations over a satellite
> link, but the batch file updates are several orders of magnitude too
> large.  I don't know exactly how partimage creates image files, so the
> bytes/blocks may be ordered differently between my two images, but most of
> them should be identical, so rsync _should_ be able to handle that, right?
>
> Software used: Ubuntu 9.10, fogproject.org v.28, partimage ??, rsync 3.0.6
> Hardware: Running as VM in ESXi 4.1 U2, 4 x vCPU and 16 GB RAM, 200 GB
> disk (150+ GB free)
>
> My testing process:
>    1. Use FOG .28 / partimage to capture an image of an already
>       configured Windows XP workstation.
>    2. Log in to the workstation as a normal user, download WinSCP (2.9 MB
>       file), and shut down the machine gracefully.
>    3. Use FOG .28 / partimage to capture the same system again, to a new
>       image file.
>    4. FOG uses gzip to compress the partimage file, and we need to compare
>       uncompressed images.
>       1. Commands:
>          1. mv image1 image1.gz && mv image2 image2.gz && gunzip image1.gz && gunzip image2.gz
>       2. Resultant files:
>          1. image1 size in bytes: 16,993,256,652
>          2. image2 size in bytes: 17,062,442,700
>          3. Difference in raw size in bytes: 69,186,048 (somewhat larger
>             than the 2.9 MB difference I expect from downloading WinSCP,
>             but not the end of the world)
>    5. Create the rsync diff package.
>       1. Command:
>          1. rsync --only-write-batch=img1toimg2_diff image2 image1
>       2. Resultant files:
>          1. img1toimg2_diff size in bytes: 7,315,408,780
>          2. img1toimg2_diff.sh size in bytes: 58
>       3. The batch is WAY bigger than the raw size difference. This HAS to
>          be a bug!  I thought perhaps specifying the block size might help
>          (it does significantly in non-batch mode), but I get an error and
>          cannot proceed.  I have tried specifying the block size in both
>          rsync v3.0.6 and v3.0.7, but the result is the same:
>          1. Command:
>             1. rsync --block-size=512 --only-write-batch=img1toimg2_diff image2 image1
>          2. Error message:
>             1. ERROR: Out of memory in receive_sums [sender]
>             2. rsync error: error allocating core memory buffers (code 22) at util.c(117) [sender=3.0.7]
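Step 4 renames each image to `.gz` so that `gunzip` will accept it. A minimal alternative, assuming the FOG images are ordinary gzip streams (which the successful `gunzip` run implies), is to let `gzip -dc` read each image on standard input and write a separate uncompressed copy, so no renaming is needed. The helper names `decompress_image` and `size_of` are hypothetical.

```sh
# Hypothetical helper: write an uncompressed copy of a gzip-compressed FOG
# image to NAME.raw. Reading from stdin sidesteps gzip's .gz suffix check,
# and the compressed original is left untouched.
decompress_image() {
    gzip -dc < "$1" > "$1.raw"
}

# Hypothetical helper: print a file's size in bytes using only POSIX tools.
size_of() {
    wc -c < "$1" | tr -d ' '
}
```

For example, `decompress_image image1 && decompress_image image2` followed by `echo $(( $(size_of image2.raw) - $(size_of image1.raw) ))` should print 69186048 for the sizes listed in step 4. Keeping the compressed originals costs extra disk space, which the 150+ GB free on this VM can absorb.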
>
> I looked at the changelog and haven't seen any updates to util.c since
> rsync v3.0.6 was released that might address this issue.  So I think
> I might be seeing two bugs: 1) a huge diff size, 2) a non-graceful crash
> when trying to use a block size with batch mode.
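The rsync man page says the block size is normally selected based on the size of each file being updated, so forcing `--block-size=512` on a ~17 GB image greatly multiplies the number of blocks whose checksums the sender must hold. The sketch below only illustrates the scale. Two assumptions are labeled: a block size near the square root of the file size stands in for rsync's default choice, and `record_bytes=40` is a hypothetical round figure for per-block bookkeeping, not rsync's actual structure size.

```sh
# Size of image2 in bytes, as reported in step 4.
file_size=17062442700
# Hypothetical bytes of checksum bookkeeping per block: an illustrative
# round number, NOT rsync's real per-block structure size.
record_bytes=40

# Number of blocks needed to cover the file at block size $1 (rounded up).
blocks() {
    echo $(( (file_size + $1 - 1) / $1 ))
}

# Integer square root via Newton's method. Used here only as an assumed
# stand-in for "block size chosen from the file size".
isqrt() {
    n=$1
    x=$1
    y=$(( (n + 1) / 2 ))
    while [ "$y" -lt "$x" ]; do
        x=$y
        y=$(( (x + n / x) / 2 ))
    done
    echo "$x"
}

for bs in 512 "$(isqrt "$file_size")"; do
    count=$(blocks "$bs")
    printf 'block size %6d: %9d blocks, ~%d MiB of checksum records\n' \
        "$bs" "$count" $(( count * record_bytes / 1048576 ))
done
```

Under these assumptions, 512-byte blocks mean over 33 million checksum entries versus about 130 thousand at a square-root-sized block, roughly 255 times as many. That is consistent with the allocation failure in receive_sums reported above, though it does not prove the cause.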
>
> Has anyone experienced this before?  Am I allowed to specify a block size
> with batch mode?  Any words of wisdom?
>
> Thanks,
> Matt Van Mater