Batch mode creates huge diffs, bug(s)?
Matt Van Mater
matt.vanmater at gmail.com
Tue Mar 20 13:25:14 MDT 2012
Thanks for your response Eric, but I disagree with your assessment, and here
is why.
Functionally - I agree that Windows is bound to update multiple timestamps
on log files, registry entries, pagefile, etc every time it boots. However
I think it is unrealistic to assume literally half of the capacity used on
a WinXP install is due to logs. To put it a different way, the system was
only booted up for about 3 minutes (long enough to log in, download winscp
and shut it down). Assuming my VM wrote 7 gigs of data non-stop for that 3
minute period, that's a sustained rate of 7000000000 bytes / 180 s =~ 38.9 MB/s.
That is a _serious_ performance hit and would be very apparent in
performance monitoring tools... and there was no such hit.
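
The back-of-the-envelope figure above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the inputs are the rounded 7 GB and 3 minute values from the paragraph, and the function name is made up for illustration.

```python
def sustained_write_rate(total_bytes: int, seconds: float) -> float:
    """Return the average write rate in decimal megabytes per second."""
    return total_bytes / seconds / 1_000_000


# Rounded inputs from the text: ~7 GB written during a ~3 minute session.
rate = sustained_write_rate(7_000_000_000, 180)
print(f"{rate:.1f} MB/s")
```

A disk writing at that rate for three minutes straight would be hard to miss in any performance monitor.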
Alternate assessment - I ran a similar comparison against the two image
files using the rdiff that comes with Ubuntu 10.04.4 LTS (it shows up as
librsync 0.9.7) and got a significantly smaller delta file (closer to what I
expected):
1. Commands:
1. rdiff signature image1 image1-signature
2. rdiff delta image1-signature image2 image1to2-delta
2. Resultant files:
1. image1-signature - 99,975,264 bytes
2. image1to2-delta - 446,384,815 bytes
So the two commands (rsync and rdiff) use the same or very similar
underlying libraries and result in a 426 MiB delta file vs a 6.8 GiB batch
file! To me, that is a clear indication that either I am running the rsync
command incorrectly, or that there is a bug in rsync. Does that make sense?
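
To put the two results side by side, here is a rough sketch that expresses each delta as a share of the target image. The byte counts are the ones reported in this thread; the constant and function names are just illustrative.

```python
# Byte counts as reported in this thread.
IMAGE2_BYTES = 16_993_256_652      # uncompressed image2 (the target)
RDIFF_DELTA_BYTES = 446_384_815    # image1to2-delta produced by rdiff
RSYNC_BATCH_BYTES = 7_315_408_780  # img1toimg2_diff produced by rsync


def percent_of(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


rdiff_pct = percent_of(RDIFF_DELTA_BYTES, IMAGE2_BYTES)
rsync_pct = percent_of(RSYNC_BATCH_BYTES, IMAGE2_BYTES)
ratio = RSYNC_BATCH_BYTES / RDIFF_DELTA_BYTES

print(f"rdiff delta: {rdiff_pct:.1f}% of image2")
print(f"rsync batch: {rsync_pct:.1f}% of image2")
print(f"rsync batch is {ratio:.1f}x the size of the rdiff delta")
```

The rdiff delta is a few percent of the image, while the rsync batch file is a large fraction of it, which is the gap I am asking about.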
On Tue, Mar 20, 2012 at 2:07 PM, <ericbambach1 at discover.com> wrote:
> It's probably not an rsync bug. It's likely that after booting to
> create the second image a large number of updates have happened at many
> different parts in the filesystem. You may have added only a few MB of
> data but a lot of little things are going on in an active system like
> filesystem timestamp updates, registry updates, etc. It could also have to
> do with the internal structure of the image. If it stores metadata about
> each part of the system the metadata could be different between runs
> causing a large number of differences.
> A 7GB diff of a 16GB file tells me about half the blocks were
> modified between runs which isn't completely unbelievable in an active,
> booted system.
> Eric Bambach | Discover
> Senior Assoc. Programmer, Warehouse Infrastructure and Tools
> 2500 Lake Cook Road, Riverwoods IL 60015
> P: 224.405.2896 ericbambach1 at discover.com
> From: Matt Van Mater <matt.vanmater at gmail.com>
> To: <rsync at lists.samba.org>
> Date: 03/20/2012 12:55 PM
> Subject: Batch mode creates huge diffs, bug(s)?
> Sent by: <rsync-bounces at lists.samba.org>
> So the short summary of my problem is, the batch file rsync creates is
> HUGE for a very small change. The idea is to create a workstation image
> with partimage, update it with some software and send the image update
> diff over the wire to a large number of destinations over a satellite
> link, but the batch file updates are several orders of magnitude too
> large. I don't know exactly how partimage creates image files, so the
> bytes/blocks may be ordered differently between my two variants but should
> be identical, so rsync _should_ be able to handle that right?
> Software used: Ubuntu 9.10, fogproject.org v.28, partimage ??, rsync 3.0.6
> Hardware: Running as VM in ESXi 4.1 U2, 4 x vCPU and 16 GB RAM, 200 GB
> disk (150+ GB free)
> My testing process:
> 1. Use FOG .28 / partimage to capture an image of an already
> configured Windows XP workstation
> 2. Log in to workstation as normal user, download WinSCP (2.9 MB
> file), shut down machine gracefully
> 3. Use FOG .28 / partimage to capture the same system again, to a
> new image file.
> 4. FOG uses gzip to compress the partimage file, and we need to
> compare uncompressed images
> 1. Commands:
> 1. mv image1 image1.gz && mv image2 image2.gz && gunzip image1.gz &&
> gunzip image2.gz
> 2. Resultant files:
> 1. image1 size in bytes: 17,062,442,700
> 2. image2 size in bytes: 16,993,256,652
> 3. Difference in raw size in bytes: 69,186,048 (somewhat larger than
> the 2.9 MB difference I expect due to downloading WinSCP, but not the end
> of the world)
> 5. Create rsync diff package
> 1. Command:
> 1. rsync --only-write-batch=img1toimg2_diff image2 image1
> 2. Resultant files:
> 1. img1toimg2_diff size in bytes: 7,315,408,780
> 2. img1toimg2_diff.sh in bytes: 58
> 3. Difference is WAY bigger than raw file size. This HAS to be a bug!
> I thought perhaps specifying the block size might help (it does
> significantly in non-batch mode) but I get an error and cannot proceed. I
> have tried in both rsync v3.0.6 and v3.0.7 to specify the block size, but
> the result is the same:
> 1. Command:
> 1. rsync --block-size=512 --only-write-batch=img1toimg2_diff image2
> 2. Error message:
> 1. ERROR: Out of memory in receive_sums [sender]
> 2. rsync error: error allocating core memory buffers (code 22) at
> util.c(117) [sender=3.0.7]
> I looked at the changelog and haven't seen any updates to util.c since
> rsync v3.0.6 was released that might address this issue. So I think that
> I might be seeing two bugs: 1) huge diff size 2) crashing non-gracefully
> when trying to use block size with batch mode.
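
For what it's worth, a rough sketch of why --block-size=512 might be memory-hungry on a file this size: every block of the basis file needs its own checksum entry, so a small block size means a very large table. This assumes image1 is the basis file. The 40 bytes per entry and the 131072-byte comparison block size are assumptions picked for illustration, not values taken from the rsync source.

```python
IMAGE1_BYTES = 17_062_442_700  # uncompressed image1, assumed to be the basis file

# ASSUMPTION: illustrative per-block bookkeeping cost, not taken from rsync.
ASSUMED_BYTES_PER_ENTRY = 40


def block_count(file_bytes: int, block_size: int) -> int:
    """Number of blocks needed to cover the file (integer ceiling division)."""
    return -(-file_bytes // block_size)


def table_bytes(file_bytes: int, block_size: int) -> int:
    """Estimated checksum table size under the assumed per-entry cost."""
    return block_count(file_bytes, block_size) * ASSUMED_BYTES_PER_ENTRY


# 512 is the value from the failing command; 131072 is a hypothetical
# larger block size used only for comparison.
for size in (512, 131_072):
    blocks = block_count(IMAGE1_BYTES, size)
    est_mb = table_bytes(IMAGE1_BYTES, size) / 1_000_000
    print(f"block size {size}: {blocks:,} blocks, ~{est_mb:,.1f} MB of entries")
```

Under these assumptions the 512-byte case needs tens of millions of entries, which would at least be consistent with an allocation failure in receive_sums.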
> Has anyone experienced this before? Am I allowed to specify block size
> with batch mode? Any words of wisdom?
> Matt Van Mater