DO NOT REPLY [Bug 6816] New: Delta-transfer algorithm does not reuse already transmitted identical blocks

samba-bugs at samba.org samba-bugs at samba.org
Thu Oct 15 09:24:47 MDT 2009


https://bugzilla.samba.org/show_bug.cgi?id=6816

           Summary: Delta-transfer algorithm does not reuse already
                    transmitted identical blocks
           Product: rsync
           Version: 3.0.5
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: core
        AssignedTo: wayned at samba.org
        ReportedBy: martin at scharrer-online.de
         QAContact: rsync-qa at samba.org


Hi,

I observed the following behavior of rsync: if a file contains identical blocks
(e.g. all-zero blocks), these blocks are not re-transferred but reused by the
delta-transfer algorithm, BUT only if one of these blocks is already present in
the destination file. If not, or if the destination file does not exist yet, all
identical blocks are copied over and over again. In some special cases (e.g.
large sparse files which are rsync'ed with --inplace, so -S can't be used) it is
much faster to interrupt the rsync operation after a while and restart it, so
that the identical blocks are reused instead of re-transferred.

A good (but kind of trivial) example would be a big file (say 1GB) containing
only zeros (dd if=/dev/zero of=file bs=1M count=1k) which is transferred
without the -S option. If the file does not exist at the destination, it is
copied as a whole, just as 'scp' would do it. In my case it is copied at
about 2MB/s. But if the file already exists, even with only a very small size,
the identical blocks are reused and the "transfer speed" is around the
destination hard drive I/O speed (in my case 60-120MB/s; the target is a tmpfs
ramdisk).

I also tested this with a file with pseudo-random but repeating content (dd
if=/dev/urandom of=temp bs=1M count=10; cat temp temp ... temp > file). If the
first rsync process is aborted and restarted after the first repeating block
was transferred, the second rsync process only sends metadata, because the
existing content is simply replicated.

It would be great if the delta-transfer algorithm were extended to account
for identical to-be-sent data blocks, i.e. to send the first occurrence of
such a block and then simply reuse it during the same rsync process. IMHO this
should not be too difficult to implement, because most of the needed
functionality is already there.
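
The proposed extension could be sketched along the same lines. This is a hypothetical design, not an existing rsync feature or wire format: besides the basis-file table, the sender remembers the digest of every full block it has already produced, and a later identical block is encoded as a reference to that earlier block of the file being rebuilt, which the receiver already holds. The token names (`BASIS`, `OUTPUT`, `LITERAL`), the block size and the functions `delta_with_reuse` and `rebuild` are assumptions made for this example; matching is block-aligned only.

```python
import hashlib

BLOCK = 4  # toy block size for illustration only


def _digest(chunk: bytes) -> bytes:
    return hashlib.md5(chunk).digest()


def delta_with_reuse(source: bytes, basis: bytes, size: int = BLOCK) -> list:
    """Hypothetical encoder that reuses blocks already sent in this transfer.

    Tokens (invented for this sketch, not rsync's protocol):
      ("BASIS", i)    copy block i of the receiver's old file
      ("OUTPUT", i)   copy block i of the new file, received earlier
      ("LITERAL", b)  send raw bytes b
    """
    basis_table = {}
    for off in range(0, len(basis) - size + 1, size):
        basis_table.setdefault(_digest(basis[off:off + size]), off // size)

    sent_table = {}  # digest -> index of a full block already in the new file
    tokens = []
    for n, off in enumerate(range(0, len(source), size)):
        chunk = source[off:off + size]
        full = len(chunk) == size  # a short trailing chunk is always literal
        key = _digest(chunk)
        if full and key in basis_table:
            tokens.append(("BASIS", basis_table[key]))
        elif full and key in sent_table:
            tokens.append(("OUTPUT", sent_table[key]))  # the proposed reuse
        else:
            tokens.append(("LITERAL", chunk))
        if full:
            sent_table.setdefault(key, n)  # block n now exists on the receiver
    return tokens


def rebuild(tokens: list, basis: bytes, size: int = BLOCK) -> bytes:
    """Receiver side: apply the tokens to reconstruct the new file."""
    out = bytearray()
    for kind, val in tokens:
        if kind == "BASIS":
            out += basis[val * size:(val + 1) * size]
        elif kind == "OUTPUT":
            out += out[val * size:(val + 1) * size]
        else:
            out += val
    return bytes(out)


zeros = bytes(64)  # stands in for the all-zero file from the report
tokens = delta_with_reuse(zeros, b"")  # destination does not exist yet
assert rebuild(tokens, b"") == zeros
print(sum(len(v) for kind, v in tokens if kind == "LITERAL"))  # 4
```

Here only the first zero block is sent as literal data even though the destination file does not exist yet. In this sketch the receiver copies from the part of the new file it has already written; a real implementation would have to read that data back from its temporary (or --inplace) output file.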

