DO NOT REPLY [Bug 6816] New: Delta-transfer algorithm does not reuse already transmitted identical blocks
samba-bugs at samba.org
Thu Oct 15 09:24:47 MDT 2009
https://bugzilla.samba.org/show_bug.cgi?id=6816
Summary: Delta-transfer algorithm does not reuse already
transmitted identical blocks
Product: rsync
Version: 3.0.5
Platform: Other
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P3
Component: core
AssignedTo: wayned at samba.org
ReportedBy: martin at scharrer-online.de
QAContact: rsync-qa at samba.org
Hi,
I observed the following behavior of rsync: If a file contains identical blocks
(e.g. all-zero blocks), these blocks are not re-transferred but reused by the
delta-transfer algorithm, BUT only if one of these blocks is already in the
destination file. If not, or if the destination file does not exist yet, all
identical blocks are copied over and over again. In some special cases (e.g.
large sparse files which are rsync'ed with --inplace, so -S can't be used) it is
much better to interrupt the rsync operation after a while and restart it so
that the identical blocks are reused, not re-transferred.
A good (but kind of trivial) example would be a big file (say 1GB) containing
only zeros (dd if=/dev/zero of=file bs=1M count=1k) which is transferred
without the -S option. If the file does not exist at the destination, it is
copied as a whole, as e.g. 'scp' would do. In my case it is copied at
about 2MB/s. But if the file already exists, even with only a very small size,
the identical blocks are reused and the "transfer speed" is around the
destination hard drive I/O speed (in my case 60-120MB/s; the target is a tmpfs
ramdisk).
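The effect can be modelled with a small sketch. It is a simplification, not
rsync's actual code: it assumes fixed-size blocks matched only at aligned
offsets and a single strong hash, whereas rsync also uses a rolling checksum
to find matches at arbitrary offsets. The function name literal_bytes and the
block size are hypothetical and chosen only for illustration.

```python
import hashlib


def literal_bytes(source, basis, block_size):
    """Count the bytes that would have to be sent literally.

    Simplified model (an assumption, not rsync's implementation):
    a source block counts as "reused" only if a full block with the
    same content already exists in the basis (destination) file.
    """
    # Hashes of all complete blocks currently in the destination file.
    known = {
        hashlib.sha256(basis[i:i + block_size]).digest()
        for i in range(0, len(basis) - block_size + 1, block_size)
    }
    sent = 0
    for i in range(0, len(source), block_size):
        block = source[i:i + block_size]
        if hashlib.sha256(block).digest() not in known:
            sent += len(block)  # no match in the basis: send the data
    return sent


zeros = bytes(8 * 1024)  # stand-in for the all-zero file

# No destination file: every block is sent, although all are identical.
print(literal_bytes(zeros, b"", 1024))          # 8192
# Destination holds a single zero block: nothing has to be sent literally.
print(literal_bytes(zeros, bytes(1024), 1024))  # 0
```

In this model a single block in the destination is enough to turn the whole
transfer into block references, which matches what I see after a restart.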
I also tested this with a file with pseudo-random but repeating content (dd
if=/dev/urandom of=temp bs=1M count=10; cat temp temp ... temp > file). If the
first rsync process is aborted and restarted after the first repeating block
was transferred, the second rsync process sends only meta-data, because the
existing content is just replicated.
It would be great if the delta-transfer algorithm were extended to account
for identical to-be-sent data blocks, i.e. first send the first appearance of
such a block and then simply reuse it during the same rsync process. IMHO this
should not be so difficult to implement, because most of the needed
functionality is already there.
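A minimal sketch of what I have in mind, under stated assumptions: blocks are
fixed-size and aligned, a strong hash identifies a block, and the receiver can
read back data it has already written during the same run. The names encode,
decode and BLOCK_SIZE are hypothetical; this is not how rsync is structured
internally, only an illustration of the idea.

```python
import hashlib

BLOCK_SIZE = 4  # tiny hypothetical block size, for illustration only


def encode(data, block_size=BLOCK_SIZE):
    """Sender side: emit each distinct block once, then refer back to it."""
    seen = {}  # block hash -> index of the first block with that content
    ops = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        key = hashlib.sha256(block).digest()
        if key in seen:
            ops.append(("ref", seen[key]))   # already sent in this run
        else:
            seen[key] = offset // block_size
            ops.append(("data", block))      # first appearance: send it
    return ops


def decode(ops, block_size=BLOCK_SIZE):
    """Receiver side: rebuild the file, copying referenced blocks locally."""
    out = bytearray()
    for kind, value in ops:
        if kind == "data":
            out += value
        else:
            start = value * block_size
            out += out[start:start + block_size]
    return bytes(out)


payload = b"abcd" * 3 + b"wxyz" + b"abcd"
ops = encode(payload)
print([kind for kind, _ in ops])  # ['data', 'ref', 'ref', 'data', 'ref']
print(decode(ops) == payload)     # True
```

With an all-zero file this would send one literal block followed only by
references, without needing to abort and restart the transfer.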