[Bug 1489] New: Corrupt transfer with the fuzzy option
samba-bugs at samba.org
Tue Jun 29 13:41:54 GMT 2004
https://bugzilla.samba.org/show_bug.cgi?id=1489
Summary: Corrupt transfer with the fuzzy option
Product: rsync
Version: 2.6.2
Platform: All
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P3
Component: core
AssignedTo: wayned at samba.org
ReportedBy: egmont at uhulinux.hu
QAContact: rsync-qa at samba.org
I've been using the rusty-fuzzy patch of rsync for a long time without problems,
but now I've found a special circumstance where this option of the patched
rsync 2.6.2 leads to data corruption.
How to reproduce:
Create a /tmp/foo1 directory with four files.
foobar-1.0.txt contains one megabyte of '1' and then one megabyte of '0' chars.
A possible way to create this file:
{ dd if=/dev/zero bs=1k count=1k | tr '\0' '1';
  dd if=/dev/zero bs=1k count=1k | tr '\0' '0'; } > /tmp/foo1/foobar-1.0.txt
Then create foobar-2.0.txt which has a megabyte of '2' and then a meg of '0'.
Similarly create foobar-2.1.txt which has lots of '2' and then lots of '1'
digits. (So the contents of these three files reflect their filename.)
The md5sums are:
f6535bdc24b1074a704ef0166f93b4f0 foobar-1.0.txt
a1574b29877c570cb44436f7a404aa71 foobar-2.0.txt
80c3df2d255f201f842a89ebcb3f078c foobar-2.1.txt
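The three files described above could be generated in one go with a small helper. This is only a sketch: the function name make_pair and the SRC variable are made up for illustration, and it assumes dd and tr behave as in the command shown earlier.

```shell
# Hypothetical helper (not part of rsync): writes 1 MiB of character $2
# followed by 1 MiB of character $3 into file $1.
make_pair() {
    {
        dd if=/dev/zero bs=1k count=1k 2>/dev/null | tr '\0' "$2"
        dd if=/dev/zero bs=1k count=1k 2>/dev/null | tr '\0' "$3"
    } > "$1"
}

# SRC is an illustrative variable; it defaults to the directory used in this report.
SRC=${SRC:-/tmp/foo1}
mkdir -p "$SRC"

make_pair "$SRC/foobar-1.0.txt" 1 0
make_pair "$SRC/foobar-2.0.txt" 2 0
make_pair "$SRC/foobar-2.1.txt" 2 1
```

Running md5sum on the resulting files afterwards allows a comparison with the checksums listed above.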
Also create a zzz.txt with any content. It does not influence the result; it
only makes the example clearer.
Create /tmp/foo2 and run cp -a /tmp/foo1/foobar-1.0.txt /tmp/foo2/ but do not
copy the other files yet.
Here we go (bwlimit is not important):
$ rsync -a --fuzzy --bwlimit=100 localhost:/tmp/foo1/ /tmp/foo2/
receiving file list ... done
./
foobar-2.0.txt
foobar-2.1.txt
zzz.txt
foobar-2.1.txt
wrote 46464 bytes read 3155804 bytes 98531.32 bytes/sec
total size is 6291460 speedup is 1.96
As the 'screenshot' shows, foobar-2.1.txt is transferred twice. If rsync
is not interrupted, then at the end foobar-2.1.txt is okay.
However, after the first transfer its content is invalid (2096928 '2' chars
followed by 224 '1' chars; md5sum is 795cb4c484c711d902acd3011e70832e). The
file size and the timestamp are correct.
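One way to inspect such a file without relying on md5sum is to count how often each digit occurs. The helper below is a hypothetical sketch (count_char is not an existing tool), using only tr and wc.

```shell
# Hypothetical helper: prints how many bytes of file $1 are equal to the
# single character $2. "tr -cd" deletes every byte except that character.
count_char() {
    tr -cd "$2" < "$1" | wc -c | tr -d ' '
}

# Example invocations (paths taken from the report):
#   count_char /tmp/foo2/foobar-2.1.txt 2
#   count_char /tmp/foo2/foobar-2.1.txt 1
```

For a correct foobar-2.1.txt both calls should print 1048576. For the corrupt intermediate file, the figures given above would correspond to 2096928 and 224.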
Hence if either rsync is interrupted or someone else mirrors us (using rsync
without -c) during this process, the result will be a file with incorrect
content but correct metadata and so further rsyncing will not repair it.
First note: this example suggests that rsync has some self-protecting
mechanism (a naive program would most likely not even notice that the transfer
was incorrect and wouldn't restart it). However, as it stands this mechanism
is not sufficient. It should check that the whole file is okay before renaming
it to its final name, so that an interrupt cannot leave a corrupt file on the
disk. If that is not trivial to solve due to some technical difficulties, then
at least the timestamp should be set to some fake value to force a recheck of
this file if rsync gets interrupted.
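The "verify before rename" idea from the first note can be illustrated outside of rsync with a few lines of shell. This is only a sketch of the principle, not how rsync is implemented: the function install_if_verified is hypothetical, and it assumes the expected MD5 of the file is known and that md5sum (GNU coreutils) is available.

```shell
# Hypothetical sketch, not rsync code: move temporary file $1 to its final
# name $3 only if its MD5 checksum equals the expected value $2.
install_if_verified() {
    actual=$(md5sum < "$1" | cut -d' ' -f1)
    if [ "$actual" = "$2" ]; then
        mv -- "$1" "$3"
    else
        # Leave the temporary file in place so a corrupt file never
        # appears under the final name.
        echo "checksum mismatch for $3, keeping temporary file $1" >&2
        return 1
    fi
}
```

With this ordering, an interrupt or a failed check leaves only the temporary file behind, so a later run does not see a final file with correct metadata but wrong content.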
Second note: I think rsync --fuzzy starts to misbehave when the closest filename
changes during the operation. In my example, foobar-1.0.txt was initially the
closest filename to foobar-2.1.txt; during the operation, however,
foobar-2.0.txt appeared, which is even closer to foobar-2.1.txt. My guess is
that rsync cannot clearly decide which of these two files to use as the
reference, and this might be the cause of the problem.
--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.