DO NOT REPLY [Bug 4128] New: ignore-times with link-dest behaves unexpectedly / semantics not clear

samba-bugs at samba.org
Thu Sep 28 00:49:04 GMT 2006


https://bugzilla.samba.org/show_bug.cgi?id=4128

           Summary: ignore-times with link-dest behaves unexpectedly /
                    semantics not clear
           Product: rsync
           Version: 2.6.8
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: core
        AssignedTo: wayned at samba.org
        ReportedBy: jwagner at computing.dcu.ie
         QAContact: rsync-qa at samba.org


Hi,

I checked the following with rsync 2.6.8 from Fedora Core 5 (updated this week)
and the current 2.6.9cvs. The behaviour is different, but still not as
expected. The man page says that --ignore-times switches off any quick checks.
Therefore, I concluded that this option makes sure that target data is correct
in any case. Elsewhere, I even read that --ignore-times is an alternative to
--checksum and that one or the other can be preferred depending on how many
files are expected to be the same. However, when used together with
--link-dest, the following happens.

In 2.6.8, --ignore-times with --link-dest doesn't identify at all that files on
the receiver can be hard-linked. This problem is known to the developers, as
revision r 1.273 of generator.c attempts to fix it. At least rsync 2.6.8 uses
the files in the link-dest directory to reduce network traffic, basically
copying the file within the receiver.

With 2.6.9cvs (2006-09-27, generator.c revision r 1.285), the same options
cause the files to be hard-linked without being verified to have identical
content. (Note: I installed rsync locally on both machines (configure
--prefix=$HOME) and used option --rsync-path=$HOME/bin/rsync, see below.)

Test script: two machines A and B, same user and numeric IDs.
(I used Pentium 4 PCs with Fedora Core 5, updated with default repositories.)

export B=192.168.0.20   # <-- set this to the 2nd machine to be able to copy
# and paste from here (you might also want to configure ssh to avoid typing
# login passwords again and again)
# step 1 - prepare data
echo "one" >test1.txt
echo "two" >test2.txt
mkdir -p ref/data
cp test1.txt ref/data/text101.txt
cp test1.txt ref/data/text102.txt
touch -d 060927 ref/data/*
mkdir data
cp test1.txt data/text101.txt
cp test2.txt data/text102.txt
touch -d 060927 data/*
mkdir dst
rsync -av ref dst `whoami`@$B:./
# step 2 - test rsync
rsync -av --ignore-times --link-dest=../ref/ data `whoami`@$B:dst/
# note: dest=ref/ would be relative to dst/
# note2: if you had to type in a password for the first rsync,
# copying'n'pasting the 2nd rsync might not have worked in one go
# step 3 - analyze results
ssh `whoami`@$B 'ls -li dst/data/ ; cat dst/data/*'
# note: 3rd column gives the hard-link count

# cleaning up for next run
ssh `whoami`@$B 'rm -f dst/data/*'
# test newest version
$HOME/bin/rsync -av --rsync-path=$HOME/bin/rsync --ignore-times \
--link-dest=../ref/ data `whoami`@$B:dst/
ssh `whoami`@$B 'ls -li dst/data/ ; cat dst/data/*'
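
The ls -li output in step 3 has to be read by eye. A minimal sketch of a helper that could be run on machine B to report, per file, whether the copy in dst/data shares an inode with its counterpart in ref/data. The name link_state is hypothetical and GNU stat (Linux) is assumed. With the test data above, a run that verified content would presumably report text101.txt as hard-linked and text102.txt as separate.

```shell
# Hypothetical helper, meant to run on the receiver (machine B).
# For each regular file in DST_DIR, report whether it shares an inode
# with the same-named file in REF_DIR. Assumes GNU stat (Linux).
link_state() {
    ref_dir=$1
    dst_dir=$2
    for f in "$dst_dir"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        # Same device and inode number means the two names are hard links
        # to one file.
        if [ -e "$ref_dir/$name" ] &&
           [ "$(stat -c '%d:%i' "$f")" = "$(stat -c '%d:%i' "$ref_dir/$name")" ]; then
            echo "$name hard-linked"
        else
            echo "$name separate"
        fi
    done
}

# Example (on machine B): link_state ref/data dst/data
```

A file reported as hard-linked necessarily has the content of the ref/ copy, so it still needs to be compared against the sender's file (for example with cat, as in step 3).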

Of course, it can be argued that the conclusion is wrong and the long
description in the manpage misleading. --ignore-times simply does what it
says: it ignores time stamps. However, the consequences when used with other
options should be reasonable, or at least be documented.

Motivation for combining these options: Machine B is a mirror of machine A.
Unfortunately, machine A turned out to have had a hardware defect that causes
sporadic read errors. Files on B are likely to be damaged. Files on A might
also be permanently damaged. For further analysis, I'd like to have all files
on B. Without --link-dest, I don't have enough space. Without --ignore-times,
files with same stat but with a bit error somewhere in the middle will not be
detected.
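
A minimal local sketch (no rsync involved) of why a file with the same stat data but one changed byte slips past a size-and-mtime comparison. quick_same is a hypothetical helper that only imitates such a quick check, and GNU stat is assumed.

```shell
# Hypothetical helper imitating a size+mtime "quick check".
# Assumes GNU stat (Linux); this is not rsync's actual code.
quick_same() {
    [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ]
}

tmp=$(mktemp -d)
printf 'one\n' > "$tmp/a"
printf 'onf\n' > "$tmp/b"        # same length, one byte differs
touch -r "$tmp/a" "$tmp/b"       # copy a's timestamps onto b

# Size and mtime agree, so a stat-only comparison sees no change.
quick_same "$tmp/a" "$tmp/b" && echo "quick check: looks unchanged"
# A byte-by-byte comparison still finds the difference.
cmp -s "$tmp/a" "$tmp/b" || echo "byte comparison: contents differ"
rm -rf "$tmp"
```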

I'll now reconsider using --checksum, although it seems to waste lots of time
by calculating checksums sequentially: first on machine A while B is idle,
then, presumably, on machine B while A is idle (I didn't get this far, as I got
impatient after 6 hours of high CPU and disk I/O load on A), and only
eventually applying the normal rsync algorithm to those files that are not
identical. But this is a different story.
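
One possible workaround outside rsync, sketched under assumptions: build a checksum manifest on each machine independently (so both can work at the same time) and diff the two lists. The helper name manifest is hypothetical, and GNU find, sort, xargs and md5sum are assumed to be available on both machines.

```shell
# Hypothetical helper: print "checksum  ./relative/path" for every regular
# file under DIR, sorted by path so that two manifests can be diffed.
# Assumes GNU find, sort, xargs and md5sum.
manifest() {
    ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r md5sum )
}

# Example: manifest data > a.md5    (run the same on B for dst/data)
#          diff a.md5 b.md5         (differing lines name the damaged files)
```

Because each side only reads its own disk, the two manifests could be produced concurrently, and only the small text files need to cross the network.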

Regards,
Joachim



