Serious issue: rsync and hardlinks are dangerous...

Jesus Cea jcea at
Thu Aug 9 19:08:25 MDT 2012

Rsync 3.0.9 here.

I am using an rsync script like:

rsync -z --numeric-ids -a -H --inplace --delete --delete-excluded \
    --stats --progress -v --itemize-changes SOURCE DESTINATION

I detected the following issue while rsyncing a bunch of Mercurial
repositories. It is very dangerous because it corrupts files.

When cloning a local repository, Mercurial uses hardlinks to save time
and disk space. When one of the clones updates a file, the file is
"unlinked" and replaced by a new file, so the histories can diverge.
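The hardlink-then-unlink pattern can be illustrated with a minimal shell sketch. It mimics the generic "write a temporary file and rename it over the old name" technique and is not taken from Mercurial's source; the file names are hypothetical, and `stat -c` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch of a hardlinked "clone" breaking its link before a modification.
# Assumption: generic copy-then-rename pattern, not Mercurial's actual code.
# Requires GNU coreutils for `stat -c`.
set -eu

work=$(mktemp -d)
cd "$work"

printf 'shared history\n' > orig.txt
ln orig.txt clone.txt                   # clone shares the inode (link count 2)
stat -c '%i %h %n' orig.txt clone.txt   # same inode, link count 2

# "Update" clone.txt without touching orig.txt: write a new file and
# rename it over the old name, which drops clone.txt's link to the inode.
printf 'diverged history\n' > clone.txt.tmp
mv clone.txt.tmp clone.txt

stat -c '%i %h %n' orig.txt clone.txt   # different inodes, link count 1 each
cat orig.txt                            # still "shared history"
```

This is exactly the situation in the source tree that a backup has to reproduce: two names that used to share an inode now point at different files.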

The problem can be trivially reproduced like this:

1. Create a text file "a.txt" with a bunch of characters inside.

2. Create a hardlink to that file, called "b.txt".

3. Use "rsync -z --numeric-ids -a -H --inplace --delete
--delete-excluded --stats --progress -v --itemize-changes SOURCE
DESTINATION" to replicate the directory.

4. Verify that a new directory is created, with two files "a.txt" and
"b.txt", hardlinked. Nice.

5. Now delete the original "b.txt" and create a new file "b.txt" with
new, DIFFERENT content. You now have two different files in the source.

6. Rerun the "rsync" script.

7. In the destination directory you will have two files, "a.txt" and
"b.txt". They are still the same file, hardlinked, so both have the
same content: that of either the source "a.txt" *OR* the source "b.txt".

8. Rerun the "rsync" script a few times. Each time, the destination
will have two hardlinked files with the same content, alternating
between the content of "a.txt" and that of "b.txt".

So origin and destination will never synchronize (each time you rsync,
the destination content alternates), and the destination will be
"corrupt", since files that are different in the origin are the same
file in the destination. Two years of backups are spoiled because of
this :-(.

I know that source and destination files can have different link
counts for a variety of valid reasons, but when using "-H" rsync should
notice that two files hardlinked together in the destination are no
longer hardlinked in the origin. That should be quite easy to detect,
since rsync already tracks inodes when using "-H".
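The check described above can be approximated outside rsync. The `check_links` function below is a hypothetical helper, not an rsync option: for every destination file with more than one link, it compares the source inodes of the members of each destination link group and reports pairs whose source counterparts are different files. It assumes GNU `stat -c` and file names without newlines.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): report destination files that
# share an inode although their source counterparts are different files.
# Assumptions: GNU `stat -c`, no newlines in file names.
check_links() {
    src=$1
    dst=$2
    # List destination files with more than one link, relative to $dst.
    (cd "$dst" && find . -type f -links +1) |
    while IFS= read -r f; do
        # One line per file: destination inode, source inode, path.
        s_ino=$(stat -c %i "$src/$f" 2>/dev/null || echo missing)
        printf '%s %s %s\n' "$(stat -c %i "$dst/$f")" "$s_ino" "$f"
    done |
    awk '{
        d_ino = $1; s_ino = $2
        path = substr($0, length($1) + length($2) + 3)
        if (!(d_ino in first)) {
            # First member seen for this destination link group.
            first[d_ino] = s_ino; name[d_ino] = path
        } else if (first[d_ino] != s_ino) {
            # Same destination inode, but a different source inode.
            print "split in source: " name[d_ino] " <-> " path
        }
    }'
}
```

Run it as `check_links SOURCE DESTINATION` after a transfer; any `split in source` line marks a destination hardlink that no longer exists in the origin.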

Even if I stop using "-H", which I would rather not do, the destination
will remain corrupted UNTIL we delete it and start over again.

In my particular case, not using "-H" will explode my disk usage, but
using "-H" will corrupt the destination.

-- 
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at -     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz