Serious issue: rsync and hardlinks are dangerous...

Jesus Cea jcea at jcea.es
Thu Aug 9 19:08:25 MDT 2012


Rsync 3.0.9 here.

I am using an rsync script like:

"""
rsync -z --numeric-ids -a -H --inplace --delete --delete-excluded \
  --stats --progress -v --itemize-changes SOURCE DESTINATION
"""

I detected the following issue when RSYNCing a bunch of Mercurial
repositories. It is very dangerous, because it will corrupt files.

When cloning a local repository, Mercurial uses hardlinks for
performance and disk use. When one of the clones updates a file, the
file is "unlinked" and replaced by a new file, so history can diverge
gracefully.
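
To make the "unlink and replace" behaviour concrete, here is a minimal
Python sketch. It is not Mercurial's actual code; the helper names
(replace_file, write_in_place, same_inode, demo) and the file contents
are made up for illustration. It only shows the filesystem semantics:
writing in place through one name changes what every hardlinked name
sees, while unlinking and recreating a name gives it a new inode and
leaves the other name untouched.

```python
import os
import tempfile


def read(path):
    with open(path) as f:
        return f.read()


def write_in_place(path, data):
    """Overwrite the existing inode behind `path`; all hardlinks see the change."""
    with open(path, "r+") as f:
        f.seek(0)
        f.write(data)
        f.truncate()


def replace_file(path, data):
    """Unlink `path`, then create a brand-new file (new inode) with `data`."""
    os.unlink(path)
    with open(path, "w") as f:
        f.write(data)


def same_inode(p, q):
    sp, sq = os.stat(p), os.stat(q)
    return (sp.st_dev, sp.st_ino) == (sq.st_dev, sq.st_ino)


def demo():
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        with open(a, "w") as f:
            f.write("original\n")
        os.link(a, b)                      # b.txt is a hardlink to a.txt

        write_in_place(b, "in place\n")    # modifies the shared inode
        a_after_inplace = read(a)
        linked_after_inplace = same_inode(a, b)

        replace_file(b, "replaced\n")      # b.txt now has its own inode
        return (a_after_inplace, linked_after_inplace,
                read(a), read(b), same_inode(a, b))
```

Calling demo() returns the content of "a.txt" after the in-place write,
whether the two names were still linked at that point, and then the
contents and link state after the replacement.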

The problem can be trivially reproduced like this:

1. Create a text file "a.txt" with a bunch of characters inside.

2. Create a hardlink to that file, called "b.txt".

3. Use "rsync -z --numeric-ids -a -H --inplace --delete
--delete-excluded --stats --progress -v --itemize-changes SOURCE
DESTINATION" to replicate the directory.

4. Verify that a new directory is created, with two files "a.txt" and
"b.txt", hardlinked. Nice.

5. Now delete the original "b.txt" and create a new file "b.txt", with
new DIFFERENT content. So you now have two different files in the source.

6. Rerun the "rsync" script.

7. In the destination directory you will have two files, "a.txt" and
"b.txt". They are still the same file, hardlinked, so both have the
same content: that of either the source "a.txt" *OR* the source "b.txt".

8. Rerun the "rsync" script a few times. Each time, the destination
will have two hardlinked files with the same content, alternating
between the content of the source "a.txt" and the source "b.txt".
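
The steps above can be scripted. The following Python sketch is one
possible way to do it, under some assumptions of mine: the directory
names "src" and "dst", the file contents, the helper names, and the
trailing "/" appended to the source path are all illustrative and not
taken from the original script. The rsync options are the ones quoted
above. It records what the destination looks like after each run
rather than asserting any particular outcome, since other rsync
versions may behave differently from 3.0.9.

```python
import os
import subprocess
import tempfile

# Options as quoted in the report.
RSYNC_OPTS = ["-z", "--numeric-ids", "-a", "-H", "--inplace", "--delete",
              "--delete-excluded", "--stats", "--progress", "-v",
              "--itemize-changes"]


def rsync_argv(src, dst):
    # Assumption: a trailing "/" on the source, so its contents land in dst.
    return ["rsync", *RSYNC_OPTS, os.path.join(src, ""), dst]


def build_source(src):
    """Steps 1-2: create a.txt and a hardlink b.txt."""
    os.makedirs(src)
    a = os.path.join(src, "a.txt")
    with open(a, "w") as f:
        f.write("content of a\n" * 10)
    os.link(a, os.path.join(src, "b.txt"))


def diverge(src):
    """Step 5: delete b.txt and recreate it with different content."""
    b = os.path.join(src, "b.txt")
    os.unlink(b)
    with open(b, "w") as f:
        f.write("DIFFERENT content of b\n" * 10)


def is_hardlinked(p, q):
    sp, sq = os.stat(p), os.stat(q)
    return (sp.st_dev, sp.st_ino) == (sq.st_dev, sq.st_ino)


def report(dst):
    """Summarise the destination: link state and first line of each file."""
    a, b = os.path.join(dst, "a.txt"), os.path.join(dst, "b.txt")
    with open(a) as fa, open(b) as fb:
        return {"hardlinked": is_hardlinked(a, b),
                "a": fa.readline().strip(),
                "b": fb.readline().strip()}


def reproduce(runs=4):
    """Run steps 1-8 in a scratch directory; requires rsync on PATH."""
    states = []
    with tempfile.TemporaryDirectory() as work:
        src, dst = os.path.join(work, "src"), os.path.join(work, "dst")
        build_source(src)
        subprocess.run(rsync_argv(src, dst), check=True)      # steps 3-4
        states.append(report(dst))
        diverge(src)                                          # step 5
        for _ in range(runs):                                 # steps 6-8
            subprocess.run(rsync_argv(src, dst), check=True)
            states.append(report(dst))
    return states
```

Printing each entry of reproduce() shows, run by run, whether the two
destination files are still hardlinked and which content they hold.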

So origin and destination will never synchronize (each time you rsync,
destination will alternate content), and destination will be
"corrupt", since different files in the origin are the same file in
the destination. Two years of backups are spoiled, because of this :-(.

I know that source and destination files can have a different link
count for a variety of valid reasons, but rsync should know, when
using "-H", that two files hardlinked in the destination are no longer
hardlinked in the origin. That should be quite easy to detect, since
rsync already tracks inodes when using "-H" and can therefore notice
that two hardlinked files inside the destination path are not
hardlinked in the origin.
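
As an illustration of the kind of check meant here, the following
Python sketch compares inode groups between two local trees. It is an
external check written for this message, not a description of how
rsync works internally or how a fix would be implemented; the function
names (inode_groups, stale_links) are hypothetical, and it assumes
both trees are locally mounted directories.

```python
import os
import stat
from collections import defaultdict


def inode_groups(root):
    """Map (st_dev, st_ino) -> relative paths of regular files under root."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISREG(st.st_mode):
                rel = os.path.relpath(full, root)
                groups[(st.st_dev, st.st_ino)].append(rel)
    return groups


def stale_links(src_root, dst_root):
    """Return groups of paths that share an inode in the destination
    although their counterparts in the source do not all share one."""
    src_inode = {}
    for key, paths in inode_groups(src_root).items():
        for rel in paths:
            src_inode[rel] = key

    stale = []
    for paths in inode_groups(dst_root).values():
        if len(paths) < 2:
            continue
        # Only consider names that still exist in the source.
        keys = {src_inode[rel] for rel in paths if rel in src_inode}
        if len(keys) > 1:
            stale.append(sorted(paths))
    return sorted(stale)
```

For the "a.txt" / "b.txt" case described above, stale_links(SOURCE,
DESTINATION) would list the pair once the source files have diverged
while the destination copies are still hardlinked.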

Even if I stop using "-H", which I would rather not do, the destination
will stay corrupted UNTIL we delete it and start over again.

In my particular case, not using "-H" will explode my disk usage, but
using "-H" will corrupt the destination.

-- 
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz

