rsync mirroring and hardlink issues

Alexander 'Leo' Bergolth leo at strike.wu-wien.ac.at
Thu Apr 26 21:55:32 GMT 2007


I'm running a mirror of several repositories that are fetched using
separate rsync runs. Since some of those repositories are hosting
related files, I'm using the hardlink utility[1] in order to save disk
space.

However, I've noticed that using hardlink this way can lead to
inconsistent file metadata.

Consider the following scenario:

- two repositories (rep_a and rep_b) are mirrored that initially have a
file in common.
- after rsync has mirrored the files, there are two identical copies in
my mirror
- hardlink detects those copies and links them together
- the file attributes (permissions, ownership, mtime, etc.) of a
hardlinked file in rep_a change upstream
- rsync detects this change and updates the destination file in-place
(it doesn't break the hardlink), which leads to an inconsistent view of
the file attributes in rep_b
- depending on the order of the mirroring commands, the second rsync job
for rep_b might reverse the attribute change (again for both repositories)
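
The scenario above can be reproduced without rsync, since it comes down
to hardlinked names sharing one inode and therefore one set of
attributes. The following is only a minimal sketch: the file names and
the helper demo_shared_metadata are made up for illustration, os.link
stands in for what the hardlink utility does, and os.chmod stands in for
rsync's in-place attribute update.

```python
import os
import stat
import tempfile


def demo_shared_metadata():
    """Return (same_inode, mode_seen_via_rep_b) after changing only rep_a's name."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for the same file in rep_a and rep_b.
        a = os.path.join(root, "rep_a_file")
        b = os.path.join(root, "rep_b_file")

        with open(a, "w") as f:
            f.write("identical content\n")
        os.chmod(a, 0o644)

        # Stand-in for the hardlink utility: both names now share one inode.
        os.link(a, b)

        # Stand-in for rsync applying a metadata-only change in place to rep_a.
        os.chmod(a, 0o600)

        st_a = os.stat(a)
        st_b = os.stat(b)
        return st_a.st_ino == st_b.st_ino, stat.S_IMODE(st_b.st_mode)


if __name__ == "__main__":
    same_inode, mode_b = demo_shared_metadata()
    print("same inode:", same_inode)
    print("mode seen through rep_b:", oct(mode_b))
```

The mode seen through the rep_b name follows the change made through the
rep_a name, because permissions, ownership and times are stored in the
inode, not in the directory entry.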

To recap: once two files are hardlinked, rsync will break the hardlink
only if one file's _data_ changes. If only the metadata (mode,
ownership, times) changes, the file is updated in-place, leading to
an inconsistent mirror.

Unfortunately, I couldn't find an rsync option that applies even
metadata-only changes to a new copy of the file. (Another approach would
be to check the link count of the inode and create a new copy of the
file only if it is greater than one.)
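
The link-count idea could be sketched outside of rsync roughly as
follows. This is not an rsync feature, just an assumption about how a
pre-processing step might look: break_hardlink is a hypothetical helper
that replaces a path with a private copy when its inode has more than
one link, so that a later in-place attribute update touches only that
copy. Which paths to pass to it is left open (one possibility would be
the files reported by an rsync dry run). Note that shutil.copy2 carries
over data, mode and timestamps but not ownership.

```python
import os
import shutil
import stat
import tempfile


def break_hardlink(path):
    """Give `path` its own inode if it currently shares one.

    Returns True if a private copy was created, False if nothing was done.
    """
    st = os.lstat(path)
    # Only regular files with more than one link need a private copy.
    if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
        return False

    # Create the copy in the same directory so the rename stays on one
    # filesystem and replaces the old name atomically.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".unlink-")
    os.close(fd)
    try:
        shutil.copy2(path, tmp)  # data, mode and timestamps; not ownership
        os.replace(tmp, path)    # path now refers to the new inode
    except OSError:
        # Clean up the temporary file if copying or renaming failed.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return True
```

The obvious trade-off is that every file treated this way loses its disk
space saving until hardlink is run again, and the two copies will only
be re-linked if the tool considers their attributes equal.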

Is there any workaround for this issue?
Thanks in advance,
--leo

[1] http://code.google.com/p/hardlinkpy/

-- 
e-mail   ::: Alexander.Bergolth (at) wu-wien.ac.at
fax      ::: +43-1-31336-906050
location ::: Computer Center | Vienna University of Economics | Austria


