rsync -H option yields corrupt replicas (due to non-unique inode ids)
Andrew J. Romero
romero at fnal.gov
Thu Sep 5 18:08:59 CEST 2013
Our organization hosts a specialized Linux distribution.
As is typical with Linux distributions,
the set of files that make up our Linux distro
contains a very complex web of self-referential hard links.
Several other sites use our Linux distro
and maintain either partial or full
internal mirror copies of it.
The standard method used by Linux mirror sites to
pull/replicate a subset of a Linux distribution
(or a complete Linux distribution) from a master
repository is to use rsync with options that
produce the following behavior:
the first time a unique file is encountered,
its content is replicated; however, when subsequent hard links
to the file are detected, only the hard links are replicated.
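
To make that behavior concrete, here is a minimal Python sketch of how a hard-link-aware copier can decide which paths are "the same file": it groups paths by the (device, inode) pair returned by stat. This is only an illustration of the general technique, not rsync's actual code, and the function name group_hard_links is made up for this example.

```python
import os
import stat
from collections import defaultdict


def group_hard_links(root):
    """Return {(st_dev, st_ino): [paths]} for multiply-linked regular files under root.

    Illustrative only: a hard-link-preserving copier would transfer the
    content of the first path in each group and recreate the remaining
    paths as links to it.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_nlink > 1:  # only files with more than one link matter
                groups[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in groups.items()}
```

Note that the grouping trusts the (device, inode) pair completely; if two different files report the same pair, they land in the same group and would be treated as links to one another.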
The primary copy of our Linux distro
is stored on our BlueArc Titan NAS
(NFS server). Relative to the mirror sites,
our rsync server "sits in front of" the NAS.
Internally the BlueArc Titan has a unique object id
for files; however, the inode ID presented to clients
by the BlueArc Titan is not unique.
As a result, rsync (with the -H option) is erroneously
identifying unique files
as hard links to different files,
causing mirror repositories to be essentially corrupt
and not usable.
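
One way to check a tree for this kind of collision is sketched below in Python. It is a heuristic and rests on an assumption: genuine hard links to one file report the same size and link count, so a group of paths that shares a (device, inode) pair but disagrees on either value, or that has more members than its link count allows, is suspect. The names FileRecord, scan and find_suspect_groups are hypothetical, and attribute caching on an NFS client could in principle blur the results.

```python
import os
import stat
from collections import defaultdict
from typing import NamedTuple


class FileRecord(NamedTuple):
    path: str
    dev: int
    ino: int
    nlink: int
    size: int


def scan(root):
    """Yield a FileRecord for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode):
                yield FileRecord(path, st.st_dev, st.st_ino,
                                 st.st_nlink, st.st_size)


def find_suspect_groups(records):
    """Return {(dev, ino): [paths]} for groups that cannot be real hard links."""
    by_id = defaultdict(list)
    for rec in records:
        by_id[(rec.dev, rec.ino)].append(rec)

    suspects = {}
    for key, recs in by_id.items():
        if len(recs) < 2:
            continue
        sizes = {r.size for r in recs}
        nlinks = {r.nlink for r in recs}
        # Real links share one inode, so size and link count must agree,
        # and there cannot be more paths than the link count.
        if len(sizes) > 1 or len(nlinks) > 1 or len(recs) > min(nlinks):
            suspects[key] = sorted(r.path for r in recs)
    return suspects
```

Running find_suspect_groups(scan(root)) against the NFS mount (root being the top of the distro tree) should return an empty dict on a file system with unique inode ids; any entries point at files that a (device, inode) comparison would wrongly merge.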
It is my understanding that the NFS v3 spec.
does not require NFS servers to present unique inode
ids to clients. I believe the reasoning is that
large-scale NAS appliances internally need to
use very wide object ids but externally need to
present (when asked) inode ids that any client
can deal with.
Are there options to rsync that will
allow me to reliably replicate my
hard-link rich Linux distro from my NAS?