rsync -H option yields corrupt replicas (due to non-unique inode ids)

Andrew J. Romero romero at
Thu Sep 5 18:08:59 CEST 2013


Our organization hosts a specialized Linux distribution.

As is typical with Linux distributions, 
the set of files that make up our Linux distro 
contains a very complex web of self-referential hard links.

Several other sites use our Linux distro
and maintain either partial or full
internal mirror copies of it.

The standard method used by Linux mirror sites to
pull/replicate a subset of a Linux distribution
(or a complete Linux distribution) from a master
repository is to use rsync with options that
produce the following behavior:

  the first time a unique file is encountered,
  its content is replicated; however, when subsequent hard links
  to the file are detected, only the hard links are replicated.
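
To make that behavior concrete, here is a minimal Python sketch of the
decision a hard-link-aware copier has to make. It is an illustration only,
not rsync's actual implementation: it assumes that file identity is the
(st_dev, st_ino) pair reported by the file system, and the names scan and
plan_transfer are hypothetical.

```python
import os
import stat


def scan(root):
    """Yield (relative_path, st_dev, st_ino) for each regular file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # keep traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISREG(st.st_mode):
                yield os.path.relpath(full, root), st.st_dev, st.st_ino


def plan_transfer(records):
    """Decide, per file, whether to copy its content or only recreate a link.

    records is an iterable of (path, st_dev, st_ino) tuples.
    Assumption: two paths with the same (st_dev, st_ino) are the same file.
    """
    first_seen = {}  # (st_dev, st_ino) -> first path seen with that identity
    actions = []
    for path, dev, ino in records:
        key = (dev, ino)
        if key in first_seen:
            # Identity already seen: replicate only the hard link.
            actions.append(("link", path, first_seen[key]))
        else:
            # First occurrence of this identity: replicate the content.
            first_seen[key] = path
            actions.append(("copy", path))
    return actions
```

For example, plan_transfer(scan("/some/tree")) would list one "copy" per
distinct (st_dev, st_ino) pair and a "link" for every further path that
reports the same pair. The scheme is only as reliable as the inode numbers
it is given: if two different files report the same pair, the second one
is planned as a link to the first.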

The primary copy of our Linux distro
is stored on our BlueArc Titan NAS
(NFS server). Relative to the mirror sites,
our rsync server "sits in front of" the NAS.

Internally, the BlueArc Titan has a unique object ID
for each file; however, the inode ID presented to clients
by the BlueArc Titan is not unique. As a result,
rsync (with the -H option) erroneously
identifies unique files
as hard links to different files,
causing mirror repositories to be essentially corrupt
and unusable.
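
One way to check a tree for this symptom is sketched below in Python.
It is a heuristic under stated assumptions, not a definitive test: it
groups regular files by the (st_dev, st_ino) pair the client sees and
flags groups whose members cannot all be hard links to one file, because
their sizes or link counts disagree or because the link count is smaller
than the number of names found. The names scan_tree and
find_suspect_groups are hypothetical.

```python
import os
import stat
from collections import defaultdict


def scan_tree(root):
    """Yield (path, st_dev, st_ino, st_nlink, st_size) for regular files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISREG(st.st_mode):
                yield full, st.st_dev, st.st_ino, st.st_nlink, st.st_size


def find_suspect_groups(records):
    """Return {(st_dev, st_ino): [paths]} for groups that look like collisions.

    records is an iterable of (path, st_dev, st_ino, st_nlink, st_size).
    """
    groups = defaultdict(list)
    for path, dev, ino, nlink, size in records:
        groups[(dev, ino)].append((path, nlink, size))

    suspects = {}
    for key, members in groups.items():
        if len(members) < 2:
            continue  # a single name cannot collide with anything
        nlinks = {n for _, n, _ in members}
        sizes = {s for _, _, s in members}
        # Real hard links share one inode, so size and link count must agree,
        # and the link count cannot be smaller than the number of names found.
        if len(sizes) > 1 or len(nlinks) > 1 or min(nlinks) < len(members):
            suspects[key] = sorted(p for p, _, _ in members)
    return suspects
```

Running find_suspect_groups(scan_tree("/path/to/distro")) on the rsync
server would list the paths that share an inode ID without looking like
real hard links. Two different files with equal size and link count would
slip through, so a content comparison would be needed to be certain, and
files that change during the scan may be flagged by mistake.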

It is my understanding that the NFS v3 spec
does not require NFS servers to present unique inode
IDs to clients. I believe the reasoning is that
large-scale NAS appliances internally need to
use very wide object IDs but externally need to
present (when asked) inode IDs that any client
can deal with.

Are there options to rsync that will
allow me to reliably replicate my
hard-link-rich Linux distro from my NAS?


