rsync -H option yields corrupt replicas (due to non-unique inode ids)

Matthias Schniedermeyer ms at citd.de
Fri Sep 6 01:43:26 CEST 2013


On 05.09.2013 16:08, Andrew J. Romero wrote:
> Hi,
> 
> Our organization hosts a specialized Linux distribution.
> 
> As is typical with Linux distributions, 
> the set of files that make up our Linux distro 
> contains a very complex web of self-referential hard links.
> 
> Several other sites use our Linux distro
> and maintain either partial or full
> internal mirror copies of it.
> 
> The standard method used by Linux mirror sites to
> pull/replicate a subset of a Linux distribution
> (or a complete Linux distribution) from a master
> repository is to use rsync with options that
> produce the following behavior:
> 
>   the first time a unique file is encountered, 
>   its content is replicated; however, when subsequent hard links
>   to the file are detected, only the hardlinks are replicated.
> 
> The primary copy of our Linux distro
> is stored on our BlueArc Titan NAS
> (NFS server). Relative to the mirror-sites, 
> our rsync server "sits in front of" the NAS.
> 
> Internally the BlueArc Titan has a unique object id
> for files; however, the inode ID presented to clients 
> by the BlueArc Titan is not unique, so
> rsync (with the -H option) erroneously
> identifies distinct files
> as hard links to other, unrelated files,
> causing the mirror repositories to be essentially corrupt
> and unusable.
> 
> It is my understanding that the NFS v3 spec.
> does not require NFS servers to present unique inode
> ids to clients. I believe that the reasoning is that:
> large scale NAS appliances internally need to
> use very wide object ids; but, externally need to
> present (when asked) inode ids that any client
> can deal with.
> 
> Are there options to rsync that will
> allow me to reliably replicate my 
> hard-link-rich Linux distro from my NAS?

It could be a plain 32bit/64bit problem.

That is, the NAS may use 64bit inodes, and I'm not sure NFS v3 supports 
64bit inodes. I'm pretty sure that NFS v4 supports 64bit inodes and 
NFS v2 doesn't. Google didn't give me a straight answer, and the 
Wikipedia page only says that NFS v3 added support for 64bit file 
sizes/offsets; inodes aren't mentioned.

So, assuming NFS v3 either doesn't support 64bit inodes or somehow isn't 
set up correctly:
Just as Kevin said, rsync determines "is the same file" by inode, so if 
the filesystem has 64bit inodes and NFS truncates them to 32bit, totally 
unrelated files APPEAR to have the same inode. If rsync doesn't also 
check size/mtime/owner(...), it can crosslink totally unrelated files.
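To make the truncation argument concrete, here is a minimal Python sketch. It assumes hypothetical 64bit inode numbers and a simplified model of hard-link detection that groups files by inode number alone; rsync's real code keys on device and inode together and differs in other details this sketch ignores. The function names are made up for illustration.

```python
from collections import defaultdict

MASK32 = 0xFFFFFFFF  # keeps the low 32 bits of an inode number


def truncate_ino(ino: int) -> int:
    """Drop everything above bit 31, as a 32-bit inode field would."""
    return ino & MASK32


def group_by_inode(files: dict[str, int], truncate: bool) -> dict[int, list[str]]:
    """Group paths by inode number; a group with 2+ paths looks like hard links.

    Simplified model only: real rsync keys on (device, inode).
    """
    groups: dict[int, list[str]] = defaultdict(list)
    for path, ino in files.items():
        key = truncate_ino(ino) if truncate else ino
        groups[key].append(path)
    return dict(groups)


# Hypothetical inode numbers for three unrelated files; the first two differ
# only above bit 31, so they agree in their lower 32 bits.
files = {
    "a/libfoo.so": 0x1_0000_1234,
    "b/readme.txt": 0x2_0000_1234,
    "c/kernel.img": 0x5678,
}

for truncate in (False, True):
    links = {k: v for k, v in group_by_inode(files, truncate).items() if len(v) > 1}
    # truncate=False -> {}; truncate=True -> {4660: ['a/libfoo.so', 'b/readme.txt']}
    print(f"truncate={truncate}: apparent hard links = {links}")
```

With the full 64bit numbers nothing is grouped; once only the low 32 bits survive, the two unrelated files land in the same group and would be "replicated" as one file plus a hard link.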

As you should have examples of "crosslinked" files, just "stat" them on 
the command line and see whether the same inode numbers are shown. On 
the NAS itself, assuming you can get a command prompt, also stat the 
files and check whether the inode numbers are below or above 2^32. If 
the NAS shows different inode numbers for those files (with at least 
one above 2^32), check whether their lower 32 bits are identical.
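A small Python helper along these lines can run that check over several files at once. It is a sketch, assuming Python is available on the host where you run it; if the NAS only offers a shell, `stat -c %i FILE` (GNU coreutils) prints the inode number instead. The function names are hypothetical.

```python
import os
import sys

LOW32 = (1 << 32) - 1  # mask for the lower 32 bits


def inode_of(path: str) -> int:
    """Inode number as reported by stat() on this host."""
    return os.stat(path).st_ino


def share_low32(ino_a: int, ino_b: int) -> bool:
    """True when two *different* inode numbers agree in their lower 32 bits."""
    return ino_a != ino_b and (ino_a & LOW32) == (ino_b & LOW32)


def report(paths: list[str]) -> None:
    """Print each path's inode, whether it is at or above 2^32, and its low 32 bits."""
    for path in paths:
        ino = inode_of(path)
        side = "above" if ino > LOW32 else "below"
        print(f"{path}: ino={ino} ({side} 2^32), low32={ino & LOW32}")


if __name__ == "__main__":
    report(sys.argv[1:])
```

Run it on the client against a pair of crosslinked files, then feed the inode numbers the NAS reports for the same files into `share_low32`; a `True` result is consistent with the truncation explanation.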


And if you ask yourself, "Hey, 32bit is a large number space, how can I 
get collisions?", that's called the birthday paradox:
http://en.wikipedia.org/wiki/Birthday_paradox
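As a rough worked example, the standard birthday approximation p ≈ 1 - exp(-n(n-1) / (2·2^32)) gives the chance that at least two of n files share the same lower 32 bits. This assumes those bits behave like uniformly random values, which real inode allocation does not guarantee, so treat the numbers as an order-of-magnitude guide only.

```python
import math


def collision_probability(n: int, bits: int = 32) -> float:
    """Approximate P(at least two of n items share a value) in a 2**bits space.

    Birthday approximation 1 - exp(-n(n-1) / (2N)), assuming uniform values.
    """
    space = 2 ** bits
    # -expm1(-x) computes 1 - exp(-x) accurately for small x
    return -math.expm1(-n * (n - 1) / (2 * space))


def files_for_probability(p: float, bits: int = 32) -> int:
    """Approximate n at which the collision probability reaches p (0 < p < 1)."""
    space = 2 ** bits
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - p))))


for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9} files: P(collision) ~ {collision_probability(n):.3f}")
print("50% chance at about", files_for_probability(0.5), "files")
```

Under these assumptions the 50% point lands at roughly 77,000 files, a count that a large distribution tree can easily exceed.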




-- 

Matthias

