rsync -H option yields corrupt replicas (due to non-unique inode ids)
kmk at sanitarium.net
Fri Sep 6 01:45:58 CEST 2013
Now that I am awake, I will ask the obvious question I should have
thought to ask this morning...
Can the NAS system talk rsync directly, or rsync over ssh? If so,
removing NFS from the equation will improve compatibility and performance.
On 09/05/13 19:43, Matthias Schniedermeyer wrote:
> On 05.09.2013 16:08, Andrew J. Romero wrote:
>> Our organization hosts a specialized Linux distribution.
>> As is typical with Linux distributions, the set of files that
>> make up our Linux distro contains a very complex web of
>> self-referential hard links.
>> Several other sites use our Linux distro and maintain either
>> partial or full internal mirror copies of it.
>> The standard method used by Linux mirror sites to pull/replicate
>> a subset of a Linux distribution (or a complete Linux
>> distribution) from a master repository is to use rsync with
>> options that produce the following behavior:
>> the first time a unique file is encountered, its content is
>> replicated; however, when subsequent hard links to the file are
>> detected, only the hard links are replicated.
>> The primary copy of our Linux distro is stored on our BlueArc
>> Titan NAS (NFS server). Relative to the mirror-sites, our rsync
>> server "sits in front of" the NAS.
>> Internally the BlueArc Titan has a unique object id for files;
>> however, the inode ID presented to clients by the BlueArc Titan
>> is not unique. As a result, rsync (with the -H option) erroneously
>> identifies unique files as hard links to different files, leaving
>> mirror repositories essentially corrupt and unusable.
>> It is my understanding that the NFS v3 spec does not require NFS
>> servers to present unique inode ids to clients. I believe the
>> reasoning is that large-scale NAS appliances internally need to
>> use very wide object ids but externally need to present (when
>> asked) inode ids that any client can deal with.
>> Are there options to rsync that will allow me to reliably
>> replicate my hard-link-rich Linux distro from my NAS?
> It could be a plain 32bit/64bit problem.
> In this case the filesystem would have 64bit inodes, and I'm not
> sure NFS v3 supports 64bit inodes. I'm pretty sure that NFS v4
> supports 64bit inodes and NFS v2 doesn't. Google didn't give me a
> straight answer, and the Wikipedia page only says that NFS v3 got
> support for 64bit file sizes/offsets; inodes aren't mentioned.
> So assuming NFS v3 either doesn't support 64bit inodes or somehow
> isn't set up correctly: just as Kevin said, rsync determines "is the
> same file" by inode, so if the filesystem has 64bit inodes and NFS
> truncates them to 32bit, totally unrelated files APPEAR to have the
> same inode. So if rsync doesn't check size/mtime/owner(...), it can
> crosslink totally unrelated files.
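The size/mtime cross-check mentioned above can be sketched in a few lines of
Python, run on the NFS client. This is only an illustration under stated
assumptions: the names scan and suspect_groups are made up here, and nothing
in this thread says rsync performs such a check itself. True hard links share
one inode and therefore one size and mtime, so any group of paths with the
same (device, inode) pair but different size or mtime is a candidate for a
bogus "hard link".

```python
import os
import sys
from collections import defaultdict

def scan(root):
    """Yield (path, dev, ino, size, mtime_ns) for every file entry under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: describe symlinks themselves, not their targets
            yield path, st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns

def suspect_groups(records):
    """Return path groups that share (dev, ino) but disagree on size or mtime."""
    by_id = defaultdict(list)
    for path, dev, ino, size, mtime_ns in records:
        by_id[(dev, ino)].append((path, size, mtime_ns))
    suspects = []
    for members in by_id.values():
        # Real hard links always report one identical (size, mtime) pair.
        if len({(size, mtime) for _p, size, mtime in members}) > 1:
            suspects.append(sorted(p for p, _s, _m in members))
    return suspects

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 check_links.py /path/to/nfs/mount
    for group in suspect_groups(scan(sys.argv[1])):
        print(" ".join(group))
```

On a busy tree a file may change between two stat calls, so a reported group
is a hint to look closer, not proof of a collision.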
> As you should have examples of "crosslinked" files, just "stat" them
> on the command line and see which identical inode numbers are shown.
> Then, on the NAS itself, assuming you can get a command prompt, also
> stat the files and check whether the inode numbers are below or
> above 2^32. And assuming you get different numbers there, below or
> above 2^32, check whether the lower 32 bits are identical.
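The "lower 32 bits" comparison suggested above is just a bit mask. A small
Python sketch follows; the two inode numbers are invented purely for
illustration and are not values from the BlueArc system.

```python
MASK_32 = 0xFFFFFFFF  # keeps only the lower 32 bits of a number

def low32(inode):
    """Return the inode number as a 32-bit truncation would present it."""
    return inode & MASK_32

def collides_after_truncation(inode_a, inode_b):
    """True if two distinct inode numbers become equal once cut to 32 bits."""
    return inode_a != inode_b and low32(inode_a) == low32(inode_b)

# Hypothetical 64-bit inode numbers as they might be reported on the server.
a = 0x0000000100001234
b = 0x0000000200001234

print(a > 2**32, b > 2**32)          # are both above the 32-bit range?
print(hex(low32(a)), hex(low32(b)))  # what a truncating client would see
print(collides_after_truncation(a, b))
```

Feeding in the inode numbers that stat reports on the NAS for two
"crosslinked" files shows whether truncation alone explains the collision.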
> And if you ask yourself "hey, 32bit is a large number space, how can
> I get collisions?": that's called the birthday paradox.
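To put a rough number on the birthday paradox: the sketch below uses the
standard approximation p = 1 - exp(-n(n-1)/(2d)) for the chance of at least
one collision among n values drawn from d possibilities. It assumes the
32-bit values are spread uniformly at random, which truncated inode numbers
need not be, so treat the result as an order-of-magnitude estimate only.

```python
import math

def collision_probability(n_files, bits=32):
    """Approximate chance that at least two of n_files share a bits-wide id."""
    d = 2 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-n_files * (n_files - 1) / (2 * d))

for n in (1_000, 10_000, 77_163, 1_000_000):
    print(n, round(collision_probability(n), 4))
```

Under that assumption, roughly 77,000 files are enough for an even chance of
at least one collision in a 32-bit space, far fewer than the 2^32 one might
naively expect.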
Kevin Korb Phone: (407) 252-6853
Systems Administrator Internet:
FutureQuest, Inc. Kevin at FutureQuest.net (work)
Orlando, Florida kmk at sanitarium.net (personal)
Web page: http://www.sanitarium.net/
PGP public key available on web site.