Skipping hardlinks in a copy

Sri Ramkrishna sramkris at ichips.intel.com
Thu Mar 8 20:43:31 GMT 2007


On Wed, Mar 07, 2007 at 09:22:08PM -0800, Sriram Ramkrishna wrote:

Hi there,

For some reason, I sent this mail before I was fully subscribed, so I
have missed out on the replies.  If I don't answer all the responses,
this is why.


> The following command pipeline can give you a list from which you can
> isolate just the first occurrence of each file that shares the same
> inode:

> find . ! -type d -printf '%10i %P\n' |
>   awk '{n = substr($0, 12)
>         if (a[$1] == 1) print "other", n
>         else            { a[$1] = 1; print "first", n }}'

Yes, I think I have something similar that someone else has used to do
the same thing.  Thank you, this is most useful.
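
For instance (just a sketch; GNU find assumed, and "first.list" plus the
destination path are names I made up):

  find . ! -type d -printf '%10i %P\n' |
    awk '{n = substr($0, 12)
          if (a[$1]++) print n > "other.list"
          else         print n > "first.list"}'
  rsync -a --files-from=first.list . dest:/backup/

That should copy only the first name of each inode and leave the rest
for a later pass.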

> One approach in the situation you have, if the filesystem is not corrupt
> (which it might be, because files don't create cycles), is to create a

I think I probably have hard links to directories.  I have observed cpio
going through a loop continuously.  Since I was doing this on an AIX
JFS filesystem (on an AIX fileserver), it might not have the same
protections that I believe Linux has when hitting a circular loop.

> list of files based on their inode number, and hardlink each file to one
> named by its inode number.  Just rsync the directory full of inode
> numbers.  Then re-expand on the destination based on that list.
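
If I follow, that would be roughly this (a sketch only; "inodes",
"inode.map" and the destination are made-up names, and it glosses over
whitespace in filenames):

  mkdir inodes
  find . ! -type d -links +1 -printf '%i %P\n' > inode.map
  while read -r ino path; do
      [ -e "inodes/$ino" ] || ln "$path" "inodes/$ino"
  done < inode.map
  rsync -a inodes inode.map dest:/backup/

  # then on the destination, from inside the copied directory:
  while read -r ino path; do
      mkdir -p "$(dirname "$path")"
      ln -f "inodes/$ino" "$path"
  done < inode.map

Ordinary single-link files would still go over in a normal rsync pass.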

> You should not be following symlinks in a file tree recursion.  Rsync,
> find, cpio, and others know not to.

> But I suspect some kind of filesystem corruption, or at least some hard
> links being applied to directories.  The latter can create cycles if not
> done carefully (and there is virtually no case to ever do that at all by
> intent).

I think this is exactly what's happening.  I think I have a number of
cycles that are causing the data to go loopy (pardon the pun).  If
that's the case, how does one find self-referential hard links and
symlinks?
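
I suppose one check would be to compare each directory's link count
against 2 plus its number of subdirectories, since anything higher hints
at an extra hard link to it.  Something like this, maybe (assuming GNU
find and stat, which the AIX box may not have):

  find . -type d | while read -r d; do
      links=$(stat -c %h "$d")
      subs=$(find "$d" -mindepth 1 -maxdepth 1 -type d | wc -l)
      if [ "$links" -gt $((subs + 2)) ]; then
          echo "suspect: $d (links=$links, subdirs=$subs)"
      fi
  done

For symlink loops, I believe GNU find -L already prints a "File system
loop detected" diagnostic on its own.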

> I do not consider it bad organization to have lots of files be
> hardlinked.  In fact, I have a program that actually seeks out
> identical files and hardlinks them to save space (not
> safe in all cases, but safe in most).

Sure, but on a large filesystem it's been very painful to copy this
data, with rsync taking days instead of hours.
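
(Out of curiosity, I'd imagine such a program boils down to a
checksum-and-link pass along these lines; a sketch only, and it trips
over unusual filenames and skips the size checks a real tool would do:)

  find . -type f -print0 | xargs -0 md5sum | sort |
  while read -r sum file; do
      # -ef: same device and inode (bash/ksh test extension)
      if [ "$sum" = "$prev" ] && ! [ "$keep" -ef "$file" ]; then
          ln -f "$keep" "$file"   # replace duplicate with a hard link
      else
          prev=$sum; keep=$file
      fi
  done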

> The command "find . -type l" will only find symlinks.  You can find
> files that have hard links with "find . ! -type d -links +1 -print".
> Note that all file types can have hard links, even symlinks.  Do
> exclude directories as those will have many links for other reasons
> (e.g. 1 for self-reference, 1 for being inside a directory, and 1 for
> each subdirectory within).

Can I also use find to create a list of files that are not hardlinked
and then use --include-from and --exclude='*'?  I had thought that might
be an alternative way.  If I use this rule, does rsync still stat its
way through the whole filesystem?
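
To be concrete, I mean something like this (a sketch; "plain.list" and
the destination are made up):

  # files with exactly one link, i.e. not hardlinked
  find . ! -type d -links 1 -printf '%P\n' > plain.list
  rsync -a --files-from=plain.list . dest:/backup/

My understanding is that --include-from with --exclude='*' still walks
and stats the whole tree, filtering as it goes, while --files-from
visits only the listed paths, but I may be wrong.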

sri

