Skipping hardlinks in a copy

Phil Howard phil-rsync-2 at ipal.net
Thu Mar 8 12:55:27 GMT 2007


On Wed, Mar 07, 2007 at 09:22:08PM -0800, Sriram Ramkrishna wrote:

| Hi folks, I've been googling around for awhile but I can't seem to find
| an answer to my question. 
| 
| I have a number of filesystems that contain thousands of hard links due
| to some bad organization of data.  Rsync, cpio and various other
| utilities fail to copy this data because I think there might be some
| cycles in it.  (you know you have troubles if cpio can't copy it!)
| 
| What I thought I would do instead is to copy the data but skip any files
| that are hard links.  Then after the copy is finished, I will use some
| kind of find . -type l type command that finds the hard links and then
| make a script to recreate it.  This saves me a lot of trouble with not
| having to stat the files and not having the receive side balloon up.
| 
| Is there a way to have it skip hard links when doing an rsync?
| Or is there some other mystic incantation that I can use that might
| accomplish the same thing.

The following command pipeline gives you a list from which you can
isolate just the first occurrence of each file sharing a given inode:

find . ! -type d -printf '%10i %P\n' | awk '{n=substr($0,12);if(a[$1]==1){print "other",n;}else{a[$1]=1;print "first",n;}}'

Note the above is 123 characters long.  You may have issues with mail
programs that truncate or wrap it, so be careful.  The fixed-size
formatting of the inode number in the find output makes it easy to
extract the name, or the name plus the symlink target, in the awk
command using substr().
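For readability, the same pipeline can be written with the awk program
spread over several lines (functionally equivalent):

```shell
# List every non-directory, prefixing each path with its inode number
# padded to 10 columns, then label each path as the "first" or an
# "other" occurrence of that inode.
find . ! -type d -printf '%10i %P\n' | awk '
{
    n = substr($0, 12)        # name starts after 10-char inode + space
    if (a[$1] == 1) {
        print "other", n      # inode already seen: this is a hard link
    } else {
        a[$1] = 1
        print "first", n      # first occurrence of this inode
    }
}'
```

The "other" lines are exactly the extra hard links you would skip on the
first pass and recreate afterward.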

One approach in your situation, if the filesystem is not corrupt
(which it might be, since ordinary files cannot create cycles; only
directories can), is to build a list of files keyed by inode number,
hard link each file to one named after its inode number, rsync the
directory full of inode-named links, and then re-expand on the
destination based on that list.
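A rough sketch of that approach (SRC and FARM below are made-up
placeholder paths; newlines in filenames would break the read loop):

```shell
# Build a flat "inode farm": one hard link per unique inode, named
# after the inode number, plus a map file for re-expansion later.
SRC=/data/src          # placeholder: the tree full of hard links
FARM=/data/farm        # placeholder: flat directory to rsync
mkdir -p "$FARM"
find "$SRC" ! -type d -printf '%i %p\n' |
while read -r ino path; do
    # Only the first occurrence of each inode creates a farm entry.
    [ -e "$FARM/$ino" ] || ln "$path" "$FARM/$ino"
done
# Record inode -> relative name, so the destination can recreate
# every original path with ln after rsyncing the farm.
find "$SRC" ! -type d -printf '%i %P\n' > "$FARM.map"
```

On the destination, each line of the map file names an inode-named file
in the farm and the path it should be linked to.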

You should not be following symlinks in a file tree recursion.  Rsync,
find, cpio, and others, know not to.

But I suspect some kind of filesystem corruption, or at least some hard
links applied to directories.  The latter can create cycles if not done
carefully (and there is virtually no case for ever doing it
intentionally).

I do not consider it bad organization to have lots of hardlinked files.
In fact, I have a program that seeks out identical files and hardlinks
them together to save space (not safe in all cases, but safe in most).

The command "find . -type l" will only find symlinks.  You can find
files that have hard links with "find . ! -type d -links +1 -print".
Note that all file types can have hard links, even symlinks.  Do
exclude directories, as those carry extra links for other reasons
(1 for the entry in the parent, 1 for their own "." entry, and 1 for
each subdirectory's ".." entry).
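A quick way to see the difference between the two find commands (the
file names below are just for demonstration):

```shell
# Set up one regular file, a hard link to it, and a symlink to it.
tmp=$(mktemp -d) && cd "$tmp"
echo data > original
ln original hardlinked     # link count of the shared inode becomes 2
ln -s original symlinked   # a separate inode with link count 1

find . -type l             # matches only ./symlinked
find . ! -type d -links +1 # matches ./original and ./hardlinked
```

The symlink is excluded by -links +1 because it is its own inode with a
link count of 1, even though it points at a multiply-linked file.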

-- 
|---------------------------------------/----------------------------------|
| Phil Howard KA9WGN (ka9wgn.ham.org)  /  Do not send to the address below |
| first name lower case at ipal.net   /  spamtrap-2007-03-08-0651 at ipal.net |
|------------------------------------/-------------------------------------|
