rsync --link-dest, --delete and hard-link count

Benjamin R. Haskell rsync at benizi.com
Sat Feb 6 15:46:49 MST 2010


On Sat, 6 Feb 2010, Tony Abernethy wrote:

> Grarpamp wrote:
> > Yes, hardlinks save data block duplication... yet on filesystems 
> > with millions of files / long pathnames, just the directory entries 
> > alone can take up gigs per image. Multiply that out by frequency and 
> > it can quickly add up.
> 
> Huh?
> 

That seemed excessive to me, too.  In a short test it seems accurate.

I tried the following (in Zsh):

mkdir /tmp/rsync-test
cd /tmp/rsync-test
# 100 * 100 * 100 = 1,000,000 directories
for l in {00..99}/{00..99}/{00..99} ; do mkdir -p $l ; done

It's a slow process on my machine, but each set of 10,000 directories 
seems to add about 40MB on an ext3 filesystem.

E.g. after the '00/{00..99}/{00..99}' directories existed, 'du -sh' showed 
~40MB.  After '{00..01}/{00..99}/{00..99}' were done, ~80MB.

As I write this, after '{00..04}/{00..99}/{00..99}' are done, ~199MB.
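
If you want to watch the growth as each top-level set completes, the 
same test can be run one top-level directory at a time -- roughly:

for top in {00..99} ; do
    for l in $top/{00..99}/{00..99} ; do mkdir -p $l ; done
    du -sh .
done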

I'm not sure the path lengths themselves add any overhead (unless you just 
meant as a result of having more directories).  Each directory adds 4K on 
my system (though directories with many entries grow in multiples of that 
block size).
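
That figure is easy to check; the flags below are the GNU coreutils ones:

stat -fc 'block size: %S bytes' .   # fundamental block size of the fs
du -sB1 00/00/00                    # one empty leaf directory -> 4096 here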

So, with millions of files, yes, the directory entries alone could 
add up to gigs of space.
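
Back-of-the-envelope, assuming 4K per directory:

# 1,000,000 directories * 4096 bytes each, expressed in GB
echo $(( 1000000 * 4096 / 1024.0 / 1024 / 1024 ))   # ~3.8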

Interestingly, the same test on reiserfs (which I tend to use for 
filesystems with many small files) seems to show about 200 *K* per 10,000 
directories (roughly 20 bytes per directory).  It seems pretty highly 
dependent on the choice of filesystem.
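
If you want to compare filesystems without repartitioning, a scratch 
loopback image is enough; a rough sketch (needs root for the mount, and 
the mkfs flags may differ per filesystem):

dd if=/dev/zero of=/tmp/scratch.img bs=1M count=512
mkfs.ext3 -Fq /tmp/scratch.img        # or mkfs.reiserfs (will prompt)
mkdir -p /tmp/scratch
mount -o loop /tmp/scratch.img /tmp/scratch
cd /tmp/scratch
for l in 00/{00..99}/{00..99} ; do mkdir -p $l ; done
du -sh .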

But, that overhead is incurred with --link-dest, too.  Even without any 
changes:

for l in original/00/{00..99}/{00..99} ; do
    mkdir -p $l
    touch $l/file
done
rsync -av original/ firstbackup/
rsync -av --link-dest=`pwd`/original original/ secondbackup/
du -sh original firstbackup secondbackup
40M     original
40M     firstbackup
40M     secondbackup
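
And to confirm the hard-link side of it: stat's %h format prints the 
link count.  With the layout above, the copy in secondbackup shares its 
inode with original (via --link-dest), while firstbackup is a plain 
copy, so something like this is what I'd expect:

stat -c '%h %n' {original,firstbackup,secondbackup}/00/00/00/file
2 original/00/00/00/file
1 firstbackup/00/00/00/file
2 secondbackup/00/00/00/file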

-- 
Best,
Ben

