rsync --link-dest, --delete and hard-link count
Benjamin R. Haskell
rsync at benizi.com
Sat Feb 6 15:46:49 MST 2010
On Sat, 6 Feb 2010, Tony Abernethy wrote:
> Grarpamp wrote:
> > Yes, hardlinks save data block duplication... yet on filesystems
> > with millions of files / long pathnames, just the directory entries
> > alone can take up gigs per image. Multiply that out by frequency and
> > it can quickly add up.
>
> Huh?
>
That seemed excessive to me, too. In a short test it seems accurate.
I tried the following (in Zsh):
mkdir /tmp/rsync-test
cd /tmp/rsync-test
for l in {00..99}/{00..99}/{00..99} ; do mkdir -p $l ; done
And, it's a slow process on my machine, but each set of 10,000
directories seems to add about 40MB on an ext3 filesystem.
E.g. After the '00/{00..99}/{00..99}' directories existed, 'du -sh' showed
~40MB. After '{00..01}/{00..99}/{00..99}' were done, ~80MB.
As I write this, after '{00..04}/{00..99}/{00..99}' are done, ~199MB.
I'm not sure the path lengths add any overhead (unless you just meant as
a result of having more directories). Each dir adds 4K on my system
(though I know larger directories add multiples of that block size).
So, with millions of files, yes, the directory entries alone could
add up to gigs of space.
Interestingly, the same test on reiserfs (which I tend to use for
filesystems with many small files) seems to show about 200 *K* per
10,000 directories (about 72 bytes per directory). It seems pretty
highly dependent on the choice of filesystem.
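If you want to check the per-directory cost on your own filesystem, a
rough probe like the following works (a sketch, not a benchmark: it
assumes GNU `du`, `seq`, and `mktemp`, and the number will vary with
filesystem and block size):

```shell
# Create 1000 empty directories in a fresh temp dir and compare
# `du` before and after; divide the difference out per directory.
cd "$(mktemp -d)"
before=$(du -sk . | cut -f1)
mkdir $(seq -f 'd%g' 1 1000)
after=$(du -sk . | cut -f1)
echo "approx bytes per directory: $(( (after - before) * 1024 / 1000 ))"
```

On ext3 with 4K blocks each empty directory consumes at least one
block, which is where the ~40MB per 10,000 directories comes from.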
But, that overhead is incurred with --link-dest, too. Even without any
changes:
for l in original/00/{00..99}/{00..99} ; do
mkdir -p $l
touch $l/file
done
rsync -av original/ firstbackup/
rsync -av --link-dest=`pwd`/original original/ secondbackup/
du -sh original firstbackup secondbackup
40M original
40M firstbackup
40M secondbackup
--
Best,
Ben