rsync --link-dest, --delete and hard-link count

grarpamp grarpamp at gmail.com
Sat Feb 6 19:21:19 MST 2010


>> Huh?

See now? :-)

> That seemed excessive to me, too.  In a short test it seems accurate.

Yep, it is.

>  And, it's a slow process on my machine

A ramdisk is good for such empty-file tests. You can blow it away
in a second by simply nuking the core, instead of waiting for rm to
grind the rust off your platters.

> each set of 10,000 directories seems to add about 40MB on an ext3

Your linear test case will yield linear growth.
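
If you want to watch that growth yourself, here is a minimal Python sketch.
It makes empty directories in batches under a temp dir and sums the blocks
allocated to the directories after each batch. The function names, batch
size and directory naming are made up for illustration, and the numbers you
get depend entirely on the filesystem the temp dir lives on (a ramdisk may
well report zero).

```python
import os
import tempfile

def dir_usage_bytes(root):
    """Sum allocated bytes (st_blocks * 512) over every directory under root."""
    total = 0
    for dirpath, _dirnames, _filenames in os.walk(root):
        total += os.lstat(dirpath).st_blocks * 512
    return total

def growth(step=100, rounds=3):
    """Create `step` empty directories per round; sample usage after each round."""
    samples = []
    with tempfile.TemporaryDirectory() as top:
        made = 0
        for _ in range(rounds):
            for _ in range(step):
                os.mkdir(os.path.join(top, f"d{made:06d}"))
                made += 1
            samples.append((made, dir_usage_bytes(top)))
    return samples

if __name__ == "__main__":
    for count, used in growth():
        print(f"{count:6d} dirs -> {used} bytes in directories")
```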

>  I'm not sure the path lengths add any overhead (unless you just meant as
>  a result of having more directories).

The space used is, at minimum, the sum of all chars in the full pathname
[dirname + basename] from the mountpoint, plus whatever overhead your
particular FS needs to store it. No FS I know of compresses pathnames, only
filedata. And it doesn't matter what the inode type is...
dir/file/char/block/slink/fifo/sock/etc, it's the pathname that counts.

>  (though I know larger directories add multiples of that block size).

As dir blocks fill with pathnames, new dir blocks are needed.

>  So, with millions of files, yes, the directory entries alone could
>  add up to gigs of space.

Yep, simple math: size x copies. A small box here has 3M inodes
in use. That's at minimum ~3MB per copy, if each pathname was just one
char, which it's not. Say it's 112 chars [average on this one]: that's
~320MiB per copy, and 30 copies a month is ~9.4GiB (about 10GB)...
excluding filesystem overhead to store it.
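
The same back-of-envelope math as a Python sketch. The figures are the ones
from this box; the function name is just for illustration, and one char is
assumed to cost one byte. Filesystem overhead is ignored entirely, so treat
the result as a floor. It lands at roughly 9.4 GiB, i.e. about 10 GB in
decimal units.

```python
# Minimum bytes needed just to hold the pathnames, ignoring FS overhead.
def pathname_bytes(inodes, avg_path_len, copies):
    return inodes * avg_path_len * copies

one_copy = pathname_bytes(3_000_000, 112, 1)   # a single snapshot
month = pathname_bytes(3_000_000, 112, 30)     # 30 snapshots a month

print(f"one copy:  {one_copy / 2**20:.0f} MiB")
print(f"30 copies: {month / 2**30:.1f} GiB")
```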

> It seems pretty highly dependent on choice of fs.

Yep. In your case, minimum <= reiser <= ext3. Newfs tunables can
have an effect too.

ZFS has a cool property where the size of the directory is the number of
entries in it. UFS has a cool property where the size of the directory is
about the sum of the chars within it. They both still take up the same
minimum space on disk just like any other FS. You can play around with
du -A/l, ls -is, stat, etc.
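
If you'd rather script it than eyeball ls output, something like this rough
Python sketch walks a tree and totals the entry-name bytes, the full-path
bytes relative to the root, and whatever st_size the directories report.
The function name and the dict keys are invented here. What st_size means
for a directory is up to the FS, as described above, so compare it against
the name totals rather than trusting it as bytes on disk.

```python
import os

def tree_name_stats(root):
    """Total up name lengths and reported directory sizes under root."""
    stats = {"entries": 0, "name_bytes": 0, "path_bytes": 0, "dir_size": 0}
    for dirpath, dirnames, filenames in os.walk(root):
        # st_size of a directory is FS-specific (entry count, bytes, blocks...)
        stats["dir_size"] += os.lstat(dirpath).st_size
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            stats["entries"] += 1
            stats["name_bytes"] += len(os.fsencode(name))  # basename only
            stats["path_bytes"] += len(os.fsencode(rel))   # dirname + basename
    return stats
```

Call it as tree_name_stats("/some/hier") on a tree of your choosing;
path_bytes divided by entries gives the average pathname length.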

>  But, that overhead is incurred with --link-dest, too.

Doesn't matter what you use to manage your hiers.
Hardlinks 'reuse' only the stat(2) and filedata space; every link
still costs its own directory entry.
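
A small Python sketch of that point, assuming a temp dir on a filesystem
that supports hard links. The snap.0/snap.1 and data.bin names are invented
for the demo. The two names end up on one inode with a link count of 2, yet
each snapshot directory carries its own entry for the file.

```python
import os
import tempfile

def hardlink_demo():
    """Link one file into two 'snapshot' dirs and report what is shared."""
    with tempfile.TemporaryDirectory() as top:
        snap0 = os.path.join(top, "snap.0")
        snap1 = os.path.join(top, "snap.1")
        os.mkdir(snap0)
        os.mkdir(snap1)

        src = os.path.join(snap0, "data.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 4096)

        dst = os.path.join(snap1, "data.bin")
        os.link(src, dst)  # second name for the same inode, no data copied

        s0, s1 = os.stat(src), os.stat(dst)
        return {
            "same_inode": (s0.st_dev, s0.st_ino) == (s1.st_dev, s1.st_ino),
            "nlink": s0.st_nlink,
            # one directory entry per snapshot, shared inode or not
            "entries": sorted(os.listdir(snap0) + os.listdir(snap1)),
        }

if __name__ == "__main__":
    print(hardlink_demo())
```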

Now you know :) Can I have my feature yet? Heh ;)

