rsync --link-dest, --delete and hard-link count

Benjamin R. Haskell rsync at benizi.com
Sun Feb 7 10:44:40 MST 2010


On Sun, 7 Feb 2010, grarpamp wrote:

> >  [[Benjamin R. Haskell wrote:]]
> >  And, it's a slow process on my machine
> 
> Ramdisk is good for such empty file tests.

Good tip.


> And you can blow it away in a second by simply nuking the core.  
> Instead of waiting for rm to grind the rust off your platters.

I was surprised how fast 'rm' worked, actually.  I assume it just 
obliterates the highest level in the hierarchy.
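
For anyone following along: on Linux, a tmpfs mount works as such a 
ramdisk.  A minimal sketch (the mountpoint and size are placeholders):

sudo mkdir -p /mnt/fstest
sudo mount -t tmpfs -o size=256m tmpfs /mnt/fstest
cd /mnt/fstest                     # run the mkdir/touch tests here
cd / && sudo umount /mnt/fstest    # 'nuking the core' in one step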


> >  I'm not sure the path lengths add any overhead (unless you just 
> >  meant as a result of having more directories).
> 
> The space used is, at minimum, the sum of all chars in the full 
> pathname [dirname + basename] from the mountpoint, plus whatever 
> overhead your particular FS needs to store it. No FS I know of 
> compresses pathnames, only filedata. And it doesn't matter what the 
> inode type is...  dir/file/char/block/slink/fifo/sock/etc, it's the 
> pathname that counts.

[...]

> I goofed up one part of this:
> 
> Since things are stored in hierarchy, not flat... it's not the sum of 
> all the full pathnames. But the sum of the names in each directory and 
> the sum of all those sums. Adjust theory minimums accordingly.

It's also not just a sum, though.  Each directory's on-disk size also 
gets rounded up to a multiple of the block size, where applicable for a 
given fs.  So, e.g.:

for l in 00/{00..99}/{00..99} ; do mkdir -p "$l" ; done
and
for l in aaaaaa00/aaaaaa{00..99}/aaaaaa{00..99} ; do mkdir -p "$l" ; done

take up the same amount of space on disk under ext3.  (No dir exceeds 
my 4K block size.)
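
That rounding is easy to see directly.  A quick check, assuming GNU 
stat(1) and the same 4K-block ext3 (the names here are arbitrary):

mkdir -p short/{00..99} long/aaaaaa{00..99}
stat -c '%n: %s bytes' short long

Both directories report 4096 bytes: either set of entries fits within a 
single block.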


> Still can use great gobs of space though, easily overlooked.
> 
> And at least on these FS's, the cost lies in the total number of 
> dirs/files and their length.

What's the 'MFS' you're using?  I assume it's not the MacOS FS.  Do I 
assume correctly from the UFS comparison that it's the BSD Memory File 
System?  Or Moose File System?


> {00..00}/{00..99}/{00..99}
> find . | wc 10102 lines 110807 chars
> MFS: du -cks 20204
> ZFS: du -cks 85867

ext3: du -B 1 -s 41377792 (40408K)
reiserfs: du -B 1 -s 206848 (202K)

(Using -B 1 to show that ext3 stays exactly the same across runs; the 
directly-comparable number is in parens.  It's exactly one 4K block per 
directory: 10102 dirs, counting '.', * 4096 bytes = 41377792.)
(reiserfs = reiserfs3)

> 
> {00000000000..00000000000}/{00000000000..00000000099}/{00000000000..00000000099}
> find . | wc 10102 lines 382616 chars
> MFS: du -cks 20406
> ZFS: du -cks 61835

ext3: du -B 1 -s 41377792 (40408K)
reiserfs: du -B 1 -s 310272 (303K)

(Same ext3 number again: the longer names still fit each directory in 
one block.  reiserfs3's usage grows with the extra name bytes.)

> 
> {1x<255chars>}/{100*<255chars>}/{100*<255chars>}  # 0123456789 * 25.5
> find . | wc 10102 lines 7751660 chars
> MFS: du -cks 25052
> ZFS: du -cks 89923
> 
> {000..000}/{000..009}/{000..999}
> find . | wc 10102 lines 140108 chars
> MFS: du -cks 20124
> ZFS: du -cks 62526
> 
> {00000..31999}
> find . | wc 32001 lines 256002 chars
> MFS: du -cks  64528
> ZFS: du -cks 276153
> 
> {00000..31999}  # not mkdir, touch $i
> find . | wc 32001 lines 256002 chars
> MFS: du -cks   528
> ZFS: du -cks 20153
> 
> {00000..20000}  # not mkdir, touch <240 0 chars>$i
> find . | wc 20002 lines 4960250 chars
> MFS: du -cks  5024
> ZFS: du -cks 14185
> 
> UFS: numbers same as MFS, only much slower
> ZFS: seems to make some adjustments on subsequent runs
> ALL: these FS's were quite full
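
For anyone who wants to reproduce that mkdir-vs-touch gap, a sketch of 
the sort of harness involved (the mountpoint is a placeholder; run once 
per fs under test):

cd /mnt/fstest
mkdir dirs files
( cd dirs  && for i in {00000..31999}; do mkdir "$i"; done )
( cd files && for i in {00000..31999}; do touch "$i"; done )
du -cks dirs files

Empty dirs cost at least one block apiece; empty files cost only their 
directory entries (plus inode overhead), hence the large difference.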

