Problems with rsync 2.5.1pre1 and hardlinks

tim.conway at philips.com tim.conway at philips.com
Thu Dec 13 02:28:01 EST 2001


I have to post a correction to this:
"
* Why would symlinks eat up disk space?

Correction, of course they don't.  I have to use hardlinks because every
"
A symbolic link does, in fact, consume disk space.  Unlike a hard link
(which is just another pointer to the file's data), a symbolic link is a
pointer to the file's name.  It fits into the directory's data blocks like
any other entry, and it also takes up an allocation of filesystem data
space, in which to store the arbitrary-length name rather than an inode
number.  One effect of this is that a symbolic link can point out of the
filesystem it's in.  A hard link can't, because a directory entry points
to an inode and has no way to indicate that the inode is in another
filesystem.

hard link=directory entry=a name and an inode.
symbolic link=(like a)file containing the name of what it points to.
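
The two one-line definitions above can be checked from a shell.  What
follows is only a minimal sketch, not taken from the session in this post:
the scratch directory and the names original, hardlink and symlink are
made up for illustration, and it assumes mktemp -d and readlink are
available.  It prints the inode number of each name and the text stored
in the symlink.

```shell
#!/bin/sh
# Sketch: compare the inode numbers of a file, a hard link and a symlink.
# The names original/hardlink/symlink are hypothetical, chosen for the demo.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

touch original
ln original hardlink       # second directory entry for the same inode
ln -s original symlink     # separate object that stores the name "original"

# ls -di reports the inode of the entry itself; -d keeps ls from
# following the symlink to its target.
ino_orig=$(ls -di original | awk '{print $1}')
ino_hard=$(ls -di hardlink | awk '{print $1}')
ino_sym=$(ls -di symlink | awk '{print $1}')

# readlink prints the name stored inside the symlink.
target=$(readlink symlink)

echo "original=$ino_orig hardlink=$ino_hard symlink=$ino_sym -> $target"
```

The hard link should report the same inode number as the original, while
the symlink reports its own and readlink prints the name it holds.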

I know, technically, a symlink is not a file.  That distinction is
maintained by the 0xF000 nybble of the mode of the object.  Nevertheless,
a directory takes space, a file takes space, a symbolic link takes space.
An additional hard link to an existing file takes only directory space,
which, if it's not enough of an addition to that directory's existing data
to cause the filesystem driver to add another allocation to the
directory's data space, takes up no more disk space.  A symlink, however,
has the same effect in the directory but, in addition, gets its own inode
and data space as well.

In the example below, an empty file is created.  It takes an inode, no
space.
Adding another link to it takes neither space nor an inode.
Adding a symbolic link to it takes up both an inode and space; another
symlink, to the other hard link, does the same.
Adding 256 new hard links (16 x 16 names) to the original file takes up no
more inodes, but it does take up disk space, by causing the directory to
expand to hold all those names.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tconway at atlas
/var/tmp/test>df .
/var               (/dev/dsk/c0t0d0s3 ): 1662512 blocks   250481 files
/var/tmp/test>ls -ld
drwxrwxrwx   2 tconway  Vlsieng     512 Dec 12 07:59 .
tconway at atlas
/var/tmp/test>touch emptyfile
tconway at atlas
/var/tmp/test>df .
/var               (/dev/dsk/c0t0d0s3 ): 1662512 blocks   250480 files
tconway at atlas
/var/tmp/test>ln emptyfile emptyfile2
tconway at atlas
/var/tmp/test>df .
/var               (/dev/dsk/c0t0d0s3 ): 1662512 blocks   250480 files
tconway at atlas
/var/tmp/test>ln -s emptyfile linktoemptyfile
tconway at atlas
/var/tmp/test>df . 
/var               (/dev/dsk/c0t0d0s3 ): 1662510 blocks   250479 files
tconway at atlas
/var/tmp/test>ln -s emptyfile2 linktoemptyfile2
tconway at atlas
/var/tmp/test>df .
/var               (/dev/dsk/c0t0d0s3 ): 1662508 blocks   250478 files
tconway at atlas
/var/tmp/test>ls -li
total 4
     27376 -rw-rw-rw-   2 tconway  Vlsieng        0 Dec 12 07:59 emptyfile
     27376 -rw-rw-rw-   2 tconway  Vlsieng        0 Dec 12 07:59 emptyfile2
     27377 lrwxrwxrwx   1 tconway  Vlsieng        9 Dec 12 08:00 linktoemptyfile -> emptyfile
     27380 lrwxrwxrwx   1 tconway  Vlsieng       10 Dec 12 08:00 linktoemptyfile2 -> emptyfile2
tconway at atlas
/var/tmp/test>ls -ld
drwxrwxrwx   2 tconway  Vlsieng      512 Dec 12 08:00 .
tconway at atlas
/var/tmp/test>for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f
> do
> for b in 0 1 2 3 4 5 6 7 8 9 a b c d e f
> do
> ln emptyfile $a$b
> done
> done
tconway at atlas
/var/tmp/test>ls -ld
drwxrwxrwx   2 tconway  Vlsieng     3584 Dec 12 08:03 .
tconway at atlas
/var/tmp/test>df .
/var               (/dev/dsk/c0t0d0s3 ): 1662502 blocks   250478 files
tconway at atlas
/var/tmp/test>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
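
For anyone who wants to repeat the hard-link part without reading df
output, here is a rough sketch that is not part of the session above.  It
redoes the same nested loop in a scratch directory (created with mktemp -d,
which is assumed to be available) and then counts what the loop produced;
the variable names are mine.

```shell
#!/bin/sh
# Sketch: repeat the nested hard-link loop in a scratch directory and
# count the results instead of reading df.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

touch emptyfile
ln emptyfile emptyfile2

digits="0 1 2 3 4 5 6 7 8 9 a b c d e f"
for a in $digits
do
    for b in $digits
    do
        ln emptyfile "$a$b"
    done
done

# Two-character names made by the loop: 16 * 16 of them.
new_links=$(ls | grep -c '^[0-9a-f][0-9a-f]$')
# The link count is the second field of ls -l output.
link_count=$(ls -ld emptyfile | awk '{print $2}')
# Every name refers to the same inode, so only one distinct number shows up.
distinct_inodes=$(ls -i | awk '{print $1}' | sort -u | wc -l | tr -d ' ')

echo "new links: $new_links, link count: $link_count, inodes: $distinct_inodes"
```

On a filesystem that supports hard links this should report 256 new
links, a link count of 258 (emptyfile, emptyfile2 and the 256 new names)
and a single inode.  How much the directory itself grows depends on the
filesystem, so the sketch does not try to measure that.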
I know it's not rsync-specific, but we're mostly unix guys, and need to be 
correct. 

Tim Conway
tim.conway at philips.com
303.682.4917
Philips Semiconductor - Longmont TC
1880 Industrial Circle, Suite D
Longmont, CO 80501
Available via SameTime Connect within Philips, n9hmg on AIM
perl -e 'print pack(nnnnnnnnnnnn, 
19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970), 
".\n" '
"There are some who call me.... Tim?"




birger at takatukaland.de
Sent by: rsync-admin at lists.samba.org
12/11/2001 11:42 PM

 
        To:     Dave Dykstra <dwd at bell-labs.com>
rsync at lists.samba.org
        cc:     (bcc: Tim Conway/LMT/SC/PHILIPS)
        Subject:        Re: Problems with rsync 2.5.1pre1 and hardlinks
        Classification: 



Dave Dykstra schrieb am Mon, Dec 10, 2001 at 02:21:46PM -0600:
[...]
* > 
* > * 
* > * Ideas:
* > *     1. Would it be possible to use symlinks instead of hardlinks?  That
* > *            would give you more flexibility to split things up however you
* > *            like.
* > *     2. Perhaps you could break it up into ~70 copies, where each time you
* > *            give it the first directory that contains the data and another
* > *            one that contains one of the hardlinks.
* > 
* > Both alternatives will eat up huge amounts of disk space as the numbers
* > above suggest.  I will therefore consider plugging in more mem/swap before
* > trying them.
* 
* Why would symlinks eat up disk space?

Correction, of course they don't.  I have to use hardlinks because every
directory provides a chrooted environment that'll break when using symlinks.


* 
* I only suggested the second alternative because I thought that it would
* end up with all the destination files hardlinked together as on the
* original system.  I hadn't tested it, but now I did and it works:
* 
*     $ mkdir s s/d1 s/d2 s/d3 t
*     $ touch s/d1/l1
*     $ ln s/d1/l1 s/d2/l1
*     $ ln s/d1/l1 s/d3/l1
*     $ ls -li s/*/l1
*                286226 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 s/d1/l1
*                286226 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 s/d2/l1
*                286226 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 s/d3/l1
*     $ rsync -aH s/d1 s/d2 t
*     $ rsync -aH s/d1 s/d3 t
*     $ ls -li t/*/l1
*                622728 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 t/d1/l1
*                622728 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 t/d2/l1
*                622728 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 t/d3/l1

Ah, now I understand what you mean.  Yes, that will surely work and minimize
memory utilisation at the cost of a somewhat complex rsync mechanism. I'll
definitely fall back on that technique if I'm still short on memory after
some upgrades.

("somewhat complex": the backup server does not know anything about the 
 directories to be synced, maybe there will be new ones - so you'll have
 to provide a comprehensive list of the dirs from the main server to set
 up the logic above)
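
The list-driven logic described above could be sketched roughly as
follows.  This is an illustration only, not something from the thread:
plan_syncs is a hypothetical helper, it assumes it runs where the source
directories are visible and that directory names contain no whitespace,
and it only prints rsync -aH commands (the same form as in the quoted
example) instead of running them.

```shell
#!/bin/sh
# Sketch: print one "rsync -aH" command per hard-linked directory, always
# pairing it with the first directory so the links get recreated on the
# target.  plan_syncs is a hypothetical helper, not part of rsync.
plan_syncs() {
    src=$1
    dest=$2
    first=
    for d in "$src"/*/
    do
        [ -d "$d" ] || continue   # skip the unexpanded pattern if nothing matches
        d=${d%/}                  # drop the trailing slash so the directory itself is copied
        if [ -z "$first" ]; then
            first=$d              # reference directory that holds the data
            continue
        fi
        echo "rsync -aH $first $d $dest"
    done
}

# Demo in a scratch directory, mirroring the s/d1, s/d2, s/d3 -> t layout.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir s s/d1 s/d2 s/d3 t
plan_syncs s t
```

The printed list can be reviewed first and then piped to sh on a machine
that has rsync.  Newly added directories are picked up the next time the
list is generated.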


- Birger







