Time rsYnc Machine (tym)

Linda Walsh rsync at tlinx.org
Tue Aug 7 15:07:04 MDT 2012



Dan Stromberg wrote:
> FWIW, it might be nice to add a hardlink detecting bloom filter to rsync 
> at some point.  This makes the process of detecting hardlinks less 
> expensive.  Another way to narrow down the field is to just look at 
> st_nlink.
----
	What's a bloom filter, and how or why would it make things
less expensive?  I don't understand why it is expensive now.  You have
to visit all files anyway (likely in a pre-visit to get a size estimate),
reading all the inodes at that point.

Then keep a hash 'ino2names' that maps each inode to an array of the names of
the files found in the tree with that inode:

%ino2names
$ino2names{$inode} => [array of paths relative to root of tree being examined]

Since the size of the transfer is known after the initial scan, the full
inode->path mapping would be known at that point as well.  No extra expense is
involved.
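
As a rough illustration of the 'ino2names' idea above, here is a minimal sketch
in Python (chosen only for illustration; it is not how rsync itself does it).
The function name build_ino2names is hypothetical, and keying on the
(device, inode) pair rather than the inode alone is an added assumption, since
inode numbers are only unique within one filesystem.  The same single pass also
totals the file sizes, counting each inode once.

```python
import os
import stat
from collections import defaultdict

def build_ino2names(root):
    """Walk `root` once and return (ino2names, total_size).

    ino2names maps (st_dev, st_ino) -> list of paths relative to root.
    total_size counts each inode's size only once.
    Hypothetical helper for illustration; not rsync's actual code.
    """
    ino2names = defaultdict(list)
    total_size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)              # lstat: do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue                     # only regular files are tracked here
            key = (st.st_dev, st.st_ino)
            if key not in ino2names:         # first time this inode is seen
                total_size += st.st_size
            ino2names[key].append(os.path.relpath(full, root))
    return dict(ino2names), total_size
```

Any entry whose list holds more than one path is a set of hard links inside
the tree.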

Of course, given the error I reported, it seems rsync has recently gotten broken
with hard links -- they aren't that difficult.  It is presumed that links 'out
of tree' are ignored at source and target, meaning target files end up with the
same internal linkages as on the source, and any external links would be broken.
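
One way to tell in-tree links from 'out of tree' ones, sketched in Python under
stated assumptions (the function name classify_links is hypothetical, and this
is not rsync's implementation): compare each inode's st_nlink with the number
of names actually found under the tree.  If st_nlink is larger, at least one
link lives outside the tree.  Skipping files with st_nlink < 2 is the st_nlink
shortcut mentioned in the quoted message.

```python
import os
import stat
from collections import defaultdict

def classify_links(root):
    """Return (in_tree, has_external) lists of path groups under `root`.

    in_tree:      groups of 2+ paths in the tree sharing one inode.
    has_external: groups whose inode has more links than names found in
                  the tree, i.e. at least one link is outside it.
    Hypothetical helper for illustration only.
    """
    names = defaultdict(list)
    nlink = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            # Files with a single link cannot be hard-linked to anything.
            if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
                continue
            key = (st.st_dev, st.st_ino)
            names[key].append(os.path.relpath(full, root))
            nlink[key] = st.st_nlink
    in_tree = sorted(sorted(p) for p in names.values() if len(p) > 1)
    has_external = sorted(
        sorted(p) for key, p in names.items() if nlink[key] > len(p)
    )
    return in_tree, has_external
```

An inode with two names inside the tree and a third outside would appear in
both lists.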

I still have no clue what a bloom filter is?? ;-)  Cluesticks anyone?




