Time rsYnc Machine (tym)
Linda Walsh
rsync at tlinx.org
Tue Aug 7 15:07:04 MDT 2012
Dan Stromberg wrote:
> FWIW, it might be nice to add a hardlink detecting bloom filter to rsync
> at some point. This makes the process of detecting hardlinks less
> expensive. Another way to narrow down the field is to just look at
> st_nlink.
----
What's a bloom filter, and how / why would it make things
less expensive? I don't understand why it is expensive now. You have
to visit all files anyway (likely a previsit to get a size estimate), reading
all the inodes at that point.
Then have a hash 'ino2names' in which each inode points to an array of the names of files
found in the tree with the same inode:
    %ino2names
    $ino2names{$inode} = [array of paths relative to root of tree being examined]
Since the size of the transfer is known after the initial scan, the whole
inode->path mapping would be known at that point as well. No extra expense
involved.
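
A minimal sketch of that single-pass idea, written in Python rather than Perl so it is easy to run. This is not how rsync does it; the function names build_ino2names and hardlink_groups are made up for illustration. Keying on the (st_dev, st_ino) pair instead of the bare inode is my assumption, since inode numbers are only unique within one filesystem, and the st_nlink > 1 check is the shortcut mentioned in the quoted message.

```python
import os
import stat
from collections import defaultdict


def build_ino2names(root):
    """Walk the tree once; map (device, inode) -> paths relative to root.

    Only regular files with st_nlink > 1 are recorded, since a file
    with a single link cannot be part of a hard-link group.
    """
    ino2names = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                ino2names[key].append(os.path.relpath(full, root))
    return ino2names


def hardlink_groups(ino2names):
    """Return the groups that have more than one path inside the tree.

    A file whose other links all live outside the tree ends up with a
    single recorded path and is dropped here, i.e. out-of-tree links
    are ignored.
    """
    return sorted(sorted(paths) for paths in ino2names.values() if len(paths) > 1)
```

The same os.lstat() call that feeds the hash also returns st_size, so the size estimate and the inode->path mapping come out of the same scan.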
Of course, given the error I reported, it seems rsync has recently gotten broken
with hard links -- they aren't that difficult. It is presumed that links 'out
of tree' are ignored at source and target -- meaning target files end up with the same internal
linkages as on the source, and any external links would be broken.
I still have no clue what a bloom filter is?? ;-) Cluesticks anyone?