Rsync and dispersed storage [Re: Pushing hard-linked backups]

Eric S. Johansson esj at
Sun Dec 30 04:52:12 GMT 2007

Matt McCutchen wrote:
> On Fri, 2007-12-28 at 00:15 -0500, Eric S. Johansson wrote:
>> it is possible, I've seen it done, but I can't find the library/tool anymore.
> I'm curious: what was the nature of this tool (if you remember)?  A
> modified version of rsync?  A dispersed storage service with an rsync
> daemon interface?  A virtual filesystem?  Did delta transfers work
> properly?

Unfortunately, my memory is somewhat foggy on the finer points of this toolkit.
What I remember is that it split the backup image into N parts, and you only
needed to recover M of them to reconstruct your data set.  I seem to remember
it used something analogous to rsync to minimize unnecessary duplication, but
that's only a faint memory.  I believe it was orders of magnitude better than
what we have today, and it's a real shame it never caught on.
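For readers unfamiliar with "split into N parts, recover from any M", here is a minimal sketch of one way to do it. This is not the toolkit described above, whose details are unknown. It is a Rabin-style information dispersal over the prime field GF(257): each M-byte chunk is treated as the coefficients of a polynomial, share x stores that polynomial evaluated at x, and any M shares let you solve the Vandermonde system for the original bytes. Each share is roughly 1/M the size of the input. The names `disperse` and `reconstruct` are made up for illustration. Share values can reach 256, so a real implementation would work in GF(2^8) to keep shares byte-sized.

```python
# Minimal M-of-N information dispersal over the prime field GF(257).
P = 257  # prime modulus; every byte value 0..255 is a field element


def disperse(data: bytes, n: int, m: int) -> list[tuple[int, list[int]]]:
    """Split data into n shares, any m of which suffice to rebuild it."""
    if not 0 < m <= n < P:
        raise ValueError("require 0 < m <= n < 257")
    padded = data + b"\x00" * (-len(data) % m)  # pad to a multiple of m bytes
    chunks = [padded[i:i + m] for i in range(0, len(padded), m)]
    shares = []
    for x in range(1, n + 1):  # share x: every chunk's polynomial evaluated at x
        values = []
        for chunk in chunks:
            acc = 0
            for coeff in reversed(chunk):  # Horner's rule for sum(chunk[j] * x**j)
                acc = (acc * x + coeff) % P
            values.append(acc)
        shares.append((x, values))
    return shares


def reconstruct(shares: list[tuple[int, list[int]]], m: int, length: int) -> bytes:
    """Rebuild the original bytes from the first m (distinct) shares given."""
    chosen = shares[:m]
    if len({x for x, _ in chosen}) < m:
        raise ValueError("need at least m distinct shares")
    num_chunks = len(chosen[0][1])
    # Augmented matrix: a Vandermonde row for each share's x, then its values.
    a = [[pow(x, j, P) for j in range(m)] + list(values) for x, values in chosen]
    for col in range(m):  # Gauss-Jordan elimination mod P
        pivot = next(r for r in range(col, m) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], P - 2, P)  # modular inverse (Fermat's little theorem)
        a[col] = [v * inv % P for v in a[col]]
        for r in range(m):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [(v - f * w) % P for v, w in zip(a[r], a[col])]
    # Row j now holds coefficient j (the j-th byte) of every chunk.
    out = bytearray()
    for c in range(num_chunks):
        out.extend(a[j][m + c] for j in range(m))
    return bytes(out[:length])
```

For example, `disperse(data, 5, 3)` yields five shares to place on five machines, and `reconstruct(shares[1:4], 3, len(data))` returns `data` even with two machines lost.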

>> again with a pre and post processing capability, we could add that functionality 
>> in without modding the baseline
> It's not clear to me what kind of pre- or postprocessing capability you
> are thinking of that would make dispersed storage with rsync practical.
> A per-file approach with --source-filter and --dest-filter would have
> the same disadvantages as per-file encryption, and by the time you
> arrange for the retrieval filter to combine multiple files, I don't see
> what you gain by having rsync in the picture.  Using rsync with a
> virtual filesystem that implements the dispersed storage is more
> natural.

What seemed like a bright idea during composition looks more like a rusty
clunker after one hits the send key.  :-)  The original, albeit poorly thought
out, idea was to replicate the directory structure on all destination machines,
then take each file, split it into M parts, and replicate each part into the
right portion of the filesystem hierarchy on the remote machines.  Yes, not the best
idea I've ever had.  I think the virtual filesystem does make a lot more sense.
How would you implement caching so that you only need to scan the local
filesystem once, instead of every time you compare it against a remote filesystem?
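One possible shape for such a cache, as a hypothetical sketch rather than anything rsync provides: keep a local manifest keyed by relative path that records size, mtime and a content digest. On rescan, a file's digest is reused when its size and mtime are unchanged. This is the same heuristic as rsync's default quick check. The manifest is then built once and compared against each remote's manifest. That assumes each remote keeps a manifest from its last update, which is itself an assumption. All function names here are made up for illustration.

```python
import hashlib
import json
import os
import stat


def _digest(path: str) -> str:
    """SHA-256 of a file's contents, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()


def scan(root: str, cache: dict) -> dict:
    """Map each regular file under root to [size, mtime_ns, sha256].

    A file is re-hashed only when its size or mtime differs from the cached
    entry; otherwise the cached entry is reused as-is.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):  # skip symlinks, sockets, etc.
                continue
            rel = os.path.relpath(full, root)
            old = cache.get(rel)
            if old and old[0] == st.st_size and old[1] == st.st_mtime_ns:
                manifest[rel] = old
            else:
                manifest[rel] = [st.st_size, st.st_mtime_ns, _digest(full)]
    return manifest


def changed_since(manifest: dict, remote: dict) -> list:
    """Paths that are new or whose digest differs from a remote's manifest.

    Deletions (paths present only in `remote`) are not reported in this sketch.
    """
    return sorted(p for p, entry in manifest.items()
                  if p not in remote or remote[p][2] != entry[2])


def load_cache(path: str) -> dict:
    """Load a saved manifest, or start empty if none exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_cache(path: str, manifest: dict) -> None:
    with open(path, "w") as f:
        json.dump(manifest, f)
```

Usage would be one `scan()` per backup run, seeded from `load_cache()`, then one `changed_since()` call per remote, and finally `save_cache()`. The directory walk still stats every file, but unchanged files are never re-read.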
