Adding support for versioned files in rsync
jw at pegasys.ws
Tue Oct 14 07:46:54 EST 2003
On Mon, Oct 13, 2003 at 02:30:32PM -0700, Ben Escoto wrote:
> On Mon, 13 Oct 2003 15:43:35 -0400
> "Jason M. Felice" <jfelice at cronosys.com> wrote:
> > dirvish looks interesting. One of the requirements that I now realize I
> > didn't write into the proposal was the ability to store only a single
> > copy of duplicate files... duplicate as determined by file contents, not
> > naming or inode or anything. (Rationale: will likely need to back up
> > large numbers of clients with lots of files in common.)
A re-read of this paragraph highlighted the multiple-client
issue, which dirvish does address rather well through branches.
Mail me off-list if you need specifics of how to set that up.
The more general problem of coalescing duplicate files is a
real performance pig. To do it you must hash (checksum)
every single file, store the files by that hash, and then
use a database to associate the paths with the files, with
separate handling for hardlinks.
Then, while doing all of that, you somehow want to be able to
use the rsync algorithm between versions of the same file.
Doing that is far outside the scope of rsync.
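The hash-and-index scheme described above can be sketched in a few
lines of Python. This is a minimal illustration under assumed
choices, not how dirvish, BackupPC, or rsync work: the `DedupStore`
class, the `objects/` directory layout, the use of SHA-1, and the
SQLite `paths` table are all hypothetical. Hardlinks are handled by
remembering each (device, inode) pair, so a file reachable under
several names is hashed and stored only once. It does not attempt
the harder part, running the rsync algorithm between versions of
the same file.

```python
import hashlib
import os
import shutil
import sqlite3


def file_digest(path, chunk_size=1 << 20):
    """Hash the full contents of a file (SHA-1 is an illustrative choice)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class DedupStore:
    """Hypothetical content-addressed store: one copy per distinct content."""

    def __init__(self, root):
        self.root = root
        os.makedirs(os.path.join(root, "objects"), exist_ok=True)
        # Database associating (client, path) with the content digest.
        self.db = sqlite3.connect(os.path.join(root, "index.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS paths ("
            "client TEXT, path TEXT, digest TEXT, "
            "PRIMARY KEY (client, path))"
        )
        # (st_dev, st_ino) -> digest; valid only for a single backup pass.
        self.seen_inodes = {}

    def add(self, client, path):
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        digest = self.seen_inodes.get(key)
        if digest is None:
            # First time this inode is seen: read and hash every byte.
            digest = file_digest(path)
            self.seen_inodes[key] = digest
            obj = os.path.join(self.root, "objects", digest[:2], digest)
            if not os.path.exists(obj):
                # New content: keep exactly one copy, named by its hash.
                os.makedirs(os.path.dirname(obj), exist_ok=True)
                shutil.copyfile(path, obj)
        # Every path (duplicate or hardlink) gets its own index row.
        self.db.execute(
            "INSERT OR REPLACE INTO paths VALUES (?, ?, ?)",
            (client, path, digest),
        )
        self.db.commit()
        return digest
```

Naming objects by digest is what makes identical files from
different clients collapse to one copy; the price is that every
file must be read in full on every pass, which is where the
performance cost comes from.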
> Check out Craig Barratt's BackupPC at
> It has this feature and I have heard good things about it. And since
> I'm posting this I might as well mention my project at
> http://rdiff-backup.stanford.edu. It does versions and uses the rsync
> algorithm, but doesn't notice duplicate files.
I generally don't wish to appear to be pitching my own
project (dirvish), which is why I don't often provide a link
(google does that for me) and generally mention it in the
context of other tools. But the other tools list is
starting to get a bit long as there are so many permutations
of the wheel.
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt