patch to enable faster mirroring of large filesystems

Tue Nov 27 11:25:31 EST 2001

On 26 Nov 2001, "Andrew J. Schorr" <ajschorr at yahoo.com> wrote:

> I understand your point of view, but I think it is a mistake to
> hold rsync's algorithm hostage to the directory tree traversal logic
> built into the program.

> IMHO, the basic file transfer algorithm of rsync is terrific, but
> the program wrapped around it is a bit out of control.

I completely agree, and you can see me talking about this in earlier
posts.  I think the large number of command-line options rsync
accumulates, to say nothing of the number of patches not yet accepted
that would add more, indicates some kind of fundamental problem.

The protocol is also pretty crufty.

So breaking the core algorithm out of the program would be really useful.

Possibly the best way to enable this is to add scripting support.  I
think many people would find it useful if rsync could execute a
fragment of e.g. perl code to decide whether to transfer each file.
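As a rough illustration of that kind of hook, here is a minimal sketch in
Python rather than perl.  Nothing here is an existing rsync feature: the
names select_files and skip_big_and_backups, and the hook signature
(relative path plus stat result, returning true to transfer), are all
assumptions made up for the example.

```python
import os
from typing import Callable, Iterator

# Hypothetical hook signature: (relative path, stat result) -> transfer or not.
FileFilter = Callable[[str, os.stat_result], bool]


def select_files(root: str, want: FileFilter) -> Iterator[str]:
    """Yield paths under root (relative to it) that the user hook accepts."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            # lstat so that symlinks are described, not followed.
            if want(rel, os.lstat(full)):
                yield rel


def skip_big_and_backups(rel: str, st: os.stat_result) -> bool:
    """Example user fragment: skip editor backups and files over 1 MiB."""
    return not rel.endswith("~") and st.st_size <= 1024 * 1024
```

A caller would iterate over select_files(root, skip_big_and_backups) and
hand each accepted path to the transfer code, so the policy lives in the
user's fragment and not in another command-line option.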

Although rsync is hostage to the directory tree traversal algorithm,
that traversal is also one of the most useful parts of the program...

Just doing directory-by-directory transfer would give an immediate and
substantial improvement to the common case of people using rsync to
transfer big directory trees.
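To make "directory-by-directory" concrete, here is a minimal sketch of
the traversal side only, in Python.  It assumes the cost being avoided is
scanning the whole tree before any transfer starts; walk_by_directory is
a hypothetical name, and no rsync internals are implied.

```python
import os
from typing import Iterator, List, Tuple


def walk_by_directory(root: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (relative directory, sorted file names) one directory at a time."""
    pending = [""]  # relative directories still to visit
    while pending:
        rel_dir = pending.pop()
        abs_dir = os.path.join(root, rel_dir)
        files, subdirs = [], []
        with os.scandir(abs_dir) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.name)
                elif entry.is_file(follow_symlinks=False):
                    files.append(entry.name)
        # Queue subdirectories in reverse so pop() visits them in sorted
        # order.  Memory holds one directory's names plus the pending queue.
        pending.extend(
            os.path.join(rel_dir, d) for d in sorted(subdirs, reverse=True)
        )
        yield rel_dir, sorted(files)
```

Because this is a generator, the consumer can transfer the files of one
directory before the next directory has even been read.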

> The spirit of my patch is to expose the low-level rsync algorithm and
> to allow people to build up their customized infrastructure outside
> of the program instead of having to build it in.  I think this is in
> the spirit of Unix tools.  I think if rsync were to expose some of its
> low-level capabilities, then we would not have a need for xdelta and rdiff,
> projects which are springing up because of rsync's opaqueness.

To go in this direction you need to decide

 A) Whether to maintain wire compatibility with the existing code.  This would
 be pretty useful because otherwise you'll need to install the new
 version on both machines.  You can kind of kludge around this by
 having two rather different protocols that bifurcate at the first
 packet, which I think is what SSH1/SSH2 do.  There's some limited
 support in the current protocol for protocol versioning.

 B) Whether you want to preserve the model of every invocation being
 an idempotent "copy this to here", or to make it into a more
 generalized file manipulation protocol like FTP or NFS.
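The "bifurcate at the first packet" kludge from (A) can be sketched
roughly as follows, in Python.  The "PROTO <n>" greeting, the LEGACY_MAX
cut-off and the two handler names are all invented for illustration; this
is not rsync's actual wire format.

```python
import io
from typing import BinaryIO

LEGACY_MAX = 26  # hypothetical: highest version spoken by the old code path


def handle_legacy(stream: BinaryIO, version: int) -> str:
    """Stand-in for the existing protocol implementation."""
    return f"legacy protocol, version {version}"


def handle_new(stream: BinaryIO, version: int) -> str:
    """Stand-in for a redesigned protocol sharing only the greeting."""
    return f"new protocol, version {version}"


def dispatch(stream: BinaryIO) -> str:
    """Read the first line and branch to one of two unrelated protocols."""
    greeting = stream.readline().decode("ascii", errors="replace").strip()
    tag, _, number = greeting.partition(" ")
    if tag != "PROTO" or not number.isdigit():
        raise ValueError(f"unrecognised greeting: {greeting!r}")
    version = int(number)
    handler = handle_legacy if version <= LEGACY_MAX else handle_new
    return handler(stream, version)


if __name__ == "__main__":
    # An in-memory stream stands in for the network connection.
    print(dispatch(io.BytesIO(b"PROTO 40\n")))
```

Only the greeting has to stay stable; everything after it is free to
differ between the two handlers, which is what lets an old peer keep
working while a new one gets the redesigned protocol.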

If one were going to write a new protocol with a new model of operation
then perhaps it would be better to use librsync and make it a
completely separate program.

At this stage I'm inclined to take the conservative answer to both
questions, but having a discussion about them could be useful.

-- 
Martin