Improving the rsync protocol (RE: Rsync dies)

Mon May 20 17:38:01 EST 2002

On 20 May 2002, Phil Howard <phil-rsync at ipal.net> wrote:
> On Fri, May 17, 2002 at 01:42:31PM -0700, Wayne Davison wrote:
> 
> | On Fri, 17 May 2002, Allen, John L. wrote:
> | > In my humble opinion, this problem with rsync growing a huge memory
> | > footprint when large numbers of files are involved should be #1 on
> | > the list of things to fix.
> | 
> | I have certainly been interested in working on this issue.  I think it
> | might be time to implement a new algorithm, one that would let us
> | correct a number of flaws that have shown up in the current approach.
> 
> OTOH, this mode of operation also needs to be retained.  While I certainly
> would love to have an rsync that can keep millions of files in sync all at
> once, I also have need for an rsync that can readily detect files being
> moved around.  There are obvious difficulties in combining those needs,
> so it should be a deployment issue to decide what to use.

I think detecting files moved across directories is an example of 
something that would be a bit complicated and error-prone to insert
into the current protocol.

One way of handling them would be to scan the entire destination
tree getting the stat() info for each file.  We can then look
for files already on the destination that have the same size as one
that we have (above a certain minimum), and check whether they
are in fact identical.
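
As a rough sketch of that idea, and not anything rsync does today,
something like the following Python would do.  It assumes both trees
are readable on the same machine, so the "are they identical" check is
a plain byte comparison; over the wire that step would have to be a
checksum exchange instead.  The names index_by_size and find_identical
and the MIN_SIZE value are made up for illustration.

```python
import filecmp
import os
import stat
from collections import defaultdict

# Assumed threshold; the text above only says "a certain minimum".
MIN_SIZE = 4096


def index_by_size(dest_root, min_size=MIN_SIZE):
    """Walk dest_root and map file size -> list of regular-file paths."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(dest_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or cannot be stat()ed; skip it
            # Only regular files at or above the minimum size are indexed.
            if stat.S_ISREG(st.st_mode) and st.st_size >= min_size:
                by_size[st.st_size].append(path)
    return by_size


def find_identical(src_path, by_size):
    """Return a destination path with the same contents as src_path, or None."""
    size = os.lstat(src_path).st_size
    for candidate in by_size.get(size, []):
        # shallow=False forces a byte-by-byte comparison of the contents.
        if filecmp.cmp(src_path, candidate, shallow=False):
            return candidate
    return None
```

The index is built once per run and then consulted for each source
file that is missing at its expected path on the destination.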

Doing so requires an upfront scan of the entire destination tree, and
requires the client to hold all this information in memory.  So for
some cases at least it will be undesirable.  Probably the right thing
is an option like --renames=global/directory/none.

I think the general point is that we don't really know exactly what
the ideal implementation will be, and preferably the protocol should
not restrict too tightly our ability to change or improve it in future.

--
Martin