Large files and symlinks

jw schultz jw at pegasys.ws
Thu Jul 31 10:01:26 EST 2003


On Thu, Jul 31, 2003 at 09:22:51AM +1000, Donovan Baarda wrote:
> On Thu, 2003-07-31 at 06:53, jw schultz wrote:
> [...]
> > In many cases invoking --partial is worse than not.  If you
> > are rsyncing a 4GB file and transfer is interrupted after
> > 500MB has been synced you get a 500MB file which now has
> > less in common with the source than the 4GB file did.  
> 
> A more useful behaviour for --partial would be to concatenate the
> partial download to the end of the old "basis", rather than replace
> it... this leaves you with a much more useful "partial" result to resume
> from.
> 
> Of course this behaviour could be _very_ confusing to people... :-)

Interesting idea.  I don't know that it would be all that confusing.

You'd have to truncate the basis to the length of the source
to prevent it growing with each failure.  Even appending
3.5GB to a 4GB file once is problematic.

If we were to append to the existing file it might make
sense to append only those portions that were updates.
That would require keeping track of the offset+length of
each change block.  Yuck, that is much more work than it is
worth.
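
To make the bookkeeping concrete, here is a minimal sketch of what
tracking the offset+length of each change block could look like.  The
(kind, length) token stream and the name literal_extents are
assumptions made up for illustration; this is not rsync's internal
representation.

```python
from typing import Iterable, List, Tuple

def literal_extents(tokens: Iterable[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Return (offset, length) extents of changed data in the new file.

    `tokens` is a hypothetical stream of (kind, length) pairs, where kind
    is "match" (data reused from the basis) or "literal" (new data sent
    over the wire).  Offsets are positions in the reconstructed file.
    """
    extents: List[Tuple[int, int]] = []
    offset = 0
    for kind, length in tokens:
        if kind == "literal" and length > 0:
            if extents and extents[-1][0] + extents[-1][1] == offset:
                # Contiguous with the previous literal run: extend it.
                prev_offset, prev_length = extents[-1]
                extents[-1] = (prev_offset, prev_length + length)
            else:
                extents.append((offset, length))
        offset += length
    return extents

tokens = [("match", 100), ("literal", 20), ("literal", 30),
          ("match", 50), ("literal", 10)]
print(literal_extents(tokens))
```

Even in this toy form the list has to be stored somewhere and consulted
again on resume, which is where the extra work comes from.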

One idea that I think has real merit would be to combine
some kind of change-rate score with a comparison of the
sizes of the tempfile and the original file to decide
whether replacing the original or leaving it in place would
be more efficient.  If there was no data reuse then
replacement would be in order.  If there was a high rate of
reuse the original should be kept.  If the reuse was
middling you would consider the comparative sizes.  The
formula would probably be pretty simple.  If someone comes
up with a patch that does that I'd be willing to entertain
it.
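
A rough sketch of how such a decision could look, under stated
assumptions: the function name, the reuse measure (reused bytes as a
fraction of the tempfile) and the 0.1 / 0.9 thresholds are all
hypothetical, picked only to illustrate the three cases.  Nothing here
is taken from rsync itself.

```python
def should_replace_basis(partial_size: int, basis_size: int,
                         reused_bytes: int,
                         low: float = 0.1, high: float = 0.9) -> bool:
    """Decide whether an interrupted tempfile should replace the original.

    partial_size: bytes written to the tempfile before the interruption
    basis_size:   size of the existing (original) destination file
    reused_bytes: bytes of the tempfile that were copied from the basis
    low, high:    hypothetical reuse-rate thresholds (assumptions)
    """
    if partial_size <= 0:
        return False          # nothing was transferred, keep the original
    if basis_size <= 0:
        return True           # no usable original, keep the partial
    reuse = reused_bytes / partial_size
    if reuse <= low:
        return True           # little or no reuse: the original is a poor basis
    if reuse >= high:
        return False          # high reuse: the original is still the better basis
    # Middling reuse: fall back to comparing sizes.
    return partial_size >= basis_size

MB = 1024 * 1024
# 500MB of a 4GB file transferred, almost all of it found in the basis.
print(should_replace_basis(500 * MB, 4096 * MB, 490 * MB))
# Same interruption, but none of the data matched the basis.
print(should_replace_basis(500 * MB, 4096 * MB, 0))
```

A real patch would probably want a weighted score instead of hard
cutoffs, but the shape of the decision would be about the same.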


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


