Large files and symlinks

jw schultz jw at pegasys.ws
Thu Jul 31 10:25:43 EST 2003


On Wed, Jul 30, 2003 at 05:01:26PM -0700, jw schultz wrote:
> On Thu, Jul 31, 2003 at 09:22:51AM +1000, Donovan Baarda wrote:
> > On Thu, 2003-07-31 at 06:53, jw schultz wrote:
> > [...]
> > > In many cases invoking --partial is worse than not.  If you
> > > are rsyncing a 4GB file and transfer is interrupted after
> > > 500MB has been synced you get a 500MB file which now has
> > > less in common with the source than the 4GB file did.  
> > 
> > A more useful behaviour for --partial would be to concatenate the
> > partial download to the end of the old "basis", rather than replace
> > it... this leaves you with a much more useful "partial" result to resume
> > from.
> > 
> > Of course this behaviour could be _very_ confusing to people... :-)
> 
> Interesting idea.  I don't know that it would be all that confusing.
> 
> You'd have to truncate the basis to the length of the source
> to prevent it growing with each failure.  Even appending
> 3.5GB to a 4GB file once is problematic.
> 
> If we were to append to the existing file it might make
> sense to append only those portions that were updates.
> That would require keeping track of the offset+length of
> each change block.  Yuck, that is much more work than it is
> worth.
> 
> One idea that I think has real merit would be to combine
> some kind of change-rate score with an evaluation of
> comparative sizes of the tempfile and the original file to
> decide if replacing the original or leaving it would be more
> efficient.  If there was no data-reuse then replacement
> would be in order.  If there was a high rate of reuse it
> wouldn't.  If the reuse was middling you would consider the
> comparative sizes.  The formula would probably be pretty
> simple.  If someone comes up with a patch that does that I'd
> be willing to entertain it.

Here is the formula, spelled out step by step for testing.  I
don't think it uses any data we aren't already collecting for
the per-file statistics.

	match_ratio = (bytes_matched / tempsize)
	probable_match = match_ratio * origsize

	if (probable_match < tempsize)
		keep temp
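
For anyone who wants to experiment, here is a minimal sketch of
that test as a C function.  It is not rsync code.  The name
keep_partial_temp() is hypothetical, the arguments just mirror
the variables above, and the guard for an empty tempfile is my
own assumption.  I am also reading "keep temp" as "let the
tempfile replace the original".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of rsync): decide whether an
 * interrupted transfer's tempfile should replace the original.
 *
 * bytes_matched: bytes in the tempfile that were reused from the basis
 * tempsize:      size of the partial tempfile
 * origsize:      size of the original (basis) file
 */
static bool keep_partial_temp(uint64_t bytes_matched, uint64_t tempsize,
                              uint64_t origsize)
{
    /* Assumption: with nothing transferred there is nothing to keep. */
    if (tempsize == 0)
        return false;

    /* Fraction of the data written so far that came from the basis. */
    double match_ratio = (double)bytes_matched / (double)tempsize;

    /* Bytes of the original we would expect to reuse at that rate. */
    double probable_match = match_ratio * (double)origsize;

    /* Keep the tempfile only if it holds more than that estimate. */
    return probable_match < (double)tempsize;
}
```

With a 4GB original and a 500MB tempfile, no reuse at all gives
a probable_match of 0, so the tempfile is kept.  Full reuse
gives a probable_match of 4GB, so the original stays.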

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


