Large files and symlinks
jw schultz
jw at pegasys.ws
Thu Jul 31 10:25:43 EST 2003
On Wed, Jul 30, 2003 at 05:01:26PM -0700, jw schultz wrote:
> On Thu, Jul 31, 2003 at 09:22:51AM +1000, Donovan Baarda wrote:
> > On Thu, 2003-07-31 at 06:53, jw schultz wrote:
> > [...]
> > > In many cases invoking --partial is worse than not. If you
> > > are rsyncing a 4GB file and transfer is interrupted after
> > > 500MB has been synced you get a 500MB file which now has
> > > less in common with the source than the 4GB file did.
> >
> > A more useful behaviour for --partial would be to concatenate the
> > partial download to the end of the old "basis", rather than replace
> > it... this leaves you with a much more useful "partial" result to resume
> > from.
> >
> > Of course this behaviour could be _very_ confusing to people... :-)
>
> Interesting idea. I don't know that it would be all that confusing.
>
> You'd have to truncate the basis to the length of the source
> to prevent it growing with each failure. Even appending
> 3.5GB to a 4GB file once is problematic.
>
> If we were to append to the existing file it might make
> sense to append only those portions that were updates.
> That would require keeping track of the offset+length of
> each change block. Yuck, that is much more work than it is
> worth.
>
> One idea that i think has real merit would be to combine
> some kind of change-rate score with an evaluation of
> comparative sizes of the tempfile and the original file to
> decide if replacing the original or leaving it would be more
> efficient. If there was no data-reuse then replacement
> would be in order. If there was a high rate of reuse it
> wouldn't. If the reuse was middling you would consider the
> comparative sizes. The formula would probably be pretty
> simple. If someone comes up with a patch that does that i'd
> be willing to entertain it.
Here is the exploded formula for testing. I don't think it
uses any data we don't already have for the sake of per-file
statistics.

    match_ratio = (bytes_matched / tempsize)
    probable_match = match_ratio * origsize
    if (probable_match < tempsize)
        keep temp
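
For anyone who wants to try the numbers, here is a minimal sketch of
that test in Python. It is not rsync code: the function name and the
zero-size guard are assumptions added for illustration, and "keep
temp" is read as "let the partial tempfile replace the original".

```python
def should_keep_temp(bytes_matched, tempsize, origsize):
    """Return True if the partial tempfile should replace the original.

    Hypothetical helper mirroring the formula in this message;
    all three arguments are sizes in the same unit.
    """
    # Assumption: with nothing received there is no ratio to compute,
    # so leave the original file alone.
    if tempsize <= 0:
        return False

    # Fraction of the data received so far that was reused from the basis.
    match_ratio = bytes_matched / tempsize

    # Projected amount of the original that would still match the source.
    probable_match = match_ratio * origsize

    # Keep the tempfile only if it is larger than the projected reuse.
    return probable_match < tempsize


if __name__ == "__main__":
    MB = 1
    orig = 4096 * MB  # the 4GB file from the example earlier in the thread

    # Interrupted after 500MB with no reuse at all: replace the original.
    print(should_keep_temp(0 * MB, 500 * MB, orig))

    # Interrupted after 500MB with 90% reuse: leave the original in place.
    print(should_keep_temp(450 * MB, 500 * MB, orig))

    # Middling (50%) reuse: the answer depends on the comparative sizes.
    print(should_keep_temp(500 * MB, 1000 * MB, orig))
    print(should_keep_temp(1500 * MB, 3000 * MB, orig))
```

With 50% reuse the projected match on a 4096MB original is 2048MB, so
a 1000MB tempfile loses to the original while a 3000MB tempfile wins.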
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt