Large files and symlinks

jw schultz jw at pegasys.ws
Thu Jul 31 11:35:45 EST 2003


On Thu, Jul 31, 2003 at 11:21:48AM +1000, Donovan Baarda wrote:
> On Thu, 2003-07-31 at 10:01, jw schultz wrote:
> > On Thu, Jul 31, 2003 at 09:22:51AM +1000, Donovan Baarda wrote:
> > > On Thu, 2003-07-31 at 06:53, jw schultz wrote:
> The simplest solution is to write the partial download over the
> beginning of the old file, leaving the end part as it was.
> 
> This way you are making the sensible assumption that most of the matches
> from the start of the file match the partial download, and the remainder
> will match the rest when you resume. A match locality heuristic?
> 
> I suspect this will be less confusing for end users too... the partial
> download will have its size unchanged, and when you look at the data
> you will be able to see that it "synchronised up to point xxx". It will
> look like a partial in-place update.

A sensible approach.  It is, i think, more complex than
deciding whether to replace or discard.  It requires adding
a non-truncating copy (not difficult, just much more work
than making the finish_transfer() call conditional).

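You would also need to be sure modtime isn't set to match
the source, something finish_transfer() does, because now
the length may be the same, and a file matching the source
in both length and modtime would be treated as already up
to date.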
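
To make the non-truncating copy concrete, here is a minimal
sketch in Python rather than rsync's C.  The function name
overwrite_prefix() and its parameters are hypothetical and
are not part of rsync; the sketch only shows the idea of
writing the partial data over the start of the old file
while leaving the tail and the modtime alone.

```python
def overwrite_prefix(partial_path, dest_path, chunk_size=64 * 1024):
    """Copy partial_path over the start of dest_path without truncating it.

    Hypothetical helper for illustration only; this is not rsync code.
    dest_path must already exist. Returns the number of bytes written.
    """
    written = 0
    # "r+b" opens the existing file for update without truncating it,
    # so any bytes beyond the partial data are left as they were.
    with open(partial_path, "rb") as src, open(dest_path, "r+b") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    # Deliberately no os.utime() call: the write leaves the modtime at
    # the current time, so it is not set to match the source.
    return written
```

With a 10-byte old file and a 4-byte partial, the result is
still 10 bytes long: the first 4 come from the partial and
the last 6 are the old tail.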

> > One idea that i think has real merit would be to combine
> > some kind of change-rate score with an evaluation of
> > comparative sizes of the tempfile and the original file to
> > decide if replacing the original or leaving it would be more
> > efficient.  If there was no data-reuse then replacement
> > would be in order.  If there was a high rate of reuse it
> > wouldn't.  If the reuse was middling you would consider the
> > comparative sizes.  The formula would probably be pretty
> > simple.  If someone comes up with a patch that does that i'd
> > be willing to entertain it.
> 
> I'm not convinced this would be worth the effort... I'm sure in many
> cases the beginning of the file is where most of the changes are, so
> throwing away the end on the basis of poor matches at the start is a bad
> idea.

See my other followup where i show the formula.  As i
describe it here it sounds more complicated than it is.
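
For readers without the followup at hand, the shape of such
a decision can be sketched as below.  This is not the formula
from the followup, which is not reproduced here; the function
name, the 0.1 and 0.9 thresholds and the size comparison used
for the middling case are all assumptions made up for
illustration.

```python
def should_replace(reused_bytes, partial_size, original_size,
                   low=0.1, high=0.9):
    """Decide whether the partial tempfile should replace the original.

    Hypothetical heuristic, not rsync code. reused_bytes is how much of
    the partial file was built from data already in the original.
    The low/high thresholds are arbitrary illustrative values.
    """
    if partial_size == 0:
        return False  # nothing was received, so keep the original
    reuse = reused_bytes / partial_size
    if reuse <= low:
        return True   # (almost) no data reuse: replace the original
    if reuse >= high:
        return False  # high reuse: the original is still the better basis
    # Middling reuse: fall back to comparing sizes (assumed tie-break).
    return partial_size >= original_size
```

For example, should_replace(0, 1000, 5000) returns True
because nothing was reused, while
should_replace(950, 1000, 5000) returns False because nearly
all of the partial data came from the original.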

Either way would be a big improvement for those using
--partial, with the overwrite being better than the replace.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


