Large files and symlinks
jw schultz
jw at pegasys.ws
Thu Jul 31 11:35:45 EST 2003
On Thu, Jul 31, 2003 at 11:21:48AM +1000, Donovan Baarda wrote:
> On Thu, 2003-07-31 at 10:01, jw schultz wrote:
> > On Thu, Jul 31, 2003 at 09:22:51AM +1000, Donovan Baarda wrote:
> > > On Thu, 2003-07-31 at 06:53, jw schultz wrote:
> The simplest solution is to write the partial download over the
> beginning of the old file, leaving the end part as it was.
>
> This way you are making the sensible assumption that most of the matches
> from the start of the file match the partial download, and the remainder
> will match the rest when you resume. A match locality heuristic?
>
> I suspect this will be less confusing for end users too... the partial
> download will have its size unchanged, and when you look at the data
> you will be able to see that it "synchronised up to point xxx". It will
> look like a partial in-place update.
A sensible approach. It is, i think, more complex than
deciding whether to replace or discard. This requires you
to add a non-truncating copy (not difficult, just much more
work than making the finish_transfer() call conditional).
You would also need to be sure modtime isn't set to match
the source, something finish_transfer() does, because now
the length may be the same, and a matching size and modtime
would make a later run treat the file as already up to date.
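
To make "non-truncating copy" concrete, here is a minimal sketch in
Python rather than rsync's own C. The function name overwrite_prefix
and its parameters are hypothetical and are not part of rsync; the only
point is that the destination is opened without O_TRUNC, so the bytes
past the end of the partial download survive, and that the source
modtime is deliberately not copied.

```python
import os


def overwrite_prefix(partial_path, dest_path, chunk_size=64 * 1024):
    """Copy partial_path over the start of dest_path without truncating it.

    Hypothetical helper for illustration only; not an rsync function.
    Returns the number of bytes written.
    """
    # No O_TRUNC here: whatever lies beyond the partial data is left alone.
    # O_BINARY exists only on Windows, so fall back to 0 elsewhere.
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(dest_path, flags, 0o644)
    written = 0
    with os.fdopen(fd, "wb") as dst, open(partial_path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    # The source modtime is intentionally NOT applied to dest_path, so a
    # same-length file is not mistaken for a finished transfer later.
    return written
```

With a 12-byte original and a 3-byte partial download, the result is
still 12 bytes long, with only the first 3 replaced.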
> > One idea that i think has real merit would be to combine
> > some kind of change-rate score with an evaluation of
> > comparative sizes of the tempfile and the original file to
> > decide if replacing the original or leaving it would be more
> > efficient. If there was no data-reuse then replacement
> > would be in order. If there was a high rate of reuse it
> > wouldn't. If the reuse was middling you would consider the
> > comparative sizes. The formula would probably be pretty
> > simple. If someone comes up with a patch that does that i'd
> > be willing to entertain it.
>
> I'm not convinced this would be worth the effort... I'm sure in many
> cases the beginning of the file is where most of the changes are, so
> throwing away the end on the basis of poor matches at the start is a bad
> idea.
See my other followup where i show the formula. As i
describe it here it sounds more complicated than it is.
Either way would be a big improvement for those using
--partial, with the overwrite being better than the replace.
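
For readers without the other followup to hand, the replace-or-keep
decision from the quoted paragraph can be roughed out as below. This is
not the formula from that followup, just an illustration in Python of
the three cases as described here. The function name, the way reuse is
measured, and both thresholds are assumptions made up for the sketch.

```python
def should_replace(reused_bytes, partial_size, original_size,
                   low=0.1, high=0.9):
    """Return True if the partial tempfile should replace the original.

    Hypothetical sketch: reuse is taken as the fraction of the partial
    file that was rebuilt from blocks of the original. The low and high
    thresholds are arbitrary illustrative values.
    """
    if partial_size == 0:
        return False  # nothing was received, so keep the original
    reuse = reused_bytes / partial_size
    if reuse <= low:
        return True   # (almost) no data reuse: the original is no help
    if reuse >= high:
        return False  # high reuse: the original is still the better basis
    # Middling reuse: fall back to comparing the sizes of the two files.
    return partial_size >= original_size
```

For example, should_replace(0, 1000, 5000) chooses replacement, while
should_replace(950, 1000, 5000) keeps the original.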
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt