similar file names..

Martin Pool mbp at samba.org
Mon Apr 8 01:50:03 EST 2002


On  8 Apr 2002, Condarelli Mauro <mauro at thor.samba.org> wrote:
> David Goodwin wrote:
> >Hi all,
> >
> >I've recently started using rsync, and would like to know if there is 
> >any way for it to realise that similarly named files are the same.
> >
> >e.g. sendmail-8.12.2-i386-1.tgz and sendmail-8.12.2-i386-2.tgz
> >
> >(the second one being an update on the first and replaces it, but it is 
> >going to be very similar)
> Unfortunately this is _not_ true.

The situation should improve soon.  Two changes are required, and
thanks to Paul Russell they're now close to being merged.

One is a --fuzzy patch to rsync, which allows it to detect files with
similar names and use the old one as the basis for the transfer.
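
To give a rough idea of what similar-name matching means here, a minimal
Python sketch follows.  It is not the patch's code: pick_fuzzy_basis and
the 0.6 cutoff are made up for illustration, and the real patch may score
names quite differently.

```python
import difflib


def pick_fuzzy_basis(target, candidates, cutoff=0.6):
    """Return the candidate name most similar to target, or None.

    Hypothetical helper: the cutoff and the similarity measure
    (difflib's ratio) are assumptions, not what the rsync patch does.
    """
    matches = difflib.get_close_matches(target, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Names already present on the receiving side (example data).
existing = ["sendmail-8.12.2-i386-1.tgz", "bind-9.2.0-i386-1.tgz", "README"]
basis = pick_fuzzy_basis("sendmail-8.12.2-i386-2.tgz", existing)
print(basis)
```

With a basis file chosen this way, the usual rsync delta algorithm could
then run against it instead of sending the new file whole.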

The second is a patch to gzip which makes unchanged sections of the
input file map to unchanged sections of the compressed file, using a
rolling checksum similar to that of rsync.  As Mauro points out:

> Since these are compressed files a few bytes change at the beginning of 
> file will result in two almost completely different files.

With this backwards-compatible change to gzip, there will be a short
run of new data at the start of the gzip file, and then the rest of
the file will continue as before, so rsync should work well.
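
The resynchronisation idea can be sketched in Python.  This is only an
illustration under stated assumptions, not the gzip patch: WINDOW,
MODULUS and the function names are invented here, and each block is
compressed independently with zlib rather than by resetting a single
gzip stream.

```python
import random
import zlib

WINDOW = 64     # bytes covered by the rolling sum (assumed value)
MODULUS = 256   # a boundary occurs roughly once per this many bytes (assumed)


def boundaries(data):
    """Yield block end offsets chosen from the content, not the position."""
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]   # drop the byte leaving the window
        # Only test once the window is full, so the decision depends
        # solely on the last WINDOW bytes of input.
        if i + 1 >= WINDOW and rolling % MODULUS == 0:
            yield i + 1


def compress_blocks(data):
    """Compress each content-defined block on its own."""
    blocks = []
    start = 0
    for end in boundaries(data):
        blocks.append(zlib.compress(data[start:end]))
        start = end
    blocks.append(zlib.compress(data[start:]))
    return blocks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(20000))
old = compress_blocks(original)
new = compress_blocks(b"a few new bytes" + original)

# Every block after the first original boundary compresses identically.
unchanged = len(old) - 1
print(unchanged, "of", len(new), "new blocks match the old output")
```

Because a boundary depends only on the preceding WINDOW bytes, inserting
data at the front changes the leading block or blocks and nothing after
them, which is the property rsync needs.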

Patches to do both of these are now available in rsync/src/patches/ in
CVS.  I'd be interested to hear feedback on them, though please don't
use them on critical systems yet.

-- 
Martin 



