[RSYNC] Re: Compressed backup

Ph. Marek marek at bmlv.gv.at
Thu Jun 13 03:38:02 EST 2002


I've got a suggestion regarding the mail Kevin wrote:

Instead of comparing the least m bits of n bytes I'd suggest using a
algorithm as described in the Paper
	http://webglimpse.org/publications.html
	"Siff -- Finding Similar Files in a Large File System"

	ftp://ftp.cs.arizona.edu/reports/1993/TR93-33.ps

(Search for "Manber finding" in google for more references)


This calculates a rolling checksum for n bytes (n can be chosen).
In this paper a action is taken if m bits of this checksum are a defined
value, similar to 
	if (!(checksum & 0xff)) { ... }
for 8 bits = 0 which gives a 1/256 chance.

I wrote the program described there (siff) and it works very well.


Regards, 

Phil





More information about the rsync mailing list