setting checksum_seed

Craig Barratt cbarratt at users.sourceforge.net
Sun May 2 00:06:10 GMT 2004


jw schultz writes:

> > > There was some talk last year about adding a --fixed-checksum-seed
> > > option, but no consensus was reached.  It shouldn't hurt to make the
> > > seed value constant for certain applications, though, so you can feel
> > > free to proceed in that direction for what you're doing for your client.
> > > 
> > > FYI, I just checked in some changes to the checksum_seed code that will
> > > make it easier to have other options (besides the batch ones) specify
> > > that a constant seed value is needed.
> > 
> > I would really like a --fixed-csumseed option become a standard
> > feature in rsync.  Just using the batch value (32761) is fine.
> > Can I contribute a patch?  The reason I want this is the next
> > release of BackupPC will support rsync checksum caching, so that
> > backups don't need to recompute block or file checksums.  This
> > requires a fixed checksum seed on the remote rsync, hence the
> > need for --fixed-csumseed.  I've included this feature in a
> > pre-built rsync for cygwin that I include on the SourceForge
> > BackupPC downloads.
> 
> 1.  Yes, you may contribute a patch.  I favor the idea of
> being able to supply a checksum seed.
> 
> 2.  Lets get the option name down to a more reasonable
> length.  --checksum-seed should be sufficient.

I submitted a patch in Feb 2004 to add a --fixed-csumseed option
(which only sets checksum_seed to 32761, the batch file value):

    http://lists.samba.org/archive/rsync/2004-February/008616.html

Earlier, I submitted a patch (against 2.5.6pre1 in Jan 2003)
for --checksum-seed=NUM:

    http://lists.samba.org/archive/rsync/2003-January/004845.html

After I posted both of these patches, Eran Tromer started an
interesting thread about potential block checksum collisions that
someone could exploit to trigger first-pass failures. See:

    http://lists.samba.org/archive/rsync/2004-March/008821.html

The consequence is only a performance penalty: with very high
probability the whole-file checksum then fails, which triggers a
second pass using the full checksum size, and that pass will succeed.
Eran recommended that checksum_seed be more random than time().
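
As a rough illustration of "more random than time()", here is a
minimal sketch of how a caller might pick a seed to pass via
--checksum-seed=NUM.  The helper name pick_random_seed() and the use
of /dev/urandom are my assumptions for the example; none of this is
rsync code.

```c
#include <stdio.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper, not part of rsync: returns a nonzero 32-bit
 * seed.  It reads /dev/urandom when available and falls back to
 * time() otherwise.  Zero is avoided because a seed of 0 is treated
 * as "not set". */
int32_t pick_random_seed(void)
{
	int32_t seed = 0;
	FILE *f = fopen("/dev/urandom", "rb");

	if (f) {
		if (fread(&seed, sizeof seed, 1, f) != 1)
			seed = 0;	/* short read: fall through to time() */
		fclose(f);
	}
	if (seed == 0)
		seed = (int32_t)time(NULL);
	if (seed == 0)
		seed = 1;		/* last resort, keep it nonzero */
	return seed;
}
```

The returned value could then be formatted with "%d" and handed to
rsync on the command line.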

BackupPC now supports rsync checksum caching, so I would really like
an rsync command-line option to set the checksum_seed.  Based on
Eran's thread I am reverting to the --checksum-seed=NUM form.  It lets
paranoid users pick their own random value if they wish to avoid the
issue Eran raised, and it lets my BackupPC users specify a fixed value
so that caching is useful (subject to the same caveats raised by
Eran).
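
To make the caching point concrete, here is a toy sketch of a seeded
digest.  It uses FNV-1a as a stand-in for MD4 purely to keep the
example short, and the function name toy_seeded_sum() is hypothetical.
The point is only that the digest of the same data changes whenever
the seed changes, so cached digests are reusable only if the seed is
fixed.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a seeded digest: FNV-1a over the data bytes, then
 * over the 4 seed bytes.  This is NOT MD4 and NOT rsync's code. */
uint32_t toy_seeded_sum(const unsigned char *buf, size_t len, int32_t seed)
{
	uint32_t h = 2166136261u;	/* FNV-1a offset basis */
	uint32_t s = (uint32_t)seed;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;		/* FNV-1a prime */
	}
	for (i = 0; i < 4; i++) {
		h ^= (s >> (8 * i)) & 0xff;	/* mix in one seed byte */
		h *= 16777619u;
	}
	return h;
}
```

With a fixed seed such as 32761 the digest of a block is the same on
every run, so it can be stored and reused; with a time()-based seed it
differs from run to run and a cache would never hit.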

Here's a new patch against rsync-2.6.2.  JW's earlier changes
have simplified this patch.  Could this be applied to CVS,
or at a minimum added to the patches directory?

Note: the patch does not support --checksum-seed=0, since the code in
compat.c replaces the value 0 with time(0).  I don't think it is
necessary to support this case (a seed of 0 would mean that no seed
is added to the MD4 digests).  If people feel strongly about this I
can also support --checksum-seed=0, although it will make the code a
little uglier (we'll need another global variable).
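
The behavior described in the note can be condensed into the following
sketch.  The function effective_checksum_seed() is hypothetical and
only restates the rule above; the real logic lives in compat.c and may
differ in detail.

```c
#include <time.h>

/* Hypothetical condensation of the rule described above: a requested
 * seed of 0 is indistinguishable from "option not given", so the
 * current time is used instead. */
int effective_checksum_seed(int requested, time_t now)
{
	if (requested == 0)
		return (int)now;	/* 0 means "not set" */
	return requested;
}
```

This is why supporting a literal seed of 0 would need a separate flag
(the extra global variable mentioned above) to record that the option
was given at all.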

Thanks,
Craig

--- options.c	2004-04-17 10:07:23.000000000 -0700
+++ options.c	2004-05-01 16:24:44.380672000 -0700
@@ -290,6 +290,7 @@
   rprintf(F,"     --bwlimit=KBPS          limit I/O bandwidth, KBytes per second\n");
   rprintf(F,"     --write-batch=PREFIX    write batch fileset starting with PREFIX\n");
   rprintf(F,"     --read-batch=PREFIX     read batch fileset starting with PREFIX\n");
+  rprintf(F,"     --checksum-seed=NUM     set block/file checksum seed\n");
   rprintf(F," -h, --help                  show this help screen\n");
 #ifdef INET6
   rprintf(F," -4                          prefer IPv4\n");
@@ -386,6 +387,7 @@
   {"from0",           '0', POPT_ARG_NONE,   &eol_nulls, 0, 0, 0},
   {"no-implied-dirs",  0,  POPT_ARG_VAL,    &implied_dirs, 0, 0, 0 },
   {"protocol",         0,  POPT_ARG_INT,    &protocol_version, 0, 0, 0 },
+  {"checksum-seed",    0,  POPT_ARG_INT,    &checksum_seed, 0, 0, 0 },
 #ifdef INET6
   {0,		      '4', POPT_ARG_VAL,    &default_af_hint, AF_INET, 0, 0 },
   {0,		      '6', POPT_ARG_VAL,    &default_af_hint, AF_INET6, 0, 0 },
@@ -911,6 +913,11 @@
 			goto oom;
 		args[ac++] = arg;
 	}
+	if (checksum_seed) {
+		if (asprintf(&arg, "--checksum-seed=%d", checksum_seed) < 0)
+			goto oom;
+		args[ac++] = arg;
+	}
 
 	if (keep_partial)
 		args[ac++] = "--partial";
--- rsync.yo	2004-04-30 11:02:43.000000000 -0700
+++ rsync.yo	2004-05-01 16:59:48.546313600 -0700
@@ -348,6 +348,7 @@
      --bwlimit=KBPS          limit I/O bandwidth, KBytes per second
      --write-batch=PREFIX    write batch fileset starting with PREFIX
      --read-batch=PREFIX     read batch fileset starting with PREFIX
+     --checksum-seed=NUM     set block/file checksum seed
  -h, --help                  show this help screen
 
 
@@ -897,6 +898,20 @@
 using the fileset whose filenames start with PREFIX. See the "BATCH
 MODE" section for details.
 
+dit(bf(--checksum-seed=NUM)) Set the MD4 checksum seed to the integer
+NUM.  This 4-byte checksum seed is included in each block and file
+MD4 checksum calculation.  By default the checksum seed is generated
+by the server and defaults to the current time(), or 32761 if
+bf(--write-batch) or bf(--read-batch) is specified.  This option
+is used to set a specific checksum seed, which is useful for
+applications that want repeatable block and file checksums, or
+in the case where the user wants a more random checksum seed.
+Note that setting NUM to 0 causes rsync to use the default of time()
+for the checksum seed.  Note also that bf(--write-batch) and
+bf(--read-batch) set the checksum seed to 32761, so
+bf(--checksum-seed=NUM) needs to follow these options if you want
+to specify a different checksum seed in batch mode.
+
 enddit()
 
 manpagesection(EXCLUDE PATTERNS)

