batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]
cbarratt at users.sourceforge.net
Thu May 20 04:38:09 GMT 2004
Alberto Accomazzi writes:
> What I'm referring to are those options that a client passes to the
> server which influence file selection, checksum and block generation. I
> haven't looked at the rsync source code in quite a while, but off the
> top of my head here are the issues to look at when considering caching a
> filesystem scan:
> 1. Exclude/include patterns:
> -C, --cvs-exclude        auto ignore files in the same way CVS does
>     --exclude=PATTERN    exclude files matching PATTERN
>     --exclude-from=FILE  exclude patterns listed in FILE
>     --include=PATTERN    don't exclude files matching PATTERN
>     --include-from=FILE  don't exclude patterns listed in FILE
>     --files-from=FILE    read FILE for list of source-file names
> These should be easy to deal with: I would simply have the cache creator
> ignore any --exclude options passed by the client (but probably honor
> the ones defined in a daemon config file).
> 2. Other file selection options:
> -x, --one-file-system    don't cross filesystem boundaries
> -S, --sparse             handle sparse files efficiently
> -l, --links              copy symlinks as symlinks
> -L, --copy-links         copy the referent of all symlinks
>     --copy-unsafe-links  copy the referent of "unsafe" symlinks
>     --safe-links         ignore "unsafe" symlinks
> It's possible that these can also be dealt with easily, but I'm not so
> sure. Clearly -x influences what gets scanned, so how do you decide
> what to cache? The other options are probably easier to deal with.
> 3. File checksums:
> -c, --checksum           always checksum
> Should the caching operation always checksum so that the checksums are
> readily available when a client sets -c? This can lead to a lot of
> computations and disk IO which may be unnecessary if the clients do not
> use this option.
> 4. Block checksums:
> -B, --block-size=SIZE    checksum blocking size (default 700)
> It would be great if we could cache the rolling block checksums as they
> are computed but this may be even harder (or impossible) to deal with.
> And it looks like soon we'll have a new checksum-seed option which will
> further complicate the issue (in fact I admit I have no idea about how
> all of this works beyond versions 2.5.x; maybe somebody with more
> knowledge on the subject will chime in).
In fact, the checksum-seed option is critical to any scheme that
caches the file list (with -c) or caches the block checksums.
Without the checksum-seed option, two rsync runs started more than a
second apart will use different checksum seeds (since the checksum
seed defaults to time()). This means the whole-file and
block checksums change every time. This is the reason the batch
mode options force the checksum seed to a fixed value.