batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

Craig Barratt cbarratt at users.sourceforge.net
Thu May 20 04:38:09 GMT 2004


Alberto Accomazzi writes:

> What I'm referring to are those options that a client passes to the 
> server which influence file selection, checksum and block generation.  I 
> haven't looked at the rsync source code in quite a while, but off the 
> top of my head here are the issues to look at when considering caching a 
> filesystem scan:
> 
> 1. Exclude/include patterns:
>   -C, --cvs-exclude           auto ignore files in the same way CVS does
>       --exclude=PATTERN       exclude files matching PATTERN
>       --exclude-from=FILE     exclude patterns listed in FILE
>       --include=PATTERN       don't exclude files matching PATTERN
>       --include-from=FILE     don't exclude patterns listed in FILE
>       --files-from=FILE       read FILE for list of source-file names
> 
> These should be easy to deal with: I would simply have the cache creator 
> ignore any --exclude options passed by the client (but probably honor 
> the ones defined in a daemon config file).
> 
> 2. Other file selection options:
>   -x, --one-file-system       don't cross filesystem boundaries
>   -S, --sparse                handle sparse files efficiently
>   -l, --links                 copy symlinks as symlinks
>   -L, --copy-links            copy the referent of all symlinks
>       --copy-unsafe-links     copy the referent of "unsafe" symlinks
>       --safe-links            ignore "unsafe" symlinks
> 
> It's possible that these can also be dealt with easily, but I'm not so 
> sure.  Clearly -x influences what gets scanned, so how do you decide 
> what to cache?  The other options are probably easier to deal with.
> 
> 3. File checksums:
>   -c, --checksum              always checksum
> 
> Should the caching operation always checksum so that the checksums are 
> readily available when a client sets -c?  This can lead to a lot of 
> computation and disk I/O which may be unnecessary if the clients do not 
> use this option.
> 
> 4. Block checksums:
>   -B, --block-size=SIZE       checksum blocking size (default 700)
> 
> It would be great if we could cache the rolling block checksums as they 
> are computed but this may be even harder (or impossible) to deal with. 
> And it looks like soon we'll have a new checksum-seed option which will 
> further complicate the issue (in fact I admit I have no idea about how 
> all of this works beyond versions 2.5.x; maybe somebody with more 
> knowledge on the subject will chime in).

In fact, the checksum-seed option is critical to any scheme that
caches the file list (with -c) or caches the block checksums.
Without it, the checksum seed defaults to time(), so any two rsync
runs started more than a second apart use different seeds.  This
means the whole-file and block checksums change from run to run, and
cached values would not match what a later run expects.  This is the
reason the batch mode options force the checksum seed to a fixed value.
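
To illustrate the point, here is a minimal sketch (not rsync's actual
code) of why a seeded checksum cannot be cached unless the seed is
pinned.  The function names, the way the seed is packed, and the use
of MD5 as a stand-in digest are all assumptions made for the example.

```python
import hashlib
import struct
import time


def default_seed() -> int:
    # Per the text above: with no checksum-seed given, the seed comes
    # from time(), so it differs between runs more than a second apart.
    return int(time.time()) & 0xFFFFFFFF


def seeded_block_checksum(block: bytes, seed: int) -> bytes:
    # Assumption: the seed is mixed in as 4 little-endian bytes after
    # the block data.  MD5 is only a stand-in digest for illustration.
    seed_bytes = struct.pack("<I", seed & 0xFFFFFFFF)
    return hashlib.md5(block + seed_bytes).digest()


def cached_checksum_is_valid(cached_seed: int, run_seed: int) -> bool:
    # A cached checksum is only usable if it was computed with the
    # same seed that the current run expects.
    return cached_seed == run_seed


if __name__ == "__main__":
    block = b"example block contents"
    fixed = 12345  # hypothetical fixed seed, arbitrary value

    same = seeded_block_checksum(block, fixed) == seeded_block_checksum(block, fixed)
    diff = seeded_block_checksum(block, fixed) == seeded_block_checksum(block, fixed + 1)
    print("same seed, same checksum:", same)
    print("different seed, same checksum:", diff)
```

With a fixed seed, the cached value can be compared directly on later
runs; with a time-based seed, the cache would have to be recomputed
each time.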

Craig

