patch to enable faster mirroring of large filesystems
dwd at bell-labs.com
Tue Nov 20 08:34:59 EST 2001
Before I look at this closely, I have a couple of questions.
First, what options do you use to copy? I once saw somebody who went through
a lot of work to cache things and it turned out to be just because he
was using the --checksum option when he shouldn't have.
Next, have you taken a look at the 2.4.7 pre-release? It includes a couple
of options, --write-batch and --read-batch, which may do what you want; I'm
not sure. I think the intention of those, though, was more to make copies to
multiple targets efficient, not to speed up a single copy.
You can get the latest development version of rsync by doing
rsync -a rsync://rsync.samba.org/ftp/unpacked/rsync .
- Dave Dykstra
On Mon, Nov 19, 2001 at 04:06:45PM -0500, Andrew J. Schorr wrote:
> I have attached a patch that adds 4 options to rsync that have helped
> me to speed up my mirroring. I hope this is useful to someone else,
> but I fear that my relative inexperience with rsync has caused me to
> miss a way to do what I want without having to patch the code. So please
> let me know if I'm all wet.
> Here's my story: I have a large filesystem (around 20 gigabytes of data)
> that I'm mirroring over a T1 link to a backup site. Each night,
> about 600 megabytes of data needs to be transferred to the backup site.
> Much of this data has been appended to the end of various existing files,
> so a tool like rsync that sends partial updates instead of the whole
> file is appropriate.
> Normally, one could just use rsync with the --recursive and --delete features
> to do this. However, this takes a lot more time than necessary, basically
> because rsync spends a lot of time walking through the directory tree
> (which contains over 300,000 files).
> One can speed this up by caching a listing of the directory tree. I maintain
> an additional state file at the backup site that contains a listing
> of the state of the tree after the last backup operation. This is essentially
> equivalent to saving the output of "find . -ls" in a file.
> Then, the next night, one generates the updated directory tree for the source
> file system and does a diff with the directory listing on the backup file
> system to find out what has changed. This seems to be much faster than
> using rsync's recursive and delete features.
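
The listing-and-diff approach described above can be sketched roughly as
follows. This is only an illustration, not the code from the patch or from the
author's scripts: the names snapshot and diff_listings are hypothetical, and
the sketch assumes that a changed size or modification time is enough to mark
a file as needing an update.

```python
import os
import stat


def snapshot(root):
    """Map each path under root (relative to root) to a small metadata tuple.

    Files are recorded as (False, size, mtime_ns). Directories are recorded
    as (True, 0, 0) so that a directory is only reported when it is new,
    not every time its contents change.
    """
    listing = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # do not follow symlinks
            rel = os.path.relpath(full, root)
            if stat.S_ISDIR(st.st_mode):
                listing[rel] = (True, 0, 0)
            else:
                listing[rel] = (False, st.st_size, st.st_mtime_ns)
    return listing


def diff_listings(old, new):
    """Return (to_send, to_delete) as sorted lists of relative paths."""
    # Anything new, or whose recorded metadata differs, must be sent.
    to_send = sorted(p for p, meta in new.items() if old.get(p) != meta)
    # Anything that was in the old listing but is gone now must be deleted.
    to_delete = sorted(p for p in old if p not in new)
    return to_send, to_delete
```

In this sketch the old listing would be the state file saved after the
previous night's run, and to_send would be the explicit file list handed to
the transfer step.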
> I have my own script and programs to delete any files that have been removed,
> and then I just need to update the files that have been added or changed.
> One could use cpio for this, but it's too slow when only partial files
> have changed.
> So I added the following options to rsync:
>   --source-list       SRC arg will be a (local) file name containing a list
>                       of files, or - to read file names from stdin
>   --null              used with --source-list to indicate that the file
>                       names will be separated by null (zero) bytes instead
>                       of linefeed characters; useful with gfind -print0
>   --send-dirs         send directory entries even though not in recursive
>                       mode
>   --no-implicit-dirs  do not send implicit directories (parents of the file
>                       being sent)
> The --source-list option allows me to supply an explicit list of filenames
> to transport without using the --recursive feature and without playing
> around with include and exclude files. I'm not really clear on whether
> the include and exclude files could have gotten me to the same place, but it
> seems to me that they work hand-in-hand with the --recursive feature that
> I don't want to use.
> The --null flag allows me to handle files with embedded linefeeds. This
> is in the style of GNU find's -print0 operator.
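
To illustrate why a null separator matters for a file list, here is a small
sketch. It is not taken from the patch; split_names is a hypothetical helper,
and it assumes the list arrives as raw bytes, as it would when read from a
file or from stdin.

```python
def split_names(data, null_separated=False):
    """Split a raw byte string into file names, dropping empty entries.

    With null_separated=True the separator is the zero byte, which cannot
    occur inside a file name, so names containing linefeeds stay whole.
    """
    sep = b"\0" if null_separated else b"\n"
    return [name for name in data.split(sep) if name]


# A file name with an embedded linefeed survives only in null-separated mode.
null_list = b"a.txt\0odd\nname.txt\0"
print(split_names(null_list, null_separated=True))
# prints [b'a.txt', b'odd\nname.txt']

# The same two names written one per line are misread as three names.
line_list = b"a.txt\nodd\nname.txt\n"
print(split_names(line_list))
# prints [b'a.txt', b'odd', b'name.txt']
```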
> The --send-dirs overcomes a problem where rsync refuses to send directories
> unless it's in recursive mode. One needs this to make sure that even
> empty directories get mirrored.
> And the --no-implicit-dirs option turns off the default behavior in which
> all the parent directories of a file are transmitted before sending the
> file. That default behavior is very inefficient in my scenario where I
> am taking the responsibility for sending those directories myself.
> So, the patch is attached. If you think it's an abomination, please let
> me know what the better solution is. If you would like some elaboration
> on how this stuff really works, please let me know.