Rsync: Re: patch to enable faster mirroring of large filesystems
Lachlan Cranswick
l.m.d.cranswick at dl.ac.uk
Tue Nov 20 22:45:44 EST 2001
Is there any chance this can be added to the distribution? It sounds
really nifty.
Another suggestion, unless I have misread the following: would it be
useful to have a command option in rsync that generates the file list
itself, by doing the "find" and writing the result in a standard format?
(This would make the approach less OS-specific and less kludgy.)
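
To make the suggestion concrete, here is a rough Python sketch of what a portable "generate the file list" step could look like. It is not rsync code. The names list_tree and encode_list are made up for illustration, and NUL-terminated relative paths are only one possible choice of "standard format".

```python
import os


def list_tree(root):
    """Return the sorted relative paths of every file and directory under root."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Directories are listed too, so that empty ones are not lost.
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries.append(os.path.relpath(full, root))
    return sorted(entries)


def encode_list(paths):
    """Encode paths as NUL-terminated records, in the style of find -print0."""
    return b"".join(os.fsencode(p) + b"\0" for p in paths)
```

Because the walk is done by the tool itself, the output would not depend on which "find" happens to be installed on a given system.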
Cheers,
Lachlan.
At 16:06 19/11/01 -0500, you wrote:
>I have attached a patch that adds 4 options to rsync that have helped
>me to speed up my mirroring. I hope this is useful to someone else,
>but I fear that my relative inexperience with rsync has caused me to
>miss a way to do what I want without having to patch the code. So please
>let me know if I'm all wet.
>
>Here's my story: I have a large filesystem (around 20 gigabytes of data)
>that I'm mirroring over a T1 link to a backup site. Each night,
>about 600 megabytes of data needs to be transferred to the backup site.
>Much of this data has been appended to the end of various existing files,
>so a tool like rsync that sends partial updates instead of the whole
>file is appropriate.
>
>Normally, one could just use rsync with the --recursive and --delete features
>to do this. However, this takes a lot more time than necessary, basically
>because rsync spends a lot of time walking through the directory tree
>(which contains over 300,000 files).
>
>One can speed this up by caching a listing of the directory tree. I maintain
>an additional state file at the backup site that contains a listing
>of the state of the tree after the last backup operation. This is essentially
>equivalent to saving the output of "find . -ls" in a file.
>
>Then, the next night, one generates the updated directory tree for the source
>file system and does a diff with the directory listing on the backup file
>system to find out what has changed. This seems to be much faster than
>using rsync's recursive and delete features.
>
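
The comparison step described above can be sketched roughly as follows. This is an illustration in Python, not the poster's actual script. It assumes each night's listing has already been loaded into a dict mapping a path to a (size, mtime) tuple, and the name diff_listings is hypothetical.

```python
def diff_listings(old, new):
    """Compare two listings and return (added, changed, removed) path lists.

    old and new map a relative path to a (size, mtime) tuple, which is
    roughly the information a saved "find . -ls" would provide.
    """
    old_paths = set(old)
    new_paths = set(new)
    added = sorted(new_paths - old_paths)
    removed = sorted(old_paths - new_paths)
    # A path present in both listings counts as changed if its metadata differs.
    changed = sorted(p for p in old_paths & new_paths if old[p] != new[p])
    return added, changed, removed


last_night = {"a.log": (100, 1), "b.txt": (5, 1), "gone.tmp": (1, 1)}
tonight = {"a.log": (160, 2), "b.txt": (5, 1), "new.txt": (3, 2)}
added, changed, removed = diff_listings(last_night, tonight)
```

In this scheme the added and changed paths are the ones that need to be handed to rsync, while the removed paths are handled by a separate deletion step.
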
>I have my own script and programs to delete any files that have been removed,
>and then I just need to update the files that have been added or changed.
>One could use cpio for this, but it's too slow when only partial files
>have changed.
>
>So I added the following options to rsync:
>
>  --source-list        SRC arg will be a (local) file name containing
>                       a list of files, or - to read file names from stdin
>  --null               used with --source-list to indicate that the
>                       file names will be separated by null (zero) bytes
>                       instead of linefeed characters; useful with
>                       gfind -print0
>  --send-dirs          send directory entries even though not in
>                       recursive mode
>  --no-implicit-dirs   do not send implicit directories (parents of
>                       the file being sent)
>
>The --source-list option allows me to supply an explicit list of filenames
>to transport without using the --recursive feature and without playing
>around with include and exclude files. I'm not really clear on whether
>the include and exclude files could have gotten me to the same place, but it
>seems to me that they work hand-in-hand with the --recursive feature that
>I don't want to use.
>
>The --null flag allows me to handle file names with embedded linefeeds. This
>is in the style of GNU find's -print0 operator.
>
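
A small Python sketch of why the separator matters. This is not the patch's code; parse_file_list is a hypothetical helper that splits a raw byte stream of file names on either linefeeds or NUL bytes.

```python
def parse_file_list(data, null_separated=False):
    """Split a raw list of file names into individual names (as bytes)."""
    separator = b"\0" if null_separated else b"\n"
    # Drop the empty piece left behind by the trailing separator.
    return [name for name in data.split(separator) if name]


# One file whose name contains a linefeed, followed by an ordinary file.
names_nul = b"odd\nname.txt\0plain.txt\0"
names_lf = b"odd\nname.txt\nplain.txt\n"

good = parse_file_list(names_nul, null_separated=True)
bad = parse_file_list(names_lf)
```

With NUL separators the odd name survives as a single entry; with linefeed separators it is silently split into two names that do not exist.
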
>The --send-dirs option overcomes a problem where rsync refuses to send directories
>unless it's in recursive mode. One needs this to make sure that even
>empty directories get mirrored.
>
>And the --no-implicit-dirs option turns off the default behavior in which
>all the parent directories of a file are transmitted before sending the
>file. That default behavior is very inefficient in my scenario where I
>am taking the responsibility for sending those directories myself.
>
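
To show what "implicit directories" amounts to, here is a rough Python sketch that lists the parent directories implied by a set of file paths. It only illustrates the idea and does not reproduce rsync's internal logic; the name implicit_dirs is hypothetical.

```python
import posixpath


def implicit_dirs(paths):
    """Return the parent directories implied by paths, parents before children."""
    ordered = []
    seen = set()
    for path in paths:
        # Collect the chain of parents from the nearest up to the top level.
        chain = []
        parent = posixpath.dirname(path)
        while parent and parent != "/":
            chain.append(parent)
            parent = posixpath.dirname(parent)
        # Emit from the top level down, skipping directories already listed.
        for directory in reversed(chain):
            if directory not in seen:
                seen.add(directory)
                ordered.append(directory)
    return ordered


parents = implicit_dirs(["a/b/c.txt", "a/d.txt"])
```

When the file list already names every directory explicitly, these implied entries are redundant, which is the inefficiency described above.
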
>So, the patch is attached. If you think it's an abomination, please let
>me know what the better solution is. If you would like some elaboration
>on how this stuff really works, please let me know.
>
>Cheers,
>Andy
>
>Attachment Converted: C:\Eudora\Attach\rsync-2.4.6-srclist.patch
>
-----------------------
Lachlan M. D. Cranswick
Collaborative Computational Project No 14 (CCP14)
for Single Crystal and Powder Diffraction
Birkbeck University of London and Daresbury Laboratory
Postal Address: CCP14 - School of Crystallography,
Birkbeck College,
Malet Street, Bloomsbury,
WC1E 7HX, London, UK
Tel: (+44) 020 7631 6849 Fax: (+44) 020 7631 6803
E-mail: l.m.d.cranswick at dl.ac.uk
WWW: http://www.ccp14.ac.uk/