Rsync: Re: patch to enable faster mirroring of large filesystems

Lachlan Cranswick l.m.d.cranswick at dl.ac.uk
Tue Nov 20 22:45:44 EST 2001


Is there any chance this could be added to the distribution? It sounds
really nifty.

Another suggestion, unless I have misread the following: would it be
useful to have a command option in rsync to generate the file list
by doing the "find" and outputting it in a standard format?
(This would make it less OS-specific or kludgy.)
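
As a rough illustration of what such a list generator might look like
outside rsync, here is a minimal Python sketch. It walks a tree and writes
relative path names separated by either linefeed or null bytes, matching the
two separators described for --source-list and --null in the quoted message
below. The function names (list_tree, write_list, read_list) are hypothetical
and are not part of rsync or of the patch.

```python
import os


def list_tree(root):
    """Yield every directory and file under root, as paths relative to root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # makes the traversal order deterministic
        for name in dirnames + sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, name), root)


def write_list(paths, out, null=False):
    """Write paths to a binary stream, each followed by a NUL or linefeed byte."""
    sep = b"\0" if null else b"\n"
    for path in paths:
        out.write(os.fsencode(path) + sep)


def read_list(data, null=False):
    """Parse bytes produced by write_list back into a list of path strings."""
    sep = b"\0" if null else b"\n"
    return [os.fsdecode(item) for item in data.split(sep) if item]
```

Directories are listed as well as files, so empty directories show up in the
list. With null=True a file name containing a linefeed survives the round
trip, which the linefeed-separated form cannot guarantee.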

Cheers,

Lachlan.

At 16:06 19/11/01 -0500, you wrote:
>I have attached a patch that adds 4 options to rsync that have helped
>me to speed up my mirroring.  I hope this is useful to someone else,
>but I fear that my relative inexperience with rsync has caused me to
>miss a way to do what I want without having to patch the code.  So please
>let me know if I'm all wet.
>
>Here's my story: I have a large filesystem (around 20 gigabytes of data)
>that I'm mirroring over a T1 link to a backup site.  Each night, 
>about 600 megabytes of data needs to be transferred to the backup site.
>Much of this data has been appended to the end of various existing files,
>so a tool like rsync that sends partial updates instead of the whole
>file is appropriate.
>
>Normally, one could just use rsync with the --recursive and --delete features
>to do this.  However, this takes a lot more time than necessary, basically
>because rsync spends a lot of time walking through the directory tree
>(which contains over 300,000 files).
>
>One can speed this up by caching a listing of the directory tree.  I maintain
>an additional state file at the backup site that contains a listing
>of the state of the tree after the last backup operation.  This is essentially
>equivalent to saving the output of "find . -ls" in a file.
>
>Then, the next night, one generates the updated directory tree for the source
>file system and does a diff with the directory listing on the backup file
>system to find out what has changed.  This seems to be much faster than
>using rsync's recursive and delete features.
>
>I have my own script and programs to delete any files that have been removed,
>and then I just need to update the files that have been added or changed.
>One could use cpio for this, but it's too slow when only partial files
>have changed.
>
>So I added the following options to rsync:
>
>     --source-list           SRC arg will be a (local) file name containing
>                             a list of files, or - to read file names from
>                             stdin
>     --null                  used with --source-list to indicate that the
>                             file names will be separated by null (zero)
>                             bytes instead of linefeed characters; useful
>                             with gfind -print0
>     --send-dirs             send directory entries even though not in
>                             recursive mode
>     --no-implicit-dirs      do not send implicit directories (parents of
>                             the file being sent)
>
>The --source-list option allows me to supply an explicit list of filenames
>to transport without using the --recursive feature and without playing
>around with include and exclude files.  I'm not really clear on whether
>the include and exclude files could have gotten me to the same place, but it
>seems to me that they work hand-in-hand with the --recursive feature that
>I don't want to use.
>
>The --null flag allows me to handle file names with embedded linefeeds.  This
>is in the style of GNU find's -print0 operator.
>
>The --send-dirs option overcomes a problem where rsync refuses to send directories
>unless it's in recursive mode.  One needs this to make sure that even
>empty directories get mirrored.
>
>And the --no-implicit-dirs option turns off the default behavior in which
>all the parent directories of a file are transmitted before sending the
>file.  That default behavior is very inefficient in my scenario where I
>am taking the responsibility for sending those directories myself.
>
>So, the patch is attached.  If you think it's an abomination, please let
>me know what the better solution is.  If you would like some elaboration
>on how this stuff really works, please let me know.
>
>Cheers,
>Andy
>
>Attachment Converted: C:\Eudora\Attach\rsync-2.4.6-srclist.patch
>
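
The snapshot-and-diff approach described in the quoted message could be
sketched roughly as follows. This is only an illustration under stated
assumptions: the original scripts are not shown, so the sketch records type,
size and modification time per path (an approximation of what "find . -ls"
reports) and compares two such snapshots. The names snapshot and
diff_snapshots are hypothetical.

```python
import os
import stat


def snapshot(root):
    """Map each path under root (relative to root) to (is_dir, size, mtime_ns)."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks, do not follow them
            rel = os.path.relpath(full, root)
            state[rel] = (stat.S_ISDIR(st.st_mode), st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(old, new):
    """Return (to_send, to_delete) given last night's and tonight's snapshots."""
    # Anything new, or whose recorded metadata differs, needs to be sent.
    to_send = sorted(p for p, meta in new.items() if old.get(p) != meta)
    # Anything that was present before but is gone now needs to be deleted.
    to_delete = sorted(p for p in old if p not in new)
    return to_send, to_delete
```

The to_send list is what would be handed to the transfer step as an explicit
file list, and to_delete is what a separate cleanup script would remove at
the backup site. A directory whose contents changed will usually appear in
to_send too, because its modification time changes.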

-----------------------
Lachlan M. D. Cranswick
Collaborative Computational Project No 14 (CCP14)
    for Single Crystal and Powder Diffraction
  Birkbeck University of London and Daresbury Laboratory 
Postal Address: CCP14 - School of Crystallography,
                Birkbeck College,
                Malet Street, Bloomsbury,
                WC1E 7HX, London,  UK
Tel: (+44) 020 7631 6849   Fax: (+44) 020 7631 6803
E-mail: l.m.d.cranswick at dl.ac.uk
WWW: http://www.ccp14.ac.uk/




