rsync 1tb+ each day

Eric Whiting ewhiting at amis.com
Wed Feb 5 13:29:24 EST 2003


I've learned some good things from this discussion. Thanks.

Kenny, I have one concern/idea: the original post says the 'disk is
fairly slow'. That bottleneck probably deserves a closer look. How fast
are your disks? How fast is your network? An IDE disk with DMA disabled
might run at 5 MB/s, and with DMA enabled you can see up to 45 MB/s.
Perhaps this root cause has already been looked at, but it may be worth
looking at again. Also, do you have enough RAM on the destination to
cache the file across the multiple reads? That might also help.
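
One rough way to answer the "how fast are your disks" question is to time
a sequential read of a large file on the destination. The sketch below is
only an illustration: the function name and the 1 MiB chunk size are my
own choices, and it measures sequential reads only, not the seek-heavy
pattern rsync produces when it rebuilds a file.

```python
import time


def read_throughput_mb_s(path, chunk_size=1024 * 1024):
    """Read `path` sequentially and return the rate in MB/s (1 MB = 10**6 bytes).

    Assumption: the file is not already in the OS page cache. If it is,
    the figure reflects memory speed rather than disk speed.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffering layer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total / 1e6 / max(elapsed, 1e-9)
```

Calling read_throughput_mb_s() on a file larger than RAM (or one that has
not been touched recently) gives a number to compare against the 5 MB/s
and 45 MB/s figures above.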

eric



jw schultz wrote:
> 
> On Tue, Feb 04, 2003 at 11:29:48AM -0800, Kenny Gorman wrote:
> > I am rsyncing 1tb of data each day.  I am finding in my testing that
> > actually removing the target files each day then rsyncing is faster than
> > doing a compare of the source->target files then rsyncing over the delta
> > blocks.  This is because we have a fast link between the two boxes, and
> > that our disk is fairly slow. I am finding that the creation of the temp
> > file (the 'dot file') is actually the slowest part of the operation.
> > This has to be done for each file because the timestamp and at least a
> > couple blocks are guaranteed to have changed (oracle files).
> 
> As others have mentioned -W (--whole-file) will help here.
> 
> The reason the temp-file is so slow is that it is reading
> blocks from the disk and writing them to other blocks on the
> same disk.  This means every block that is unchanged must be
> transferred twice over the interface where changed blocks are
> only transferred once.  If the files are very large this is
> guaranteed to cause a seek storm.
> 
> Further, all of this happens after the entire file has been
> read once to generate the block checksums.  Unless your
> tree is smallish, reads from the checksum pass will have been
> flushed from cache by the time you do the final transfer.
> --whole-file eliminates most of the disk activity.  You no
> longer do the block checksum pass and replace the local copying
> (read+write) with a simple write from the network.
> 
> Most likely your network is faster than the disks.  For
> files that change but change very little, your disk subsystem
> would have to be more than triple the speed of your network
> for the rsync algorithm (as opposed to the utility) to be of
> benefit.  If the files change a lot then you merely need
> double the speed.
> 
> --
> ________________________________________________________________
>         J.W. Schultz            Pegasystems Technologies
>         email address:          jw at pegasys.ws
> 
>                 Remember Cernan and Schmitt
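
The triple/double figures in the quoted text can be reproduced with a
rough accounting of the bytes that pass through the destination disk.
This is a simplified model of my own, not anything taken from rsync
itself: the function names are hypothetical, sender-side costs and seek
overhead are ignored, and disk and network activity are assumed to
overlap perfectly.

```python
def dest_disk_bytes(file_size, changed_fraction, whole_file):
    """Approximate bytes read plus written on the destination disk for one file."""
    if whole_file:
        # --whole-file: a single write of data arriving from the network.
        return file_size
    checksum_read = file_size                             # block checksum pass
    unchanged_read = file_size * (1 - changed_fraction)   # copy unchanged blocks
    temp_file_write = file_size                           # write the temp file
    return checksum_read + unchanged_read + temp_file_write


def network_bytes(file_size, changed_fraction, whole_file):
    """Approximate bytes sent over the wire (checksum traffic ignored)."""
    return file_size if whole_file else file_size * changed_fraction


def estimate_seconds(file_size, changed_fraction, whole_file, disk_rate, net_rate):
    """Elapsed time if the slower of disk and network is the only limit.

    Assumption: disk and network work overlap fully, which is optimistic.
    Sizes and rates must use the same unit (for example MB and MB/s).
    """
    disk_time = dest_disk_bytes(file_size, changed_fraction, whole_file) / disk_rate
    net_time = network_bytes(file_size, changed_fraction, whole_file) / net_rate
    return max(disk_time, net_time)
```

With almost nothing changed, the delta path moves about three times the
file size through the disk; with almost everything changed, about twice.
Under this model the delta path only comes out ahead for barely changed
files once the disk rate exceeds three times the network rate.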

