rsync 1tb+ each day
kgorman at paypal.com
Thu Feb 6 05:53:25 EST 2003
Eric Whiting wrote:
> I've learned some good things from this discussion. Thanks.
> Kenny, I have one concern/idea -- The original post says the 'disk is
> fairly slow'. That is one bottleneck that should probably be examined a
> little more. How fast are your disks? How fast is your network? An IDE
> disk with DMA disabled might run 5M/s and when you enable DMA you can
> see up to 45M/s. Perhaps this is a root cause problem that has already
> been looked at, but perhaps it would be good to look at it again. Also
> do you have enough RAM on the destination to do some caching of the file
> for the multiple reads of the file? That might also help.
The disks on the destination side are a stripe on a Sun 5200. Not sure
of the underlying disk size, etc. at this point. On top of that we have
the VxFS file system with caching enabled. The box has 12 GB of RAM. It
does indeed cache the second read of the file, so that's good.
We are going to speed up the disks, and try this again very soon (it's
going to take a while for us to get the 2tb file system rebuilt ;-)
thx for the tips!
> jw schultz wrote:
>>On Tue, Feb 04, 2003 at 11:29:48AM -0800, Kenny Gorman wrote:
>>>I am rsyncing 1tb of data each day. I am finding in my testing that
>>>actually removing the target files each day then rsyncing is faster than
>>>doing a compare of the source->target files then rsyncing over the delta
>>>blocks. This is because we have a fast link between the two boxes, and
>>>our disk is fairly slow. I am finding that the creation of the temp
>>>file (the 'dot file') is actually the slowest part of the operation.
>>>This has to be done for each file because the timestamp and at least a
>>>couple blocks are guaranteed to have changed (oracle files).
>>As others have mentioned -W (--whole-file) will help here.
>>The reason the temp-file is so slow is that it is reading
>>blocks from the disk and writing them to other blocks on the
>>same disk. This means every block that is unchanged must be
>>transferred twice over the interface where changed blocks are
>>only transferred once. If the files are very large this is
>>guaranteed to cause a seek storm.
>>Further, all of this happens after the entire file has been
>>read once to generate the block checksums. Unless your
>>tree is smallish, reads from the checksum pass will have been
>>flushed from cache by the time you do the final transfer.
>>--whole-file eliminates most of the disk activity. You no
>>longer do the block checksum pass and replace the local copying
>>(read+write) with a simple write from the network.
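
A minimal sketch of how the -W suggestion above might be scripted for a
nightly job. The function name and both paths are hypothetical, made up for
illustration; only the rsync options -a and --whole-file are real. The
command is built as a list so it can be inspected before anything runs.

```python
import subprocess


def build_rsync_command(source, dest, whole_file=True):
    """Return the rsync argument list for one source -> dest copy.

    whole_file=True adds --whole-file (-W), which skips the block
    checksum pass and the local read+write of unchanged blocks.
    """
    cmd = ["rsync", "-a"]
    if whole_file:
        cmd.append("--whole-file")
    cmd += [source, dest]
    return cmd


def run_sync(source, dest):
    """Run the copy; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(source, dest), check=True)


# Hypothetical paths, for illustration only:
# run_sync("/u01/oradata/", "backuphost:/u01/oradata/")
```

Passing whole_file=False gives the default delta behaviour, which makes it
easy to time both modes against the same tree.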
>>Most likely your network is faster than the disks. For
>>files that change but change very little, your disk subsystem
>>would have to be more than triple the speed of your network
>>for the rsync algorithm (as opposed to the utility) to be of
>>benefit. If the files change a lot, then you merely need
>>double the speed.
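
One way to read the triple/double figures above is as a simple bottleneck
model. This is a sketch under assumptions, not anything taken from the rsync
source: it assumes delta mode is limited purely by the destination disk,
whole-file mode purely by the network, and that the disk reads and writes at
the same rate. All names here are made up for illustration.

```python
def delta_disk_traffic(size, changed):
    """Bytes the destination disk moves for one file in delta mode.

    size    -- file size in bytes (or any consistent unit)
    changed -- fraction of the file that changed, between 0.0 and 1.0
    """
    if not 0.0 <= changed <= 1.0:
        raise ValueError("changed must be between 0.0 and 1.0")
    checksum_read = size              # whole file read for block checksums
    copy_read = size * (1 - changed)  # unchanged blocks re-read from disk
    temp_write = size                 # whole temp file written out
    return checksum_read + copy_read + temp_write


def break_even_ratio(changed):
    """Disk/network speed ratio above which delta mode beats whole-file.

    Whole-file mode is assumed to cost one file's worth of network
    traffic, so the ratio is just the disk traffic per byte of file.
    """
    return delta_disk_traffic(1.0, changed)


def delta_wins(disk_rate, net_rate, changed):
    """True if, under this model, the rsync algorithm is the faster choice."""
    return disk_rate / net_rate > break_even_ratio(changed)
```

Under these assumptions break_even_ratio(0.0) is 3.0 and break_even_ratio(1.0)
is 2.0, which lines up with "more than triple" for files that barely change
and "double" for files that change a lot. Seek overhead on large files would
push the real threshold higher.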
>> J.W. Schultz Pegasystems Technologies
>> email address: jw at pegasys.ws
>> Remember Cernan and Schmitt