rsync backup performance question

Mon Jun 23 00:20:34 EST 2003

jw schultz wrote:
> 
> You have a couple of points wrong.  The receiver generates
> the block checksums.  If you are pushing that would be the
> server but if you are pulling it is the client.  In 2.5.6
> and earlier the transmitted block checksums are 6 bytes per
> block with a default block size of 700 bytes so just under
> 1% of file size.  Unless you have a slow CPU the block
> checksum generation will be I/O bound.
> 
> The only files that are opened are those where metadata
> indicate the contents are changed.  In those cases you do
> have a lot of disk i/o.  For database backups that will
> probably amount to every file.
> 
> The sender only does one read pass on each changed file.
> The receiver does a read pass for the blocksums and later
> reads again the unchanged (possibly relocated) blocks as it
> merges them with the changed data to write the new file.
> Several files may be in process at any given time.  The
> cache capacity of the receiver has a significant impact on
> performance.
> 
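
The "just under 1%" figure quoted above follows directly from 6 checksum bytes per 700-byte block. A minimal sketch of that arithmetic, assuming only the numbers given for 2.5.6 and earlier (the function name is made up for illustration):

```python
def blocksum_overhead(file_size, block_size=700, sum_bytes=6):
    """Return (checksum bytes sent, fraction of file size) for one file."""
    # Ceiling division: a trailing partial block still gets a checksum.
    blocks = -(-file_size // block_size)
    total = blocks * sum_bytes
    fraction = total / file_size if file_size else 0.0
    return total, fraction


if __name__ == "__main__":
    total, fraction = blocksum_overhead(700_000_000)
    print(total, f"{fraction:.2%}")
```

So the checksum traffic itself is small; the cost on the receiver is the read pass needed to produce it.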

Would it be feasible to have a separate process pre-create the
blocksums during the day and store them in separate files (ending
in ",rsync")? Or, for example, could the receiver compute and save
the blocksums while it writes the changed file, for use on the
next run? This would save at least half my I/O.
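
To make the idea concrete, here is a rough sketch of such a sidecar cache. It is not rsync's checksum format or anything rsync can consume: `zlib.adler32` and a truncated MD5 are stand-ins for the real weak and strong sums, the JSON layout is invented, and only the ",rsync" suffix comes from the suggestion above. The cache is trusted only while the file's size and mtime are unchanged.

```python
import hashlib
import json
import os
import zlib

BLOCK_SIZE = 700  # default block size quoted for 2.5.6 and earlier


def compute_blocksums(path, block_size=BLOCK_SIZE):
    """One read pass over the file, returning [weak, strong] per block."""
    sums = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            weak = zlib.adler32(block)                   # stand-in weak sum
            strong = hashlib.md5(block).hexdigest()[:8]  # stand-in strong sum
            sums.append([weak, strong])
    return sums


def load_or_build_blocksums(path, block_size=BLOCK_SIZE):
    """Return (sums, from_cache); rebuild the sidecar if the file changed."""
    sidecar = path + ",rsync"  # hypothetical sidecar naming
    st = os.stat(path)
    stamp = {"size": st.st_size, "mtime_ns": st.st_mtime_ns,
             "block_size": block_size}
    try:
        with open(sidecar) as f:
            cached = json.load(f)
        if cached["stamp"] == stamp:
            return cached["sums"], True
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or unreadable sidecar: fall through and rebuild
    sums = compute_blocksums(path, block_size)
    with open(sidecar, "w") as f:
        json.dump({"stamp": stamp, "sums": sums}, f)
    return sums, False
```

The second call on an unchanged file skips the read pass entirely, which is the I/O being saved.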

> 
>>>The easiest way to manage the scheduling is to have the
>>>server pull.  If that isn't possible then you will need to
>>>use an rsync wrapper that keeps the simultaneous runs within
>>>limits or put a good deal of smarts into the clients.
>>>
>>
>>Yeah, pulling is out of the question, because the server can't
>>activate the ISDN link. The clients' rsync start time will need
>>to be hashed across the night.
> 
> 
> I'd favour a wrapper over depending on hashing the start
> times.  An alternate approach might be to have the clients
> open the connection with port forwarding, write a queue file
> and wait for a completion indicator before closing the
> connection.  The server could then pull, using the queue
> files to identify waiting clients.  While a bit more
> complicated it avoids the temporal gaps caused by the
> fallback-sleep-retry of the wrappers.
> 
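
For reference, "hashing the start times" could look something like the sketch below: each client derives a fixed offset into the nightly window from its own name, so no coordination is needed. The window (22:00 plus eight hours), the client names, and the function names are all assumptions for illustration. Two clients can still land close together, which is the weakness a wrapper or queue addresses.

```python
import hashlib


def start_offset_seconds(client_id, window_seconds=8 * 3600):
    """Map a client name to a stable offset inside the backup window."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


def start_time(client_id, window_start_hour=22, window_seconds=8 * 3600):
    """Return the client's start time as HH:MM, wrapping past midnight."""
    total = window_start_hour * 3600 + start_offset_seconds(
        client_id, window_seconds)
    hour = (total // 3600) % 24
    minute = (total % 3600) // 60
    return f"{hour:02d}:{minute:02d}"


if __name__ == "__main__":
    for name in ("client-a", "client-b", "client-c"):  # hypothetical names
        print(name, start_time(name))
```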

What do you mean by a wrapper? Something that connects,
checks whether the server has resources available, and tries
again later? Does such a thing already exist?
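
A client-side version of that reading might be sketched as follows. This is only an illustration of the fallback-sleep-retry pattern: it treats any non-zero exit status as "server busy, try later", and the delays, attempt count, and the host and module in the usage comment are made up.

```python
import subprocess
import time


def run_with_retry(cmd, max_attempts=5, base_delay=60.0,
                   runner=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits 0; return the attempt number, or None.

    The delay doubles after each failure (60s, 120s, 240s, ...).
    runner and sleep are injectable so the logic can be tested
    without running a real transfer.
    """
    for attempt in range(1, max_attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return None


# Hypothetical usage; host and module name are placeholders:
# run_with_retry(["rsync", "-az", "/data/", "backup.example.org::client1/"])
```

Each failed attempt leaves the server idle for the length of the sleep, which is the "temporal gap" mentioned in the quoted text.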

A retrying wrapper might incur ISDN call-setup costs that could
be unacceptable. The same goes for keeping the line open until
the server pulls. On the other hand, that would maximize server
performance.

Then again, I will probably need to spread the load across
multiple servers anyway, so maybe something like the Linux
Virtual Server project would come in handy. I'll have to look
into that too.

> The last thing you want is to thrash the server or cause an
> OOM condition.  If at all possible you will want to avoid
> paging on the server.  The instant you start thrashing,
> filesystem cache performance will shrivel.
> 

Definitely.

Ron

-- 
Netland Internet Services
bedrijfsmatige internetoplossingen

http://www.netland.nl   Kruislaan 419              1098 VA Amsterdam
info: 020-5628282       servicedesk: 020-5628280   fax: 020-5628281

One way to better your lot is to do a lot better...