rsync backup performance question
raarts at netland.nl
Sun Jun 22 21:59:11 EST 2003
jw schultz wrote:
> On Sun, Jun 22, 2003 at 11:42:46AM +0200, Ron Arts wrote:
>>I am implementing a backup system, where thousands of PostgreSQL
>>databases (max 1 GB in size) on as many clients need to be backed
>>up nightly across ISDN lines.
>>Because of the limited bandwidth, rsync is the prime candidate of
> Only if you are updating an existing file on the backup
> server with sufficient commonality from one version to the
> next. pg_dump --format=t would be good. Avoid the built-in
> compression in pg_dump as it defeats rsync.
Restore time is significant, so I think I need a straight mirror
of the database files on the client. I think importing
a multi-gigabyte SQL dump will take too long for us (one hour
is the limit). I have not tried that yet on PostgreSQL, though.
> gzip with the
> rsyncable patch and bzip2 are OK if you must compress.
So unpatched bzip2 is OK? Nice to know.
Maybe I can tar an LVM snapshot, and bzip2 that
before rsyncing. Thanks for that one.
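To illustrate the point about compression defeating rsync, here is a minimal Python sketch. It is not rsync itself: `make_dump`, `shared_fraction` and the 1 KB block size are made up for the illustration, and zlib stands in for an unpatched gzip. It prepends one row to a fake dump and then counts how many blocks of the old file can still be found somewhere in the new file, first uncompressed and then after compressing both versions.

```python
import random
import zlib

BLOCK = 1024  # hypothetical block size, chosen only for this illustration


def make_dump(rows=5000, seed=0):
    """Build a fake text dump (stand-in for an uncompressed pg_dump file)."""
    rng = random.Random(seed)
    return "".join(f"{i}\t{rng.random()}\n" for i in range(rows)).encode()


def shared_fraction(old, new, size=BLOCK):
    """Fraction of the old file's blocks that occur at any offset in new."""
    blocks = [old[i:i + size] for i in range(0, len(old), size)]
    return sum(block in new for block in blocks) / len(blocks)


old = make_dump()
new = b"-1\tinserted row\n" + old  # one small change at the start of the file

# Uncompressed: every old block still exists in the new file, just shifted.
plain = shared_fraction(old, new)

# Compressed: the change alters the compressed byte stream from the start on.
packed = shared_fraction(zlib.compress(old), zlib.compress(new))

print(f"reusable blocks, uncompressed: {plain:.0%}")
print(f"reusable blocks, compressed:   {packed:.0%}")
```

The uncompressed pair keeps all of its blocks reusable, while the compressed pair loses at least its first block and usually far more, which is the effect the rsyncable patch is meant to limit.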
> The other issue is individual file size. Rsync versions
> prior to what is in CVS start having some performance issues
> with files larger than the 200-500MB range.
I'll keep that in mind.
>>Potential problems I see are server load (I/O and CPU), and filesystem
> Most of the load is on the sender. Over ISDN, even with
> rsync compressing the datastream, no single update should be a CPU
> or I/O issue. The issue is scheduling so you don't have too
> many running simultaneously.
As I understand the algorithm, the server creates a list of checksums
(around 1% of the size of the original file), which is not really
CPU intensive, sends that to the client, and then the client does a lot
of work finding blocks that are the same as in the server's file.
So the server at least reads every file in the rsync tree completely,
am I correct? In my case that means a lot of disk I/O,
given the total size of all databases (multiple TB).
Please correct me if I'm wrong.
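The division of work described above can be sketched in Python roughly as follows. This is a simplified illustration, not rsync's actual code or wire format: the function names, the 16-bit weak checksum, the use of MD5 as the strong checksum and the instruction tuples are all assumptions made for the example. `signatures` plays the side that holds the old file (the backup server here), `delta` plays the side that holds the new file (the client).

```python
import hashlib


def weak_sum(block):
    """Cheap Adler-style checksum (a, b) that can be rolled byte by byte."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return a, b


def signatures(old, size):
    """Old-file side: read the whole file, one checksum pair per block."""
    table = {}
    for index, start in enumerate(range(0, len(old), size)):
        block = old[start:start + size]
        entry = (hashlib.md5(block).digest(), index)
        table.setdefault(weak_sum(block), []).append(entry)
    return table


def delta(new, table, size):
    """New-file side: slide a window over new, looking for known blocks."""
    out, literal, pos, rolled = [], bytearray(), 0, None
    while pos + size <= len(new):
        window = new[pos:pos + size]
        a, b = rolled if rolled else weak_sum(window)
        match = None
        for strong, index in table.get((a, b), ()):
            if hashlib.md5(window).digest() == strong:
                match = index
                break
        if match is not None:
            if literal:
                out.append(("data", bytes(literal)))
                literal = bytearray()
            out.append(("copy", match))
            pos, rolled = pos + size, None
        else:
            literal.append(new[pos])
            if pos + size < len(new):
                # Roll the window forward by one byte.
                a = (a - new[pos] + new[pos + size]) & 0xFFFF
                b = (b - size * new[pos] + a) & 0xFFFF
                rolled = (a, b)
            pos += 1
    literal.extend(new[pos:])
    if literal:
        out.append(("data", bytes(literal)))
    return out


def patch(old, instructions, size):
    """Old-file side: rebuild the new file from copies and literal data."""
    out = bytearray()
    for kind, value in instructions:
        out += old[value * size:(value + 1) * size] if kind == "copy" else value
    return bytes(out)


old = bytes(range(256)) * 4
new = old[:100] + b"XYZ" + old[100:]
steps = delta(new, signatures(old, 16), 16)
print(sum(len(v) for k, v in steps if k == "data"), "literal bytes sent")
```

In this sketch `signatures` touches every byte of the old file, and `patch` reads the old blocks again to rebuild the result, so the disk I/O concern on the side holding the old copies applies to any file that actually goes through the algorithm.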
> The easiest way to manage the scheduling is to have the
> server pull. If that isn't possible then you will need to
> use an rsync wrapper that keeps the simultaneous runs within
> limits or put a good deal of smarts into the clients.
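One possible shape for such a wrapper, as a rough Python sketch under stated assumptions: it runs on a Unix host, uses a fixed number of lock files as "slots", and would start the real rsync only when a slot was obtained. The names `acquire_slot` and `release_slot`, the lock directory and the slot count are hypothetical, not part of rsync.

```python
import fcntl
import os


def acquire_slot(lock_dir, max_runs):
    """Try to take one of max_runs slots; return an open fd, or None if full."""
    for slot in range(max_runs):
        path = os.path.join(lock_dir, f"slot{slot}.lock")
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            # Non-blocking exclusive lock: fails at once if the slot is taken.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            os.close(fd)
    return None


def release_slot(fd):
    """Give the slot back; closing the fd would also drop the lock."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A wrapper built on this would call `acquire_slot` before starting the transfer, tell the client to retry later when it returns `None`, and call `release_slot` once the transfer has finished. Because the kernel drops the lock when the process exits, a crashed run does not leave a slot stuck.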
Yeah, pulling is out of the question, because the server can't
activate the ISDN link. The clients' rsync start time will need
to be hashed across the night.
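A minimal sketch of what hashing the start times could look like, in Python. The client names, the eight-hour window and the function names are assumptions for illustration only. Each client derives a stable minute offset from its own name, so no coordination with the server is needed to spread the runs.

```python
import hashlib
from datetime import datetime, timedelta


def start_offset(client_id, window_minutes):
    """Map a client name to a stable minute offset inside the backup window."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_minutes


def start_time(client_id, window_start, window_minutes):
    """Absolute start time for this client's nightly rsync run."""
    offset = start_offset(client_id, window_minutes)
    return window_start + timedelta(minutes=offset)


# Hypothetical example: a window from 22:00 lasting 8 hours.
window_start = datetime(2003, 6, 22, 22, 0)
for name in ("client-0001", "client-0002", "client-0003"):
    print(name, start_time(name, window_start, 8 * 60))
```

The offsets are spread evenly only on average, so some minutes will still get several clients. A cap on simultaneous runs on the server side would still be useful alongside this.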
>>Does anyone have experience with such setups?
> Unlikely on that scale over that sort of link.
> I'd suggest experimenting with -v and the --stats options turned on.
I will, thanks.
Netland Internet Services
http://www.netland.nl Kruislaan 419 1098 VA Amsterdam
info: 020-5628282 servicedesk: 020-5628280 fax: 020-5628281
Useless Invention: Leather cutlery.