rsync backup performance question
Ron Arts
raarts at netland.nl
Sun Jun 22 21:59:11 EST 2003
jw schultz wrote:
> On Sun, Jun 22, 2003 at 11:42:46AM +0200, Ron Arts wrote:
>
>>Dear all,
>>
>>I am implementing a backup system, where thousands of PostgreSQL
>>databases (max 1 GB in size) on as many clients need to be backed
>>up nightly across ISDN lines.
>>
>>Because of the limited bandwidth, rsync is the prime candidate of
>>course.
>
>
> Only if you are updating an existing file on the backup
> server with sufficient commonality from one version to the
> next. pg_dump --format=t would be good. Avoid the built-in
> compression in pg_dump as it defeats rsync.
Restore time is significant, so I think I need a straight mirror
of the database files on the client. I think importing
a multi-gigabyte SQL dump will take too long for us (one hour
is the limit). I have not tried that on PostgreSQL yet, though.
> gzip with the
> rsyncable patch and bzip2 are OK if you must compress.
>
So unpatched bzip2 is OK? Nice to know.
Maybe I can tar an LVM snapshot and bzip2 that
before rsyncing. Thanks for that one.
> The other issue is individual file size. Rsync versions
> prior to what is in CVS start having some performance issues
> with files larger than the 200-500MB range.
>
I'll keep that in mind.
>
>>Potential problems I see are server load (I/O and CPU), and filesystem
>>limits.
>
>
> Most of the load is on the sender. Over ISDN, even with
> rsync compressing the datastream, no single update should be a CPU
> or I/O issue. The issue is scheduling so you don't have too
> many running simultaneously.
>
As I understand the algorithm, the server creates a list of checksums
(around 1% of the size of the original file), which is not really
CPU intensive, and sends it to the client. The client then does a lot
of work finding blocks that match the server's copy of the file.
So the server at least reads every file in the rsync tree completely,
am I correct? In my case that means a lot of disk I/O,
given the total size of all databases (multiple TB).
Please correct me if I'm wrong.
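To make the size of such a checksum list concrete, here is a minimal Python sketch. The 700-byte block size and the 7 bytes of checksum per block are assumptions picked for illustration, not values from this thread, and zlib.adler32 and hashlib.md5 are only stand-ins for rsync's own weak and strong checksums. In this sketch, building the list means reading every byte of the file once.

```python
import hashlib
import zlib


def block_signatures(data: bytes, block_size: int = 700):
    """Return one (weak, strong) checksum pair per block of `data`.

    block_size=700 is an assumed value for illustration only.
    """
    signatures = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        weak = zlib.adler32(block)            # stand-in for a cheap rolling checksum
        strong = hashlib.md5(block).digest()  # stand-in for a strong per-block hash
        signatures.append((weak, strong))
    return signatures


def signature_overhead(file_size: int, block_size: int, bytes_per_block: int) -> float:
    """Size of the checksum list as a fraction of the file size."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * bytes_per_block / file_size
```

With these assumed numbers, a 700,000-byte file yields 1,000 blocks and a checksum list of 7,000 bytes, which is the "around 1%" ballpark mentioned above.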
> The easiest way to manage the scheduling is to have the
> server pull. If that isn't possible then you will need to
> use an rsync wrapper that keeps the simultaneous runs within
> limits or put a good deal of smarts into the clients.
>
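One way to sketch a wrapper that keeps the number of simultaneous runs within a limit is a fixed set of lock files, one per allowed run. This is only an illustration under assumptions: the lock directory and slot count are hypothetical, it relies on fcntl.flock (Unix only), and how the wrapper would be hooked in front of rsync on the server is left open.

```python
import fcntl
import os


def acquire_slot(lock_dir: str, max_slots: int):
    """Try to take one of `max_slots` run slots.

    Returns an open file object holding the lock, or None if all slots
    are busy. Closing the file (or exiting the process) frees the slot.
    """
    os.makedirs(lock_dir, exist_ok=True)
    for i in range(max_slots):
        path = os.path.join(lock_dir, f"slot{i}.lock")
        handle = open(path, "w")
        try:
            # Non-blocking exclusive lock: fails at once if the slot is taken.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return handle
        except OSError:
            handle.close()
    return None
```

A wrapper would call acquire_slot() before starting the transfer and, if it gets None, either wait and retry or tell the client to come back later.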
Yeah, pulling is out of the question, because the server can't
activate the ISDN link. The clients' rsync start times will need
to be hashed across the night.
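Hashing the start times could look roughly like the Python sketch below. The client identifier, the 22:00 window start, and the 8-hour window are all hypothetical values for illustration; any stable per-client string would do.

```python
import hashlib


def start_offset_seconds(client_id: str, window_seconds: int) -> int:
    """Map a client identifier to a stable offset within the backup window."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then fold it
    # into the window. The same client always gets the same offset.
    return int.from_bytes(digest[:8], "big") % window_seconds


def start_time(client_id: str, window_start_hour: int = 22, window_hours: int = 8) -> str:
    """Return the client's start time as HH:MM (wraps past midnight)."""
    offset = start_offset_seconds(client_id, window_hours * 3600)
    total_minutes = (window_start_hour * 60 + offset // 60) % (24 * 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"
```

Each client can compute its own slot locally, for example start_time("client-0042"), so no coordination with the server is needed. The hash spreads clients roughly evenly but does not guarantee a cap on overlapping runs.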
>
>>Does anyone have experience with such setups?
>
>
> Unlikely on that scale over that sort of link.
>
> I'd suggest experimenting with -v and the --stats options turned on.
>
I will, thanks.
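For the experiments, a small Python helper could pull the delta figures out of captured --stats output. This is a sketch that assumes the output contains "Literal data: N bytes" and "Matched data: N bytes" lines; the exact wording and number formatting may differ between rsync versions, so check it against real output first.

```python
import re

# Matches e.g. "Literal data: 1,000 bytes" or "Matched data: 9000 bytes".
_STAT_LINE = re.compile(
    r"^(Literal data|Matched data):\s*([\d,]+)\s*bytes", re.MULTILINE
)


def parse_delta_stats(output: str) -> dict:
    """Extract literal and matched byte counts from --stats text."""
    stats = {}
    for name, value in _STAT_LINE.findall(output):
        stats[name] = int(value.replace(",", ""))
    return stats


def matched_fraction(stats: dict) -> float:
    """Fraction of the file data that was reused rather than sent."""
    literal = stats.get("Literal data", 0)
    matched = stats.get("Matched data", 0)
    total = literal + matched
    return matched / total if total else 0.0
```

A high matched fraction suggests the nightly files share enough content for rsync to pay off over the ISDN link; a low one suggests the file format is defeating the delta transfer.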
Ron
--
Netland Internet Services
bedrijfsmatige internetoplossingen
http://www.netland.nl Kruislaan 419 1098 VA Amsterdam
info: 020-5628282 servicedesk: 020-5628280 fax: 020-5628281
Useless Invention: Leather cutlery.