Question about rsync and BIG mirror

Jan-Benedict Glaw jbglaw at lug-owl.de
Fri Mar 3 09:54:27 GMT 2006


On Fri, 2006-03-03 08:02:55 +0100, johan.boye at latecoere.fr <johan.boye at latecoere.fr> wrote:
> // I'm not sure this message was posted, so I'm sending it again //

It was, but nobody has answered yet.

>   I'm preparing a plan for a production setup in my company: we need to
> mirror around 100GB of data through a special VPN Internet line, 2MB
> symmetric.
>   The first time, the data will be transferred on physical media such
> as a hard disk. After that, each night, we will try to update the
> clients from the master server.

Does every client have 2 Mbit, or only the server's machine?

> The nightly update should be around 500MB to 3GB, not much compared to
> the original size of the data.
>   I discovered that rsync uses a lot of CPU and RAM to compute checksums
> on the files that have to be synchronised. I need an opinion about my
> situation:

Right. Rsync trades (especially) CPU cycles and some RAM for network
bandwidth.

>   So: each night, from 0:00 to 7:00 at the latest, the server will have
> to check the 100GB of files, find which ones have been modified, and
> then upload them to the clients. Files are around 4MB to 40MB on average.

Are these new files, or do the old ones change? Are those minimal
changes within the files, or do they change throughout?

> I would like to know your opinion about this situation:
>  - Should I set up a powerful dual-CPU computer dedicated to all this
> computation?

A lot of CPU power cannot hurt.

>  - How much memory should I install?

Depends on the number of clients.

Rule of thumb: "RAM can only be substituted by even more RAM."
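To put rough numbers on it: according to the rsync FAQ, pre-3.0 versions keep the whole file list in memory at roughly 100 bytes per file. The sketch below assumes that figure, one server-side rsync process per client synchronised concurrently, and a hypothetical 10 clients; it counts file-list memory only, not OS disk cache or anything else.

```python
# Rough estimate of rsync file-list memory on the server.
# Assumptions: ~100 bytes per file entry (rsync FAQ figure for pre-3.0
# versions, which hold the whole list in memory) and one server-side
# rsync process per concurrently synchronised client.
BYTES_PER_FILE_ENTRY = 100
GB = 10**9
MB = 10**6


def file_list_memory(total_bytes: int, avg_file_bytes: int, clients: int) -> int:
    """Approximate bytes of RAM used by the file lists of all clients."""
    n_files = total_bytes // avg_file_bytes
    return n_files * BYTES_PER_FILE_ENTRY * clients


if __name__ == "__main__":
    for avg in (4 * MB, 40 * MB):
        need = file_list_memory(100 * GB, avg, clients=10)  # 10 clients: hypothetical
        print(f"avg file {avg // MB} MB, 10 clients: ~{need / MB:.1f} MB of file-list RAM")
```

With 4MB to 40MB files, 100GB is only about 2,500 to 25,000 files, so under these assumptions the file lists stay small; more RAM would probably matter more as disk cache than for rsync's own bookkeeping.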

>  - Is any bandwidth used during the checksum computation? Mine is
> quite limited.

Checksums are calculated on both the server side and the client side;
this part doesn't really use bandwidth.
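The computation itself needs no network, but the receiving side does send its per-block checksums to the sender, so each changed file costs a little traffic. A rough, hypothetical estimate, assuming a block size of about the square root of the file size with a 700-byte floor, and at most 20 bytes per block (a 4-byte rolling checksum plus up to 16 bytes of strong checksum). These approximate rsync's defaults; they are not its exact rules.

```python
import math

# Approximations of rsync's defaults (assumptions, not exact rules):
MIN_BLOCK = 700            # block-size floor
BYTES_PER_BLOCK = 4 + 16   # rolling checksum + at most 16 bytes of strong checksum


def signature_bytes(file_bytes: int) -> int:
    """Approximate size of the checksum list sent for one file."""
    block = max(MIN_BLOCK, math.isqrt(file_bytes))
    blocks = -(-file_bytes // block)  # ceiling division
    return blocks * BYTES_PER_BLOCK


if __name__ == "__main__":
    for size in (4_000_000, 40_000_000):
        sig = signature_bytes(size)
        print(f"{size // 10**6} MB file: ~{sig / 1000:.0f} kB of checksums "
              f"({100 * sig / size:.2f}% of the file)")
```

Under these assumptions that is about 1% of a 4MB file and about 0.3% of a 40MB file, small next to the file data itself.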

>  - I know the client computer will have to check files too; disk I/O
> will be the most heavily used resource. I think this computer will have
> an NFS mount from a "datacenter" computer with a gigabit LAN card; I
> wonder whether that will be enough...

Is it a two-computer sync, or one master machine with a huge number of
clients? Either way, both sides may need to touch all the file data...
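How much data really has to be read depends on how rsync selects files. By default it uses a "quick check": a file whose size and modification time match on both sides is skipped without reading its contents. With -c/--checksum, by contrast, rsync reads and checksums every file on both sides. A simplified model of the default selection (function names are hypothetical; mtime is compared at whole-second resolution here):

```python
import sys
from pathlib import Path


def needs_transfer(src: Path, dst: Path) -> bool:
    """Simplified quick check: True unless dst exists with the same size
    and the same modification time (compared in whole seconds)."""
    if not dst.is_file():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def changed_files(src_root: Path, dst_root: Path) -> list[Path]:
    """Relative paths of files under src_root that fail the quick check.

    Only these files would have their contents read and checksummed;
    everything else costs a single stat() on each side.
    """
    changed = []
    for path in sorted(src_root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(src_root)
            if needs_transfer(path, dst_root / rel):
                changed.append(rel)
    return changed


if __name__ == "__main__":
    # usage: python <script> SOURCE_DIR DEST_DIR
    for rel in changed_files(Path(sys.argv[1]), Path(sys.argv[2])):
        print(rel)
```

Under this model, if most of the 100GB is untouched on a given night, only the changed files' data would be read over the NFS mount, not the whole tree.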

>   I'm quite worried about the amount of data to check before
> synchronising the clients, and how long it will take. In short, what do
> YOU think? Any advice?

That all depends on the usage pattern. So you've got one central rsync
server and a number (how many?) of clients that need to synchronize.
All of them have 2 Mbit connectivity, right?
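Assuming "2MB" means 2 Mbit/s, here is a quick worst-case check of the 0:00-7:00 window: the line is saturated the whole time, with no protocol overhead, no compression, and every changed byte sent in full rather than as a delta.

```python
# Worst-case assumptions: link saturated for the whole window,
# no protocol overhead, no compression, no delta savings.
LINK_BITS_PER_S = 2_000_000   # assumed meaning of "2MB symmetric"
WINDOW_S = 7 * 3600           # 0:00 to 7:00

# Bytes that fit through one link during the window.
capacity = LINK_BITS_PER_S // 8 * WINDOW_S


def transfer_hours(nbytes: int, bits_per_s: int = LINK_BITS_PER_S) -> float:
    """Hours needed to push nbytes through the link."""
    return nbytes * 8 / bits_per_s / 3600


def clients_per_window(nightly_bytes: int) -> int:
    """Clients servable in one window if they all share the server's link."""
    return capacity // nightly_bytes


if __name__ == "__main__":
    print(f"capacity: {capacity / 1e9:.1f} GB per window")
    for nightly in (500_000_000, 3_000_000_000):
        print(f"{nightly / 1e9:.1f} GB: {transfer_hours(nightly):.1f} h per client, "
              f"{clients_per_window(nightly)} client(s) per window on a shared link")
```

That is about 6.3GB per window per 2 Mbit/s link. A 3GB night takes roughly 3.3 hours, so one client fits comfortably; but if the server's single line has to feed every client, only about two worst-case clients fit in the window.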

You'd also have to define how your files change. Do they change by
name? By content? If by content, how much changes within the files?
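A heavily simplified sketch of the rsync delta algorithm shows why the amount of change inside a file matters, and where the CPU goes. The receiver's old copy is cut into fixed-size blocks, each described by a weak rolling checksum and a strong hash. The sender slides a window over its new version one byte at a time, rolls the weak checksum forward, and only computes the strong hash when the weak one hits. Matching blocks are sent as references; everything else is sent as literal data. This sketch makes several assumptions: MD5 stands in for the strong checksum (rsync 2.6 used MD4), the caller fixes the block size, a short final block is always sent literally, and there is no compression or networking. All function names are hypothetical.

```python
import hashlib
import random

M = 1 << 16  # modulus of the weak checksum components


def weak_checksum(data: bytes) -> tuple[int, int]:
    """rsync-style weak checksum components (a, b) of one block."""
    n = len(data)
    a = sum(data) % M
    b = sum((n - i) * x for i, x in enumerate(data)) % M
    return a, b


def signature(old: bytes, block: int) -> dict[int, list[tuple[int, bytes]]]:
    """Map weak checksum -> [(block index, strong hash)] for full blocks of old."""
    sig: dict[int, list[tuple[int, bytes]]] = {}
    for idx in range(len(old) // block):
        chunk = old[idx * block:(idx + 1) * block]
        a, b = weak_checksum(chunk)
        sig.setdefault(a | (b << 16), []).append((idx, hashlib.md5(chunk).digest()))
    return sig


def delta(sig: dict[int, list[tuple[int, bytes]]], new: bytes, block: int) -> list:
    """Encode new as ("copy", block_index) and ("data", bytes) operations."""
    ops: list = []
    literal = bytearray()
    i = 0
    a = b = -1  # -1 means "recompute the checksum from scratch"
    while i + block <= len(new):
        if a < 0:
            a, b = weak_checksum(new[i:i + block])
        match = None
        candidates = sig.get(a | (b << 16), [])
        if candidates:  # weak hit: confirm with the strong hash
            strong = hashlib.md5(new[i:i + block]).digest()
            match = next((idx for idx, s in candidates if s == strong), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += block
            a = -1
        else:
            literal.append(new[i])
            if i + block < len(new):  # roll the window forward by one byte
                out_byte, in_byte = new[i], new[i + block]
                a = (a - out_byte + in_byte) % M
                b = (b - block * out_byte + a) % M
            i += 1
    literal.extend(new[i:])  # tail shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, block: int) -> bytes:
    """Rebuild the new file on the receiver from old plus the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(0)
    old = rng.randbytes(64 * 1024)
    blk = 1024
    for label, new in (("small insert", old[:10_000] + b"INSERTED" + old[10_000:]),
                       ("full rewrite", rng.randbytes(64 * 1024))):
        ops = delta(signature(old, blk), new, blk)
        assert patch(old, ops, blk) == new
        sent = sum(len(v) for k, v in ops if k == "data")
        print(f"{label}: {sent} of {len(new)} bytes sent as literal data")
```

Inserting a few bytes costs roughly one block of literal data plus block references. A file rewritten from end to end is sent almost entirely as literal data, and the byte-by-byte scanning spends CPU without saving any bandwidth.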

See, it's all about the details :-)

Regards, JBG

-- 
Jan-Benedict Glaw       jbglaw at lug-owl.de    . +49-172-7608481             _ O _
"A Free Opinion in a Free Mind            | Against Censorship | Against War    _ _ O
 for a Free State full of Free Citizens"  |  on the Internet!  |   in Iraq!     O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));


More information about the rsync mailing list