Fwd: Question about rsync and BIG mirror
johan.boye at latecoere.fr
Mon Mar 6 08:36:15 GMT 2006
> -----Original Message-----
> From: BOYE Johan
> Sent: Monday, 6 March 2006 08:28
> To: 'Jan-Benedict Glaw'
> Subject: RE: Question about rsync and BIG mirror
> > I'm preparing a plan for a production setup in my company: we need to
> > mirror around 100 GB of data through a special VPN internet line,
> > 2 Mbit/s symmetric.
> > The first time, the data will be transferred on physical media such as
> > a hard disk. After that, each night, we will try to update the clients
> > from the master server.
> Does every client have 2MBit? ...or only the server's machine?
Well, the server has a 2 Mbit/s (about 250 KB/s) connection to each client.
For the moment we will have 1 client; that will go up to 3 clients.
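
As a rough sanity check on the figures quoted in this thread (a 2 Mbit/s link per client, 500 MB to 3 GB of nightly changes, a window from 0:00 to 7:00), the raw transfer time can be estimated as below. This is only a back-of-the-envelope sketch: it assumes the link runs at full rate and ignores protocol overhead, compression and any savings from rsync's delta transfer. The function names are illustrative.

```python
# Figures taken from the thread; overhead and compression are ignored.
LINK_BITS_PER_SEC = 2_000_000   # 2 Mbit/s per client link
WINDOW_SECONDS = 7 * 3600       # nightly window, 0:00 to 7:00


def transfer_seconds(num_bytes, bits_per_sec=LINK_BITS_PER_SEC):
    """Seconds needed to push num_bytes over the link at full rate."""
    return num_bytes * 8 / bits_per_sec


def window_capacity_bytes(window_seconds=WINDOW_SECONDS,
                          bits_per_sec=LINK_BITS_PER_SEC):
    """Maximum number of bytes one link can carry within the window."""
    return bits_per_sec / 8 * window_seconds


for label, size in (("500 MB", 500 * 10**6), ("3 GB", 3 * 10**9)):
    hours = transfer_seconds(size) / 3600
    print(f"{label}: {hours:.2f} h of the 7 h window")

print(f"capacity per link: {window_capacity_bytes() / 10**9:.1f} GB")
```

Even if every changed file had to be sent whole, 3 GB would need 12000 seconds (about 3.3 hours) per link, which fits in the 7 hour window as long as the checksum pass does not eat up the remaining time.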
> > It should be around 500 MB to 3 GB, not much compared with the
> > original size of the data.
> > I discovered that "rsync" uses a lot of CPU and RAM to run "checksums" on
> > files that have to be synchronised. I need an opinion about my situation:
> Right. Rsync trades (especially) CPU cycles and some RAM for network
> > So: each night, from 0:00am to 7:00am at the latest, the server will have
> > to check the 100 GB of files, see which files have been modified, and then
> > upload them to the clients. Each file is around 4 MB to 40 MB on average.
> Are these new files, or do the old ones change? Are those minimal
> changes within the files, or do they change throughout?
Most of the changes will be to "CATIA" files, with maybe some new files from
time to time. It seems CATIA (CAD) files are "compiled": new data is not
simply added at the end of the file :-/
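
To illustrate why the way a file changes matters, here is a deliberately simplified sketch of block matching. It is not rsync's actual implementation: it hashes every window with SHA-256 instead of using a cheap rolling checksum first, and `matched_bytes`, the block size and the test data are all assumptions made for the example. It counts how many bytes of a new file could be reused from blocks already present in the old copy.

```python
import hashlib
import random


def matched_bytes(old, new, block_size):
    """Count bytes of `new` that match whole blocks of `old` at any offset."""
    # Hashes of the old file's blocks, taken at block-aligned offsets.
    known = {
        hashlib.sha256(old[i:i + block_size]).digest()
        for i in range(0, len(old) - block_size + 1, block_size)
    }
    matched = 0
    pos = 0
    while pos + block_size <= len(new):
        window = new[pos:pos + block_size]
        if hashlib.sha256(window).digest() in known:
            matched += block_size   # block can be copied from the old file
            pos += block_size
        else:
            pos += 1                # literal byte; slide the window by one
    return matched


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(4096))
inserted = b"hello" + old                                   # small local change
rewritten = bytes(rng.getrandbits(8) for _ in range(4096))  # changed throughout

print(matched_bytes(old, inserted, 64), "of", len(inserted), "bytes reusable")
print(matched_bytes(old, rewritten, 64), "of", len(rewritten), "bytes reusable")
```

A small insertion leaves almost the whole file reusable, while a file whose content changes throughout (which may be the case for "compiled" CAD files) gives nothing to reuse and has to be sent in full.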
> > I would like to know your opinion about this situation:
> > - Should I set up a powerful dual-CPU computer dedicated to calculating
> > all of this?
> A lot of CPU power cannot hurt.
> > - How much memory should I install?
> Depends on the number of clients.
> Rule of thumb: "RAM can only be substituted by even more RAM."
> > - Is there any bandwidth used during the checksums computation? Mine is
> > quite limited.
> Checksum calculation basically happens on the server side as well as
> on the client side; this part doesn't really use bandwidth.
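
The CPU cost mentioned here comes largely from checksumming a window at every byte offset. A minimal sketch of a rolling weak checksum, modelled on the one described in the rsync technical report, is given below. Treat it as an illustration under that assumption rather than rsync's exact code; the names `weak_checksum` and `roll` are made up for the example. The point is that sliding the window by one byte costs a few additions instead of a full recomputation, and all of it is local work.

```python
M = 1 << 16  # both sums are kept modulo 2^16


def weak_checksum(block):
    """Compute the (a, b) pair for a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte gets weight n, the last byte gets weight 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n by one byte in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


data = bytes(range(200))
n = 16
a, b = weak_checksum(data[:n])
for start in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[start - 1], data[start + n - 1], n)
    # The rolled value must equal a full recomputation at this offset.
    assert (a, b) == weak_checksum(data[start:start + n])

print("rolling update matches full recomputation at every offset")
```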
> > - I know the client computer will have to check files too; disk I/O
> > will be the most heavily used. I think this computer will have an NFS
> > mount from a "datacenter" computer with a gigabit LAN card; I wonder
> > whether it will be enough...
> Is it a two-computer sync, or one master machine with a huge number of
> clients? However, both sides may need to touch all the file data...
> > I'm quite scared of the amount of data to check before synchronising
> > the clients, and of how long it will take. To sum up, what do YOU
> > think? Any advice?
> That all depends on the usage pattern. So you've got one central rsync
> server and a number (how many?) of clients that need to synchronize.
> All these do have 2Mbit connectivity, right?
Yes, one server and 1 to 3 clients through 2 Mbit/s dedicated
> You'd also have to define the way your files change. Do they change by
> name? By content? If by content, how much does change within the
As I said, 98% of the changed files will have modified content + different
> > See, it's all about the details :-)
Thanks already for all your information ;)