Very large quantity of files

Dave Dykstra dwd at bell-labs.com
Wed Jan 9 02:23:39 EST 2002


On Tue, Jan 08, 2002 at 10:15:14AM +0100, sunrise at t-online.de wrote:
> Hello,
> 
> to explain: 
> I have two machines running the same hard- and software. 
> Each has two harddrives 80GB/40GB with 500 megs of RAM and 
> a 650MHz PIII, running SuSE Linux 7.1 with Kernel 2.2.18.
> 
> They are connected on a local 100Mbps Ethernet.
> 
> The harddrives are pretty full (total ~94GB) with a very 
> large quantity of small files. The initial copy has taken about 
> 48 hours. - I didn't worry much about that. -
> 
> So, now rsync shall run every night and I have a window of
> 5 hours to do the job. I started it yesterday for the first time
> and it needed 14h to do the job, and there were not many changes.
> I've checked a number of changes and it looks like rsync has done
> the job pretty well -- if I don't look at the clock :(
> 
> I am using rsync-2.5.1 with the following command:
> 
> rsync -av --delete /mnt/data /data_bak
> 
> The two drives are mounted using nfs. I run a simple script which 
> calls rsync for each of the two drives.
> 
> Is there anything wrong?
> Would you prefer another way for that job?
> 
> I'm new to rsync and I have started reading previous posts, but
> found nothing that really helps me.
> 
> Thanks a lot in advance for your help.
> 
> Regards,
> Juergen


First, try it without using NFS; run the two halves of rsync on the two
different machines.  Rsync is optimized for local access to files.
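
A minimal sketch of what that could look like, assuming the source disk is
local to the machine where the command is run, the other machine is
reachable by ssh, and "backuphost" is a hypothetical name for it (swap the
roles if your source is the remote side). The RSYNC variable is only an
illustrative hook for substituting another command; it is not something
rsync itself reads.

```shell
#!/bin/sh
# Hypothetical host name and paths; substitute your own.
# Run this on the machine that holds the source disk, so that rsync reads
# it locally and only the destination is reached over the network.
backup_remote() {
    src=$1     # local source directory, e.g. /mnt/data
    dest=$2    # remote destination as host:path, e.g. backuphost:/data_bak
    # -e ssh selects ssh as the remote shell; the receiving rsync is started
    # on the other machine and works on its own local disk.
    ${RSYNC:-rsync} -av --delete -e ssh "$src" "$dest"
}
```

It would be called as: backup_remote /mnt/data backuphost:/data_bak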

Because you have such a high-speed LAN and mostly small files, once you
have moved the other half of rsync to the other machine I think you'll
probably be better off adding the --whole-file option, which disables the
rsync rolling checksum algorithm.  That mode is definitely needed when
you're going over NFS, but you were already getting it: rsync 2.5.1 makes
it the default for what it considers a local copy, and that is what you
were doing because rsync doesn't realize NFS is in there.
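
As a rough illustration, the option is simply added to the command line of
a machine-to-machine run. The host name "backuphost", the paths, and the
RSYNC override variable below are assumptions made for the example, not
part of your setup.

```shell
#!/bin/sh
# Hypothetical host name and paths; substitute your own.
backup_whole_file() {
    src=$1     # local source directory
    dest=$2    # remote destination as host:path
    # --whole-file (-W) sends each changed file in full instead of running
    # the rolling-checksum delta algorithm against the old copy.
    ${RSYNC:-rsync} -av --delete --whole-file -e ssh "$src" "$dest"
}
```

Whether this actually beats the delta algorithm on your data is worth
timing both ways; on a fast LAN with small files it usually costs less CPU
and disk reading on the receiving side.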

Finally, if you have a huge number of files in a single rsync run you may
have been swapping on the machine you ran the copy on.  Rsync uses some 
memory for every file it visits in a single run, so the file list for the
whole tree has to fit in RAM.  You may be better off breaking it up into a
number of smaller runs.
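
One way to split the job, sketched under the assumption that the files are
spread fairly evenly over the immediate subdirectories of the source: run
rsync once per subdirectory, so each run only holds that subdirectory's
file list in memory. The function name, paths, and RSYNC override variable
are made up for the example.

```shell
#!/bin/sh
# Run one rsync per top-level subdirectory so that each run builds a
# smaller file list. All names here are hypothetical.
backup_in_pieces() {
    src=$1     # directory whose immediate subdirectories are synced in turn
    dest=$2    # destination directory (local path or host:path)
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue    # skip if the glob matched nothing
        name=$(basename "$dir")
        # Trailing slashes: copy the contents of $dir into $dest/$name/.
        ${RSYNC:-rsync} -a --delete "$dir" "$dest/$name/"
    done
}
```

Note what this loop leaves out: plain files sitting directly in the source
directory, subdirectories whose names start with a dot, and subdirectories
that were removed from the source since the last run (their old copies stay
on the destination). Those cases would need separate handling.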

- Dave Dykstra



