Very large quantity of files
dwd at bell-labs.com
Wed Jan 9 02:23:39 EST 2002
On Tue, Jan 08, 2002 at 10:15:14AM +0100, sunrise at t-online.de wrote:
> to explain:
> I have two machines running the same hard- and software.
> Each has two harddrives 80GB/40GB with 500 megs of RAM and
> a 650MHz PIII, running SuSE Linux 7.1 with Kernel 2.2.18.
> They are connected on a local 100Mbps Ethernet.
> The harddrives are pretty full (total ~94GB) with a very
> large quantity of small files. The initial copy has taken about
> 48 hours. - I didn't worry much about that. -
> So, now rsync shall run every night and I have a capacity of
> 5 hours to do the job. I have started it yesterday the first time
> and it needs 14h to do the job and there were not much changes.
> I've checked a number of changes and it looks like rsync has done
> the job pretty well -- if I don't look at the clock :(
> I am using rsync-2.5.1 with the following command:
> rsync -av --delete /mnt/data /data_bak
> The two drives are mounted using nfs. I run a simple script which
> calls rsync for each of the two drives.
> Is there anything wrong?
> Would you prefer another way for that job?
> I'm new to rsync and I have started reading previous posts, but
> found nothing that really helps me.
> Thanks a lot in advance for your help.
First, try it without using NFS; run the two halves of rsync on the two
different machines. Rsync is optimized for local access to files.
Because you have such a high-speed LAN and mostly small files, once you
have moved the other half of rsync to the other machine, I think you'll
probably be better off adding the --whole-file option, which disables the
rsync rolling checksum algorithm. That mode is definitely needed when
you're going over NFS, but it's already the default in rsync 2.5.1 when
you're doing a local copy as far as rsync is concerned, which is what you
were doing, because rsync doesn't realize NFS is in there.
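As a rough sketch of what that could look like, the script below builds a
command that pulls from the data machine over a remote shell instead of
reading the NFS mount. The host name "datahost" and the remote path /data
are hypothetical placeholders, and -e ssh assumes ssh access between the
two machines. The command is only printed so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: "datahost" and the remote path are assumed placeholders,
# not names taken from the original setup.
SRC_HOST="datahost"

build_cmd() {
    # $1 = directory on the remote (data) machine
    # $2 = local destination directory on the backup machine
    echo "rsync -av --delete --whole-file -e ssh $SRC_HOST:$1 $2"
}

# Print the command for review; nothing is copied by this script.
build_cmd /data /data_bak
```

With this form one half of rsync runs on each machine and each half reads
or writes its own local disk.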
Finally, if you have a huge number of files in a single rsync run, you may
have been swapping on the machine you ran the copy on. Rsync uses some
memory for every file it visits in a single run. You may be better off
breaking it up into a number of smaller runs.
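One possible way to split the job, sketched under the assumption that the
files are spread over several top-level directories: run rsync once per
directory so that each run holds a smaller file list in memory. The
function name split_runs is made up for illustration, and the commands
are only printed, not executed.

```shell
#!/bin/sh
# Sketch only: prints one rsync command per top-level directory.
split_runs() {
    # $1 = source root, $2 = destination root
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue      # glob matched nothing, skip
        name=$(basename "$dir")
        # Trailing slashes copy the directory's contents into the
        # directory of the same name on the destination side.
        echo "rsync -av --delete $1/$name/ $2/$name/"
    done
}

# /data_bak/data is where the original command would have put the copy.
split_runs /mnt/data /data_bak/data
```

Note the limits of this split: files sitting directly in the top level,
and directories whose names start with a dot, are not covered, and a
directory removed on the source is not removed on the destination, so
those cases would need a separate pass.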
- Dave Dykstra