rsync freezes when copying several million files

Robert Siemer Robert.Siemer at backsla.sh
Tue Aug 22 09:13:45 GMT 2006


> >I'm using Dirvish to back up my servers. For those who haven't heard
> >of it, it's a sort of "front end" for rsync: it makes a copy of the files
> >for the first backup, and then every day copies modified files and
> >hardlinks unmodified files.

Dirvish relies on rsync to get the hardlinks done. That part is rsync's
work, not much of a contribution from dirvish.

> >I have 600 000 files, and I'm keeping 120 backups on disk. Files are not
> >changing too much, I have something like 120 GB of data and 180 GB of
> >backup.
> >That is roughly 600 000 files with 100 hardlinks on each, so 60 000 000
> >files.

That is 600 000 files and 600 000 hardlinks to consider. The 119 older
backups are not even looked at.
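
To illustrate, here is a minimal Python sketch of a hardlinking snapshot.
It is not rsync's or dirvish's actual code; the names snapshot() and
unchanged() and the three directory arguments are made up for this
example, and "unchanged" is assumed to mean same size and same
whole-second modification time. The point is that only the source tree
and the single reference backup are walked.

```python
import os
import shutil


def unchanged(src, ref):
    """Assumed rule: equal size and equal mtime (whole seconds) means unchanged."""
    if not os.path.isfile(ref):
        return False
    a, b = os.stat(src), os.stat(ref)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot(source, previous, target):
    """Build `target` from `source`, hardlinking files unchanged since `previous`.

    Returns (copied, linked). Only `source` and `previous` are touched;
    any older snapshots are never opened.
    """
    copied = linked = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            ref = os.path.normpath(os.path.join(previous, rel, name))
            dst = os.path.normpath(os.path.join(target, rel, name))
            if unchanged(src, ref):
                os.link(ref, dst)        # new name for the same inode, no data copied
                linked += 1
            else:
                shutil.copy2(src, dst)   # copies data and preserves the mtime
                copied += 1
    return copied, linked
```

With 600 000 mostly unchanged files, one such run stats about 600 000
source files and creates about 600 000 links, whatever the number of
older snapshots on disk.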

> >By the way, what is the memory usage on the receiver side?
> >For the first rsync there are no files on the receiver side, so it's no problem.
> >But after that first rsync, memory usage will be the same on the sender and
> >receiver side, as they'll both have to create their file list. Am I right?
> 
> You could mount the other side (NFS, iSCSI etc., but that depends on the 
> connection you have), so that there would be only one rsync instance 
> running?

Mounting the remote side locally somewhat defeats the purpose of rsync. If
rsync decides to compare a file, it _reads_ it to get its checksums.
A remote rsync instance would send only the checksums over the
network. In a network-mounted scenario the whole file is transferred by
the filesystem.

The only advantage that remains is when rsync decides to assume the files
are equal (same size, modification time, etc.).
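
A rough Python sketch of that difference, under stated assumptions: the
function names are invented for this example, SHA-256 stands in for
rsync's real block-wise checksums, and whole-second mtimes are compared.
It only shows how many bytes must be read from the second file, which on
a network mount means bytes pulled over the wire.

```python
import hashlib
import os


def quick_check_equal(path_a, path_b):
    """Metadata-only test: same size and same mtime (whole seconds)."""
    a, b = os.stat(path_a), os.stat(path_b)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def file_digest(path, chunk_size=1 << 20):
    """Read the whole file; return (hex digest, number of bytes read)."""
    h = hashlib.sha256()  # stand-in, not the checksum rsync actually uses
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
            total += len(chunk)
    return h.hexdigest(), total


def compare(path_a, path_b):
    """Return (equal, bytes_read_from_b)."""
    if quick_check_equal(path_a, path_b):
        return True, 0                    # nothing read, only two stat() calls
    digest_a, _ = file_digest(path_a)
    digest_b, read_b = file_digest(path_b)
    return digest_a == digest_b, read_b   # all of path_b had to be read
```

If path_b lives on an NFS mount, the second case moves the complete file
across the network just to compute a checksum, whereas a remote rsync
would have read it locally and sent only the checksums.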


Bye,
	Robert

