sync 54 million files, tuning rsync? (offlist)

Dirk Schenkewitz schenkewitz at docomolab-euro.com
Tue Feb 14 11:31:32 GMT 2006


Hi Jerry,

I'm replying offlist because I'm anything but an expert, so
you might get a better answer from someone else.
Anyway, it would probably help if you posted to the list
what you're already doing.

On Tuesday, 14. February 2006 03:31, Jerry wrote:
> I'm trying to sync up 54 million files.  I can break
> it down into different applications, but I still have
> to accomplish 17 million files in one "chunk" if
> possible.

Uh oh :-)

> I'm using a fast (v440) cpu system with 32G ram. 
> Originally I was running out of memory (seemed to
> increase like crazy with --delete on).  I've upgraded
> to rsync 2.6.6 and the memory seems to be more stable
> now, but it's still not using much CPU to perform its
> task, I have CPU to spare!  I'm trying to sync from
> one NFS mount to another NFS mount (Netapp).

Could you run it on one of the machines whose filesystems
you have NFS-mounted now? I'm not sure at all, but I
*believe* that rsync treats this situation as a
local-to-local copy and may optimise for that. Since the
data is actually copied over the network twice (once from
'source' to 'here', where the CPU doing all the work is,
and once from 'here' to 'target'), this could be quite
inefficient. If you run rsync on either the source machine
or the target machine, it MIGHT become about twice as
fast, because there is then only one copy over the
network. Further, rsync might recognise the situation as
remote-to-local or local-to-remote copying and optimise
for low network load.
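
As a rough sketch of that idea, assuming rsync can be run on
the source machine and the target machine is reachable via
ssh: the host name and paths below are made-up placeholders,
not taken from your setup. The script only prints the
command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch only: SRC_DIR, DEST_HOST and DEST_DIR are made-up placeholders.
SRC_DIR="/export/app1/"      # assumed local path on the source machine
DEST_HOST="targethost"       # assumed name of the machine holding the target
DEST_DIR="/export/app1/"     # assumed path on that machine

# Print the rsync command for review instead of running it.
# -a keeps permissions and times, --delete removes files on the
# target that are gone from the source, -e ssh uses ssh as transport.
build_cmd() {
    printf 'rsync -a --delete -e ssh %s %s:%s\n' \
        "$SRC_DIR" "$DEST_HOST" "$DEST_DIR"
}

build_cmd
```

Adding -n (dry run) to the printed command before trying it
for real would show what rsync intends to do without
changing anything.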

> Is there any way I can force rsync to use *more*
> resources?  My Netapp doesn't seem to be pounded and
> my V440 is only using 2%-8% of cpu, 27G of real memory
> free.  Am I doing something wrong?  It's going on 5
> hours now.  It's "considered" all the files and
> printed out a few it deleted, but at this point I'm
> not sure where it is at.

This part is something for the real experts. Just one thing:
you could perhaps use on-the-fly compression (rsync's -z
option) to use more CPU and lower the network load.
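
A minimal sketch of the compression idea, again with
made-up placeholder host and paths. Note that rsync's -z
compresses only the stream between the two rsync processes,
so it presumably helps only when the transfer goes over ssh
(or an rsync daemon), not when both sides are local NFS
mounts. The script just prints the command.

```shell
#!/bin/sh
# Sketch only: the host and paths are made-up placeholders.
SRC_DIR="/export/app1/"              # assumed local source path
DEST="targethost:/export/app1/"      # assumed remote target

# Print an rsync command with -z added for on-the-fly compression.
build_cmd_z() {
    printf 'rsync -a -z --delete -e ssh %s %s\n' "$SRC_DIR" "$DEST"
}

build_cmd_z
```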

Good Luck & Cheers
  Dirk

