Best organizing hundreds of thousands files for rsync and find

Stier, Matthew Matthew.Stier at us.fujitsu.com
Thu Mar 28 05:30:14 MDT 2013


Another issue is that rsync and find are single-threaded applications.  No matter how many processors, cores, or threads the system has, each invocation of find or rsync will use only one thread.

You can gain some parallelization by stepping up a level in the directory tree and running several find or rsync processes at once, one per first-level subdirectory.  I do this when transferring files between systems over a modern gigabit LAN.
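
A minimal sketch of that idea in Python, in case it helps. It is only an illustration, not the exact commands I run: the function names, the "-a" option, the worker count of 4, and the example paths are all assumptions to adapt. It builds one rsync command per first-level subdirectory and runs several of them at a time.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_rsync_commands(src_root, dest):
    """Return one rsync command per first-level subdirectory of src_root.

    Hypothetical helper for illustration; dest is any rsync destination,
    for example "otherhost:/data/".
    """
    commands = []
    for entry in sorted(Path(src_root).iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            # No trailing slash on the source, so rsync recreates the
            # directory itself under dest (dest/<name>/...).
            commands.append(["rsync", "-a", str(entry), dest])
    return commands


def run_parallel(commands, workers=4):
    """Run the commands concurrently and return their exit codes in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker thread just waits on one child process, so up to
        # `workers` rsync processes are active at any moment.
        results = pool.map(lambda cmd: subprocess.run(cmd).returncode, commands)
        return list(results)


# Example use (assumed paths):
#   codes = run_parallel(build_rsync_commands("/srv/files", "otherhost:/srv/files/"))
#   failed = [c for c in codes if c != 0]
```

Files sitting directly in the top-level directory are not covered by these per-subdirectory commands and need a separate pass. The same pattern applies to find: build one find command per subdirectory instead of one rsync command.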

From: rsync-bounces at lists.samba.org [mailto:rsync-bounces at lists.samba.org] On Behalf Of Cristian Bichis
Sent: Thursday, March 28, 2013 1:31 AM
To: rsync at lists.samba.org
Subject: Best organizing hundreds of thousands files for rsync and find

Hi,

I need to organize about 100 million small files (and the number keeps growing) on a server, which should then be copied to another server.

I am wondering how many files it is recommended to keep in a folder for optimal performance. Also, if I have a folder containing only subfolders (no files), how many subfolders is it recommended to have?

The same question applies to the "find" command, not just to rsync, as I am doing some cleanups using find (or a for loop with find).


I made a mistake earlier: I greatly increased the number of subfolders (each holding just a few files), and rsync performance decreased considerably. That was a mistake which I will try to correct.

Since the number of files is constantly increasing, I need to find a long-term solution to the current issues.

Cristian

