how to migrate 40T data and 180M files

Brett Worth brett at worth.id.au
Tue Aug 11 04:36:24 MDT 2009


Ming Gao wrote:
> I need to migrate 40T data and 180M files from one storage device to
> another one, both source and destination will be NFS and mounted to a
> local SUSE Linux box.

Is there any way you could get local access to the write end of the transfer so that you
don't have to do this all via NFS? NFS write performance is likely to be the bottleneck.

Personally I'd probably not use rsync for this transfer.  The startup time is going to
kill you: before copying anything, rsync has to build its file list, and with 180M files
that scan alone will take a very long time (and, on older rsync versions, a lot of
memory).  If the directory structure is agreeable then you could split the problem down
into subtrees and copy each one separately.
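
For example, a per-subtree copy through tar avoids rsync's up-front scan entirely.
A minimal sketch, assuming hypothetical mount points /mnt/src_nas and /mnt/dst_nas
and a subtree called "projects":

    SRC=/mnt/src_nas
    DST=/mnt/dst_nas

    # Stream one subtree through tar; no file list is built in advance.
    mkdir -p "$DST/projects"
    (cd "$SRC/projects" && tar cf - .) | (cd "$DST/projects" && tar xf -)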

Let me guess: you're going from one NetApp box to another NetApp box?  Or some other NAS
to NAS, so you will be unable to get local access to either end?  Have you done any
testing of the creates/sec and large-file bandwidth you'll be able to get?
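
A quick way to get ballpark numbers before committing to a plan; the paths here are
assumptions, so point them at a scratch directory on the destination mount:

    # Creates/sec: time the creation of 1000 empty files.
    mkdir -p /mnt/dst_nas/scratch
    time sh -c 'for i in $(seq 1 1000); do : > /mnt/dst_nas/scratch/f$i; done'

    # Large-file bandwidth: write 1 GB of zeros and note the reported rate.
    dd if=/dev/zero of=/mnt/dst_nas/scratch/big bs=1M count=1024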

Any way you look at it this is going to take a long time.  If you could sustain 80 MB/s
across a gig-e link, which is pretty high for NFS writes, then you're looking at 6 days
minimum.  If the directories have a high file count then per-file open times could push
that 6 days out considerably.
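
For reference, the arithmetic behind that figure:

    40 TB / 80 MB/s = 40,000,000 MB / 80 MB/s = 500,000 s, i.e. about 5.8 days

and that assumes the link never sits idle waiting on per-file opens.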

> The first question is whether there is any risk with such a big number of
> files. Should I divide them into groups and rsync them in parallel or in
> serial? If so, how many groups would be best?

If you only have one path between the source and target then I'd try to go for about 3
copy threads.  This number can only really be determined through trial and error.
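
One way to get a fixed number of parallel streams is to fan the top-level directories
out with xargs.  A sketch, again with hypothetical mount points, assuming the data
lives in top-level subdirectories (GNU find and xargs, standard on SUSE):

    # Run at most 3 rsyncs at once, one per top-level directory.
    # Files sitting directly in the top level still need a separate pass.
    find /mnt/src_nas -mindepth 1 -maxdepth 1 -type d -printf '%f\n' |
        xargs -P3 -I{} rsync -a /mnt/src_nas/{}/ /mnt/dst_nas/{}/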

> And is there anything else I could do to reduce the risk?

Once you get all the files copied over by whatever means, a final rsync would be good
to get all the metadata lined up.  Based on your file count I'd strongly recommend you
break up the filesystem into smaller problems.
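
Something along these lines for the final pass; the mount points are placeholders, and
the -H flag is only worth adding if your NAS actually preserves hard links:

    # Sync permissions, ownership, and timestamps; delete strays on the
    # destination so the two trees end up identical.
    rsync -aH --delete /mnt/src_nas/ /mnt/dst_nas/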


Brett

