how to migrate 40T data and 180M files
hramrach at centrum.cz
Tue Aug 11 03:53:31 MDT 2009
2009/8/11 Ming Gao <gaomingcn at gmail.com>:
> It's almost the same? I once tested on about 7G of data: I rsync'ed it to
> another directory, and it took less than 1 minute when I ran the same
> command line again.
Did you test it on the two NFS shares or something else?
Also, if you have enough memory, part of the data might remain cached
and speed up subsequent transfers.
> The reason why I use rsync is that the data will change while I
> run rsync the first time, so I need to run rsync a second time to make
> them the same.
> How long would it take if the two copies are already the same? I mean just
> to verify that they are the same.
If both source and destination are NFS mounted and they are on a
reasonably fast drive array, then the bottleneck is the network.
Reading src and dest and comparing them is about as fast as reading src
and writing dest, because all of the data crosses the network twice
in either case. The latter is probably faster because the system
simply moves frames between Ethernet card buffers without doing much
else, while comparing may get quite CPU-intensive and slow the process down.
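As a rough illustration of why a verify pass costs about as much as a copy pass in this setup, the sketch below estimates one pass in which every byte crosses the network twice. The 40 TB figure comes from the subject line; the ~110 MB/s link throughput is an assumption (roughly what a single gigabit Ethernet link sustains), and the model ignores NFS overhead, caching, per-file overhead for the 180M files, and whether reads and writes can overlap on a full-duplex link.

```python
def nfs_pass_hours(data_bytes: float, link_bytes_per_s: float) -> float:
    """Hours for one pass in which all data crosses the network twice.

    Applies to copying (read src over NFS, write dest over NFS) and to
    verifying (read src over NFS, read dest over NFS) alike.
    """
    network_bytes = 2 * data_bytes  # every byte crosses the wire twice
    return network_bytes / link_bytes_per_s / 3600


# Hypothetical figures: 40 TB of data, link sustaining ~110 MB/s.
hours = nfs_pass_hours(40e12, 110e6)
print(f"{hours:.0f} hours (~{hours / 24:.1f} days)")
```

Under these assumptions a single pass takes on the order of 200 hours whether it copies or only compares.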
The advantage of rsync comes when you have disks attached directly to
the machines running it and the network link is slow: the checksums can
be computed locally and only the differences transferred.
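A minimal sketch of that idea, assuming both copies fit in memory and using fixed-offset MD5 block digests. This is a simplification, not rsync's actual algorithm: rsync pairs a rolling weak checksum with a strong hash so it can match blocks at any offset, whereas here a single inserted byte would make every later block look changed. The function names and the tiny 4-byte block size are hypothetical, chosen for illustration.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list[bytes]:
    """Digest of each fixed-size block, computed where the data lives."""
    return [
        hashlib.md5(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(src: bytes, dest: bytes, block_size: int = 4) -> list[int]:
    """Indices of source blocks whose digest differs from dest's block.

    Only dest's short digests would need to cross the network; the source
    side then sends just the blocks listed here.
    """
    dest_digests = block_digests(dest, block_size)
    src_digests = block_digests(src, block_size)
    return [
        i for i, d in enumerate(src_digests)
        if i >= len(dest_digests) or d != dest_digests[i]
    ]


old = b"aaaabbbbccccdddd"
new = b"aaaaXbbbccccdddd"
print(changed_blocks(new, old))
```

Here only the digests of `old` would be sent to the source side, which would then transmit just block 1.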
You would have to run rsync on the two NFS servers for it to help, and
it only helps if the disk speed (and computation speed) is
substantially faster than the network transfer speed.
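To make the "substantially faster" condition concrete, here is a toy cost model under stated assumptions: a plain copy runs at the slower of the local disk and the network; rsync runs on both servers, so each side reads and checksums its own copy in parallel at `local_bps`, after which only changed data is sent; checksum traffic and per-file overhead are ignored. The rates (500 MB/s or 110 MB/s local, 110 MB/s network) and the 1% change rate are hypothetical.

```python
def full_copy_hours(data_bytes: float, local_bps: float, net_bps: float) -> float:
    """Plain copy: every byte is read locally and sent over the network."""
    return data_bytes / min(local_bps, net_bps) / 3600


def rsync_hours(data_bytes: float, changed_bytes: float,
                local_bps: float, net_bps: float) -> float:
    """rsync on both servers: each side reads and checksums its own copy
    (in parallel, so counted once), then only changed data is sent.
    Checksum traffic and per-file overhead are ignored."""
    return (data_bytes / local_bps + changed_bytes / net_bps) / 3600


data, changed = 40e12, 0.01 * 40e12  # hypothetical: 1% of 40 TB changed

for local in (500e6, 110e6):  # local read+checksum rate; link is ~110 MB/s
    print(f"local {local / 1e6:.0f} MB/s: "
          f"copy {full_copy_hours(data, local, 110e6):.0f} h, "
          f"rsync {rsync_hours(data, changed, local, 110e6):.0f} h")
```

With local disks much faster than the link, rsync comes out at roughly 23 h against 101 h for a copy; with disks only as fast as the link, it is slightly slower than simply copying.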