Rsync & parallelizing it for files in same directory
hvjunk at gmail.com
Fri Nov 28 11:40:26 GMT 2008
I haven't yet delved into the rsync source code, and thus would need
to ask a few questions first before heading in the wrong direction.
We have not one, but several cases of doing Disaster Recovery type
backup/synchronization with typically 1 to 2 million files.
Now yes, I know I can go and write a perl/python/<flavour_of_the_year>
script to recursively do the synchronizations in parallel, but it
still leaves issues with directories holding lots of medium-sized
files, with no easy way to split them up, unless....
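For reference, a rough sketch of what the script approach could look like for one big directory, assuming rsync's --files-from option is used to hand each rsync process its own list of names. The helper names (split_by_size, build_commands), the chunk file naming and the greedy size balancing are all my own assumptions for illustration, not anything rsync provides, and file names containing newlines would break the list files:

```python
import heapq
import os


def split_by_size(src_dir, workers):
    """Greedy split: largest file first, always into the lightest chunk."""
    files = [(e.stat().st_size, e.name)
             for e in os.scandir(src_dir) if e.is_file()]
    heap = [(0, i) for i in range(workers)]  # (bytes assigned, chunk index)
    chunks = [[] for _ in range(workers)]
    for size, name in sorted(files, reverse=True):
        total, idx = heapq.heappop(heap)
        chunks[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return [c for c in chunks if c]  # drop chunks that stayed empty


def build_commands(src_dir, dest, workers, list_dir):
    """Write one name list per chunk and return one rsync command per list."""
    cmds = []
    for i, chunk in enumerate(split_by_size(src_dir, workers)):
        list_path = os.path.join(list_dir, f"chunk{i}.txt")
        with open(list_path, "w") as fh:
            fh.write("\n".join(chunk) + "\n")
        # Names in the list are taken relative to src_dir.
        cmds.append(["rsync", "-a", f"--files-from={list_path}", src_dir, dest])
    return cmds


# To run them side by side (hypothetical paths):
#   import subprocess
#   cmds = build_commands("/data/big", "drhost:/data/big", 8, "/tmp")
#   procs = [subprocess.Popen(c) for c in cmds]
#   for p in procs:
#       p.wait()
```

This only covers the files directly inside one directory, and every rsync process still builds its own file list and makes its own connection.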
Now the question I'm wondering about:
- Ease of doing a threading type setup inside rsync.
- Use a threading architecture to generate the checksums in parallel
(we have >24 cores per system around here....)
- rsync then opens multiple connections to the remote site (bandwidth
optimization, given the latencies involved)
- rsync then typically sends the files in a round-robin fashion over
those connections
- or rsync does it directory per link for small directories.
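
To make the idea above a bit more concrete, here is a minimal sketch of the two pieces in Python rather than in rsync's C: checksums generated by a pool of threads, and files dealt out round-robin to a number of links. This is not how rsync works internally. A whole-file SHA-256 stands in for rsync's own per-block checksums, and the function names and the default of 24 threads are assumptions of mine. Threads are enough here because hashlib releases the GIL while hashing larger buffers:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def file_digest(path, block=1 << 20):
    """Hash one file in 1 MiB blocks; return (path, hex digest)."""
    h = hashlib.sha256()  # stand-in, not rsync's checksum scheme
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(block), b""):
            h.update(chunk)
    return path, h.hexdigest()


def checksum_all(paths, threads=24):
    """Compute the digests of many files concurrently."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(file_digest, paths))


def round_robin(paths, links):
    """Deal files out to `links` queues, one queue per connection."""
    queues = [[] for _ in range(links)]
    for i, path in enumerate(paths):
        queues[i % links].append(path)
    return queues
```

Each queue would then be fed to its own connection. The directory-per-link variant would group paths by their parent directory instead of dealing them out one at a time.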
We do have multicore systems coming out now as standard on laptops, so
a threading architecture is something to consider.
Note: please don't ask about the design/architecture of these
systems; I know, but it doesn't help preaching to rocks.