Downloading a large number of files from different rsync servers for good load balancing and high efficiency.

Hongyi Zhao hongyi.zhao at gmail.com
Sat Apr 4 01:21:21 MDT 2015


Hi all,

I'm using Debian and want to build a local repository so that I can
install packages more conveniently.

Since rsync is the tool Debian officially recommends for syncing files
between its mirror sites, I use the rsync client to download the deb
packages from rsync servers distributed around the world, for good load
balancing and high efficiency.

The steps are as follows:

1- Build the list of packages to download from the Packages.gz files for
the chosen distribution and architecture.  For example, for testing
(code name jessie) on the amd64 architecture, the package list can be
extracted from the following files:

https://mirrors.ustc.edu.cn/debian/dists/jessie/main/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/main/binary-all/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/contrib/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/contrib/binary-all/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/non-free/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/non-free/binary-all/Packages.gz
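
The six index files listed above could be fetched with a small loop
rather than by hand.  This is only a sketch: packages_urls and
fetch_indexes are names I made up, and storing the files under an
indexes/ directory is just one possible layout, chosen so that the six
files named Packages.gz do not overwrite each other.

```shell
#!/bin/bash
# packages_urls BASE SUITE: print the six Packages.gz URLs for one suite.
# (Hypothetical helper; components and architectures as listed above.)
packages_urls() {
    local base=$1 suite=$2 comp arch
    for comp in main contrib non-free; do
        for arch in amd64 all; do
            printf '%s/dists/%s/%s/binary-%s/Packages.gz\n' \
                "$base" "$suite" "$comp" "$arch"
        done
    done
}

# fetch_indexes BASE SUITE: download each index below ./indexes,
# keeping the suite/component/architecture part of the path.
fetch_indexes() {
    local url out
    packages_urls "$@" | while IFS= read -r url; do
        out=indexes/${url#*/dists/}
        mkdir -p "${out%/*}"
        curl -fsS -o "$out" "$url"
    done
}
```

It would be called as
"fetch_indexes https://mirrors.ustc.edu.cn/debian jessie", after which
the indexes directory can be given to find as the starting path.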

After downloading all of the above files, I use the following command to
extract the list of deb package filenames:

find /path/to/Packages.gz -type f -name Packages.gz -exec zcat \{\} + |
awk '/^Filename:/{ print $2  } ' > deb-file.list

At this point, the deb-file.list will contain a great number of lines
like the following:

----------
[snipped]
pool/main/m/mockobjects/libmockobjects-java-doc_0.09-5_all.deb
pool/main/s/subtitleeditor/subtitleeditor_0.33.0-3_amd64.deb
pool/main/h/haskell-hgl/libghc-hgl-prof_3.2.0.5-1_amd64.deb
pool/main/l/lsh-utils/lsh-doc_2.1-5_all.deb
pool/main/liba/libav/libswscale3_11.3-1_i386.deb
pool/main/s/smokeqt/libsmokeqtuitools4-3_4.12.2-2_amd64.deb
pool/main/libo/libotf/libotf0-dbg_0.9.13-2_amd64.deb
[snipped]
----------
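
If the same pool/ path turns up in more than one Packages.gz (I would
expect this for the _all.deb packages, but I have not checked), the list
could be de-duplicated before downloading, so that no file is requested
twice.  A minimal sketch, where dedupe_list is a name I made up:

```shell
#!/bin/bash
# dedupe_list FILE: sort FILE and drop repeated lines, rewriting it in place.
# (Hypothetical helper; sort reads all of its input before writing to -o.)
dedupe_list() {
    sort -u -o "$1" "$1"
}
```

It would be called as "dedupe_list deb-file.list".  Note that the list
ends up sorted, so the original order of the lines is lost.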

2- Next, I obtain the list of all available rsync servers provided by
Debian and other open-source sites from here:

https://www.debian.org/CD/mirroring/rsync-mirrors

Note that although the above page says these rsync mirrors are for Debian
CD images, most of them also carry the non-CD sections of the Debian
repository, so I can use them for my purpose.

At this stage, I build the mirror list for my purpose as follows:

curl https://www.debian.org/CD/mirroring/rsync-mirrors 2>/dev/null |
awk '/::debian-cd\//{ gsub(/debian-cd/,"debian",$NF); split($NF,a,"<");
print a[1] }' > mirrors.list

The content of the mirrors.list looks like the following:

----------------
[snipped]
debian.mirror.digitalpacific.com.au::debian/
mirror.as24220.net::debian/
mirror.intrapower.net.au::debian/
mirror.rackcentral.com.au::debian/
debian.anexia.at::debian/
debian.sil.at::debian/
[snipped]
----------------

Currently, this method gives me 94 available rsync servers, which make up
the content of mirrors.list.
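
Since only most of these mirrors, not all of them, carry the non-CD
sections, the list could be filtered by probing each server first.  The
following is only a sketch: probe and filter_mirrors are names I made
up, and using dists/jessie/Release as the file to look for is my own
assumption about what a usable mirror should have.

```shell
#!/bin/bash
# probe MIRROR: succeed if the mirror can list one well-known file.
# With a single remote source and no destination, rsync only lists it.
probe() {
    rsync --timeout=10 --contimeout=5 "${1}dists/jessie/Release" \
        >/dev/null 2>&1
}

# filter_mirrors: read mirrors on stdin, print those that pass probe.
filter_mirrors() {
    local m
    while IFS= read -r m; do
        # </dev/null keeps the probe from touching the list on stdin.
        probe "$m" </dev/null && printf '%s\n' "$m"
    done
    return 0
}
```

It would be called as "filter_mirrors < mirrors.list > mirrors.ok".  The
probes run one after another, so with timeouts this may take a while.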

3- Finally, I use rsync to download all of the deb files listed in
deb-file.list, using all of the rsync servers stored in mirrors.list.
Since most of these servers' administrators impose bandwidth and
maximum-connection limits, I want to download only one deb file from each
rsync server at a time.  Once the download from a given server has
finished, that server should be given the next deb file from
deb-file.list, and so on, until all of the deb files have been downloaded
successfully using all of these rsync servers in parallel.

For this purpose I need a script.  I've tried the following one, which I
struggled with for some time, but it can't meet all of the above
requirements.  In fact, it is a long way from achieving the requirements
I described in step 3 above:

-------------------
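 mirror=1

 # NOTE: every transfer is started in the background at once, and the
 # mirror index is never reset, so it runs past the end of mirrors.list.
 while read -r -a line
 do
   # Pick the mirror on line number $mirror of mirrors.list.
   mirror_used=$(awk -v n="$mirror" 'NR == n' mirrors.list)
   # The source is the mirror prefix plus the pool path, as one argument.
   rsync -amH --progress --append-verify --timeout=10 --contimeout=5 \
     "${mirror_used}${line[0]}" debs/ &
   mirror=$((mirror + 1))
 done < deb-file.list

 # Wait for all background transfers to finish.
 wait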
-------------------
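
One direction I have been considering for step 3 is to treat the mirrors
as a pool of tokens: a mirror is taken out of the pool when a download
starts on it and put back when that download ends.  The following is
only a sketch under several assumptions: fetch_one and run_all are names
I made up, the pool is a FIFO opened read-write (which I believe works on
Linux), and a failed download is merely logged rather than retried.

```shell
#!/bin/bash
# fetch_one MIRROR PATH DEST: copy one file from one mirror into DEST.
# (Hypothetical helper; same rsync flags as my script, minus --progress.)
fetch_one() {
    rsync -aH --append-verify --timeout=10 --contimeout=5 "$1$2" "$3/"
}

# run_all MIRRORS_FILE LIST_FILE DEST
# Runs at most one transfer per mirror at any moment.
run_all() {
    local mirrors_file=$1 list_file=$2 dest=$3 dir mirror path n=0
    mkdir -p "$dest" || return 1

    # A FIFO opened read-write on fd 3 holds the currently idle mirrors.
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/pool" || return 1
    exec 3<>"$dir/pool"
    rm -rf "$dir"

    # Every mirror starts out idle: put one token per mirror in the pool.
    while IFS= read -r mirror; do
        [ -n "$mirror" ] || continue
        printf '%s\n' "$mirror" >&3
        n=$((n + 1))
    done < "$mirrors_file"
    [ "$n" -gt 0 ] || { exec 3>&-; return 1; }

    while IFS= read -r path; do
        # Blocks here until some mirror has been handed back to the pool.
        IFS= read -r mirror <&3
        (
            fetch_one "$mirror" "$path" "$dest" ||
                printf '%s\n' "$path" >> "$dest/failed.list"
            printf '%s\n' "$mirror" >&3    # this mirror is idle again
        ) </dev/null &
    done < "$list_file"

    wait          # let the last transfers finish
    exec 3>&-
}
```

It would be called as "run_all mirrors.list deb-file.list debs".  A dead
mirror is handed back to the pool like any other, so it would keep
failing files quickly; some retry or blacklist logic would still be
needed on top of this.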

Any hints for this issue?

Regards
-- 
Hongyi Zhao <hongyi.zhao at gmail.com>
Xinjiang Technical Institute of Physics and Chemistry
Chinese Academy of Sciences
GnuPG DSA: 0xD108493