filelist caching optimization proposal
edwin at datux.nl
Mon May 23 13:24:07 GMT 2005
As a Gentoo user I frequently run the emerge sync command, which in turn does
an rsync with the main server. The 'problem' is that the portage directory tree
contains about 19,000 directories and 96,000 files, so building the filelist
takes a long time because of the many disk accesses that are
necessary. On the server side the disk I/O problem is probably less severe,
since after the first run the whole tree sits in the OS disk cache
(though I think all those syscalls still cost a lot of CPU).
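To make the cost concrete, here is a minimal Python sketch of the work a fresh
scan has to do: one lstat per entry, plus a directory read for every directory.
It is not rsync's actual code (rsync is written in C and its scan does more than
this); build_filelist is a hypothetical name used only for illustration.

```python
import os


def build_filelist(root):
    """Walk `root` and lstat every entry, roughly what a fresh scan must do.

    Returns (entries, lstat_calls), where entries is a sorted list of
    (relative_path, mode, size, mtime_ns) tuples.
    """
    entries = []
    lstat_calls = 0
    # os.walk reads each directory once (it does not follow symlinks by default).
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # one metadata syscall per entry
            lstat_calls += 1
            rel = os.path.relpath(full, root)
            entries.append((rel, st.st_mode, st.st_size, st.st_mtime_ns))
    entries.sort()
    return entries, lstat_calls
```

For the portage tree that is about 19,000 + 96,000 = 115,000 lstat calls and
around 19,000 directory reads on every sync, before a single byte is compared.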
My idea is to write a patch adding something like a --cache option that uses
a cached copy of the filelist. Instead of building the filelist every time
(hundreds of thousands of system calls and disk accesses), rsync could then
load it in one go. This would be even more useful for rsync servers, which
are usually read-only (like the Gentoo mirrors, or kernel.org, which always
seems to have a load of 100+ ;).
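A minimal sketch of what such a cache could look like, assuming the filelist
is a list of (path, mode, size, mtime) tuples serialized as JSON into a hidden
file at the top of the tree. The file name, save_cache and load_cache are all
hypothetical; rsync has no such option or file today. The cache is written to a
temporary file and renamed into place so a reader never sees a half-written cache.

```python
import json
import os
import tempfile

# Hypothetical cache file name; nothing like this exists in rsync.
CACHE_NAME = ".filelist-cache.json"


def save_cache(root, entries):
    """Write the filelist to a hidden cache file in `root`, atomically."""
    path = os.path.join(root, CACHE_NAME)
    # Temporary file in the same directory, so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=root, prefix=CACHE_NAME + ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(entries, f)
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise
    return path


def load_cache(root):
    """Return the cached filelist with a single open/read, or None if absent."""
    path = os.path.join(root, CACHE_NAME)
    try:
        with open(path) as f:
            # JSON stores tuples as lists; convert back.
            return [tuple(e) for e in json.load(f)]
    except FileNotFoundError:
        return None
```

If the cache lives inside the tree being synced, the scanner would also have to
skip CACHE_NAME (and its temporary files); otherwise the cache would show up in
its own filelist and be sent to every client.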
I see the following problem with this:
the cache goes 'out of sync' if something changes the local files by hand.
So the cache option wouldn't be recommended for users who don't know what's
going on, but it could be enabled manually under the right circumstances.
Maybe it's even possible to add some extra checks, such as comparing directory
ctimes in the main directory, or other checks.
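One way to do the directory-timestamp check, sketched under the assumption that
the cache also stores (mtime, ctime) for every directory (dir_stamps and
cache_is_fresh are hypothetical names). Validating then costs one lstat per
directory (about 19,000 for portage) instead of one per entry.

```python
import os


def dir_stamps(root):
    """Record (mtime_ns, ctime_ns) for every directory, including root itself."""
    stamps = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        st = os.lstat(dirpath)
        stamps[os.path.relpath(dirpath, root)] = (st.st_mtime_ns, st.st_ctime_ns)
    return stamps


def cache_is_fresh(root, stamps):
    """Cheap check: lstat only the recorded directories, not every file."""
    for rel, recorded in stamps.items():
        try:
            st = os.lstat(os.path.join(root, rel))
        except FileNotFoundError:
            return False  # a directory disappeared
        if (st.st_mtime_ns, st.st_ctime_ns) != recorded:
            return False  # an entry was added, removed or renamed here
    return True
```

The catch is that a directory's timestamps change only when entries are added,
removed or renamed in it. Rewriting an existing file in place (or chmod-ing it)
leaves the parent directory untouched, so this check cannot see that kind of
manual change; coarse timestamp granularity on some filesystems can also hide
changes made within the same tick.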
-What do other people on this list think?
-Would it be easy to implement, or would it cause too much trouble?
-What are the most likely problems I would run into when implementing it?
-Any ideas on WHERE to store such a cache? (A magic hidden file in the
directory whose filelist is being built, perhaps?)
//||\\ Edwin Eefting
|| || || DatuX, Linux solutions and innovations
Nieuw Amsterdamsestraat 40
7814 VA Emmen