[distcc] Contributing to distCC
lavie at runbox.com
Mon Dec 13 20:53:27 GMT 2004
(firstly, thanks for taking the time to help out)
About the duration of our workshop: I was talking about a school-year (times 3), and a portion of the time is allotted for research/documentation/etc. So the lower bound for the length of our commitment can be significantly smaller than 3 man-years - IOW, we're under no obligation to "fill" a year of work; however, we are obliged to make a significant contribution, however long it takes.
You mentioned Java. It sounds like a cool idea. Do you mean that we adapt the existing project to work with Java instead of GCC by, I guess, providing a gcc-like proxy for the Java compiler? Separating Java preprocessing (or the equivalent) from compilation?
Generalizing distCC from "just" C compilation to general distributed batch processing appears to me, prima facie, not to conform with our requirements, since there are already numerous (open-source) grid computing projects out there. We'd like to focus specifically on compilation.
About improving the scheduling - we have at our disposal a lab with more than a few computers. Did you have any specific ideas in mind for improving scheduling that might give us a first step in this direction?
We had an idea, albeit somewhat 'academic' in nature, to further dissect the compilation process and see if that can improve the build speed: dive into GCC, separate the different parts of the compilation process itself, and perform some of them in parallel. It's just a rough idea at the moment, to be honest, but if you see any merit in it we'd be very glad to hear your opinion.
Again, thanks for taking the time to reply and for sharing your ideas with us.
We would like very much to reach a decision about how to contribute in a week or two and get to work as soon as possible.
From: Martin Pool [mailto:mbp at sourcefrog.net]
Sent: Monday, December 13, 2004 21:37
To: lavie at runbox.com
Cc: distcc at lists.samba.org
Subject: Re: [distcc] Contributing to distCC
Assaf Lavie wrote:
> Hello All.
> I'm a part of a group of 3 C.S. students (all experienced programmers)
> who are participating in an open-source grid-computing workshop. We
> are all interested in distributed compilation, and would like to
> contribute to distCC as part of our workshop.
One interesting year-long project might be to write a distributed compiler for Java or C#.
There are small features to do in distcc but I don't know if anything adds up to three (or even one) man-year of work.
One larger feature might be to generalize it from just C compilation into generalized remote batch processing; this requires both some technical changes and also finding and characterizing some tasks amenable to this kind of distribution.
You could look at automatically transporting the compiler to the remote machine but that may not be very academically satisfying.
> One thing that struck us is that distCC is a very mature and stable
> project, and the truth is we're having trouble deciding how exactly we
> can make a significant contribution to it.
> Therefore I'd like to address the developer community of distCC and ask
> for suggestions on how to contribute to distCC. Our main goal with
> this year-long workshop is to improve on an existing algorithm by way
> of distributizing (is that a word?) it. Are there aspects of distCC
> that could be further separated and distributed among machines?
You could improve the scheduling; this would depend on having good access to a large number of machines for testing. (I don't, at the moment, which holds me back from doing much here.)
> We gather that preprocessing is done on a local machine, and then
> compilation of translation units takes place on peer machines. Would
> it make any sense, would it be an improvement, to distribute the
> preprocessing stage itself? Could the compilation process itself be
> split up into smaller processes that could run in parallel on
> different machines? These are the sort of enhancements that we are
> required to implement.
Why don't you do some preliminary investigation into how those might be done and post your ideas?
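For readers following the thread, the split described in the quoted text (preprocess locally, compile the resulting translation unit on a peer) can be sketched roughly as below. This is only an illustration, not distcc's actual argument handling: `split_compile` is a hypothetical helper, it only recognizes `.c` sources and the attached forms of `-D`, `-I` and `-U`, and it merely builds the two command lines without running them. It relies on GCC's `-E` option (stop after preprocessing) and on GCC treating `.i` files as already-preprocessed C.

```python
from pathlib import PurePosixPath


def split_compile(argv):
    """Split a 'cc -c file.c -o file.o' command into two stages.

    Returns (preprocess_cmd, compile_cmd). Hypothetical helper for
    illustration only; real flag handling is considerably more involved.
    """
    compiler = argv[0]
    source = None
    output = None
    cpp_flags = []    # flags only the preprocessor needs (-D, -I, -U)
    other_flags = []  # everything else goes to the compile stage

    args = iter(argv[1:])
    for arg in args:
        if arg == "-c":
            continue                      # re-added to the compile stage below
        if arg == "-o":
            output = next(args)           # the following token is the output path
        elif arg.startswith(("-D", "-I", "-U")):
            cpp_flags.append(arg)
        elif arg.endswith(".c"):
            source = arg
        else:
            other_flags.append(arg)

    if source is None:
        raise ValueError("no .c source file found in command line")

    src = PurePosixPath(source)
    preprocessed = f"{src.with_suffix('')}.i"
    if output is None:
        output = f"{src.stem}.o"          # same default name gcc would pick

    # Stage 1 runs locally: expand headers and macros into a single file.
    preprocess_cmd = [compiler, "-E", *cpp_flags, source, "-o", preprocessed]
    # Stage 2 could run on a peer: it needs only the self-contained .i file.
    compile_cmd = [compiler, "-c", *other_flags, preprocessed, "-o", output]
    return preprocess_cmd, compile_cmd


if __name__ == "__main__":
    pre, comp = split_compile(
        ["gcc", "-c", "-O2", "-DNDEBUG", "-Iinclude", "src/foo.c", "-o", "foo.o"]
    )
    print(" ".join(pre))
    print(" ".join(comp))
```

The second command no longer mentions any include paths or macro definitions, which is why the remote machine does not need the project's headers. Distributing the preprocessing stage itself, as the question asks, would remove exactly that property.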