[distcc] Loadbalanced distcc

Sean D'Epagnier geckosenator at gmail.com
Wed Jan 4 21:31:44 GMT 2006


>
> Again, my DMUCS solution provides load balancing.  DMUCS is basically a
> wrapper around distcc.  You run a "host-server" and you run very simple
> load-average-reporting tasks on each host (along with the distccd's).  Then,
> your compile contacts the host server for a host, places the result in
> DISTCC_HOSTS environment variable, and runs distcc.
>
> I am in the process of getting my code onto sourceforge in cvs.  It may
> even happen today.
>
> Please let me know if you want me to send you the code directly.
>
> Vic
>


Let me try to understand this correctly:

distcc is a wrapper around gcc, and DMUCS is a wrapper around distcc.  When
DMUCS is run, it reads data from the locally running host server, uses it to
set the DISTCC_HOSTS variable, and runs distcc.  Every build machine must run
distccd plus a small client of the host server.  These clients report the
load of each machine to the host server, so it has a good idea of which
machines are idle and can set the DISTCC_HOSTS variable correctly.
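
To make sure we are talking about the same thing, here is a rough sketch of
that flow as I understand it.  It is not DMUCS's actual code: the host names,
the load numbers, and the function names pick_host and distcc_env are all
made up for illustration.  The only real name is the DISTCC_HOSTS variable.

```python
import os


def pick_host(load_by_host):
    """Return the host with the lowest reported load average."""
    if not load_by_host:
        raise ValueError("no hosts have reported their load")
    # Sorting the names first makes ties resolve alphabetically.
    return min(sorted(load_by_host), key=lambda host: load_by_host[host])


def distcc_env(load_by_host, base_env=None):
    """Copy the environment and point DISTCC_HOSTS at the idlest host."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISTCC_HOSTS"] = pick_host(load_by_host)
    return env


# Hypothetical load averages, as the per-host clients might report them.
reported = {"alpha": 2.10, "beta": 0.35, "gamma": 0.90}
env = distcc_env(reported, base_env={})
print(env["DISTCC_HOSTS"])
```

A wrapper would then run distcc with that environment.  Note that nothing
here knows how big the file is or how fast the chosen host is.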

Did I understand that correctly?  Have you measured improved performance
over plain distcc using this setup?  Why do you need both a server and a
client?  Why can't the wrapper measure how long distcc takes on a given host
and submit that to the host server itself?  I think there may be a few
problems, but I'm not sure.  I don't see how this would speed up a compile.
distcc already sends each file to the host that currently has the fewest
jobs.  Because a multitasking OS can compile two files on a host in parallel
at nearly the same speed as compiling them serially, all of the CPUs can be
utilized well as long as the make job count is set high enough (for example
"-j8").  The problem comes when there are fewer files left to compile than
CPUs, because then some CPUs become idle.

distcc slows down when a big file is given to a slow computer, a small file
is given to a fast computer, and no more files can be built until the big
file completes.  I don't see how you can solve this problem if your program
is just a wrapper.  distccd needs to measure the compile time along with
something to scale it by (I suggest file size) so that the average speed of
the server can be calculated.
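
As a sketch of what I mean by average speed, assuming source file size in
bytes is a usable stand-in for compile effort (which is only a guess), the
bookkeeping per host could be as small as this.  The class and method names
are invented for the example and are not part of distcc.

```python
class HostSpeed:
    """Running average of compile throughput for one host, in bytes/second."""

    def __init__(self):
        self.total_bytes = 0
        self.total_seconds = 0.0

    def record(self, file_size_bytes, compile_seconds):
        """Add one finished compile to the totals."""
        if compile_seconds <= 0:
            raise ValueError("compile_seconds must be positive")
        self.total_bytes += file_size_bytes
        self.total_seconds += compile_seconds

    def bytes_per_second(self):
        """Average throughput so far, or None if nothing was recorded."""
        if self.total_seconds == 0:
            return None
        return self.total_bytes / self.total_seconds

    def estimate_seconds(self, file_size_bytes):
        """Predicted compile time for a file of the given size."""
        rate = self.bytes_per_second()
        return None if rate is None else file_size_bytes / rate


speed = HostSpeed()
speed.record(1000, 2.0)   # hypothetical: 1000 bytes compiled in 2 s
speed.record(3000, 2.0)   # hypothetical: 3000 bytes compiled in 2 s
print(speed.bytes_per_second(), speed.estimate_seconds(500))
```

Dividing total bytes by total seconds weights long compiles more heavily
than averaging the per-file rates would, which seems like the right bias for
predicting how long a big file will take.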

The distcc client would look at all the files that can potentially be built
in parallel and decide where to send each one.  There are a lot of ways to
do this.  The easiest is to send the files in order from largest to
smallest.  This is not perfect, but this way the longest you have to wait at
the end is the time it takes the slowest computer to compile the smallest
file.  Another way would be to fit the files to the computers so that they
all finish as close to the same time as possible.  If it's faster to send
two files to a fast computer and leave a slow one idle, that should be done.
Finally, if the compile has been done before, the distcc client can cache
how long each file took to compile.  Using those times instead of file size
to decide where to send the files would further increase efficiency.
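
Here is a minimal sketch of the "largest first, whoever finishes it soonest"
idea.  It is a simple greedy heuristic, not an optimal fit, and it assumes
each host's speed in bytes per second is already known.  The file names,
sizes, speeds, and the function name schedule are all hypothetical.

```python
def schedule(file_sizes, host_speeds):
    """Greedy plan: take files largest first and give each one to the host
    that would finish it earliest, counting work already queued there.

    file_sizes maps file name -> size in bytes.
    host_speeds maps host name -> speed in bytes per second.
    Returns (plan, makespan): plan maps host -> list of files in order.
    """
    if not host_speeds:
        raise ValueError("need at least one host")
    finish = {host: 0.0 for host in host_speeds}
    plan = {host: [] for host in host_speeds}
    largest_first = sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in largest_first:
        best = min(sorted(host_speeds),
                   key=lambda host: finish[host] + size / host_speeds[host])
        finish[best] += size / host_speeds[best]
        plan[best].append(name)
    return plan, max(finish.values())


files = {"big.c": 800, "mid.c": 400, "small.c": 100}
speeds = {"fast": 400.0, "slow": 100.0}
plan, makespan = schedule(files, speeds)
print(plan, makespan)
```

With these numbers the fast host gets big.c and mid.c, and the slow host
only gets small.c.  If the slow host were slow enough it would get nothing
at all, which is the "leave a slow one idle" case.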

Another thing to mention: files that open up other files to compile (that
is, files that other build steps are waiting on) should have a higher
priority and be compiled first.
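
One way to turn that into a number, sketched under the assumption that the
dependency graph is available (which distcc does not have today): count how
many targets are directly or indirectly waiting on each file, and build the
ones with the highest count first.  The target names and function names
below are invented for the example.

```python
def dependents_count(deps):
    """deps maps each target to the list of targets it needs first.
    Returns, for every target, how many others are blocked behind it."""
    # Invert the map: for each prerequisite, who directly waits on it?
    waiting = {}
    for target, prereqs in deps.items():
        waiting.setdefault(target, set())
        for prereq in prereqs:
            waiting.setdefault(prereq, set()).add(target)

    def blocked_by(node, seen):
        # Collect everything reachable through the "waits on" edges.
        for waiter in waiting[node]:
            if waiter not in seen:
                seen.add(waiter)
                blocked_by(waiter, seen)
        return seen

    return {node: len(blocked_by(node, set())) for node in waiting}


def build_order(ready, deps):
    """Sort the ready targets so the ones that unblock the most go first."""
    counts = dependents_count(deps)
    return sorted(ready, key=lambda name: (-counts.get(name, 0), name))


deps = {
    "gen_parser": [],
    "parser.o": ["gen_parser"],
    "lexer.o": ["gen_parser"],
    "util.o": [],
    "app": ["parser.o", "lexer.o", "util.o"],
}
print(build_order(["util.o", "gen_parser"], deps))
```

Here gen_parser goes ahead of util.o because three targets are stuck behind
it and only one is stuck behind util.o.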

I realize these suggestions greatly complicate distcc.  It would need to
either interpret makefiles directly, or return immediately and store the
options passed to it in a build tree.  I think it makes sense to have distcc
be a wrapper around make instead of gcc.

-sean

On 1/4/06, Victor Norman <vtnpgh at yahoo.com> wrote:
>
> Again, my DMUCS solution provides load balancing.  DMUCS is basically a
> wrapper around distcc.  You run a "host-server" and you run very simple
> load-average-reporting tasks on each host (along with the distccd's).  Then,
> your compile contacts the host server for a host, places the result in
> DISTCC_HOSTS environment variable, and runs distcc.
>
> I am in the process of getting my code onto sourceforge in cvs.  It may
> even happen today.
>
> Please let me know if you want me to send you the code directly.
>
> Vic
>
>
> *Sean D'Epagnier <geckosenator at gmail.com>* wrote:
>
> I'm also very interested in this subject.  I think there are a lot of
> advanced ways to do it, but a few simple ones would really help
> performance.  If anyone wants to contact me about working on this, or would
> like to share ideas, feel free to.
>
>
> On 1/3/06, Patrik Olesen <patrik at famolesen.com> wrote:
> >
> > Hello,
> >
> > I have seen that there was an old thread about developing some sort of
> > load balancing for the deployment of the compiler jobs. What is the
> > progress of this work? Does anybody have any news?
> >
> > Best regards,
> >   Patrik
> > __
> > distcc mailing list            http://distcc.samba.org/
> > To unsubscribe or change options:
> > https://lists.samba.org/mailman/listinfo/distcc
> >
>
>
>