[distcc] Re: newcome (and a question)

Scott Lystig Fritchie fritchie at snookles.com
Fri Dec 13 17:48:38 GMT 2002


>>>>> "be" == Ben Elliston <bje at redhat.com> writes:

be> This problem must already have been solved by now.  What you want
be> is a TCP-based load balancing proxy that accepts a connection and
be> passes you through to a machine with a suitable load.  This is an
be> identical problem to load-balancing HTTP servers.  I don't know
be> much about this area, but I'm sure it has been solved.

I had the exact (?) same situation that Cristian describes.

My first solution was to use "balance", an open source TCP load
balancer.  Source is available at http://balance.sourceforge.net/.
Most TCP load balancers I found were written with HTTP in mind.
"balance" is fairly protocol-neutral, or so it seemed.  I didn't do an
exhaustive search.  Perhaps I should've?  {shrug}

All you needed to do was configure it to use distcc's TCP port number
instead of HTTP's port 80.  The command looked something like:

	balance 4200 backend1 backend2 backend3 backend4 backend5 [...]

Dual-CPU machines can appear in the list twice.
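Assuming "balance" rotates through its host list in round-robin order, listing a host twice roughly doubles its share of jobs.  Here is a small Python sketch of that arithmetic.  It is not balance's source, and the host names are hypothetical.

```python
from collections import Counter
from itertools import cycle, islice

# Hypothetical host list, mirroring the balance command line above.
# backend2 is listed twice because it stands in for a dual-CPU machine.
HOSTS = ["backend1", "backend2", "backend2", "backend3"]

def round_robin(hosts, n_jobs):
    """Hand out n_jobs connections by cycling through hosts in order."""
    return list(islice(cycle(hosts), n_jobs))

assignments = round_robin(HOSTS, 8)
share = Counter(assignments)
# backend2 ends up with twice as many jobs as backend1 or backend3.
print(share)
```

Note that round-robin never looks at how busy or how fast a host is.  That limitation comes up again below.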

Each developer, on his/her own machine, then used:

	setenv DISTCC_HOSTS "proxy proxy proxy proxy"

... where "proxy" is the hostname of the machine that's running
balance.
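For anyone wondering what a protocol-neutral TCP proxy like "balance" actually does with those connections, here is a minimal Python asyncio sketch.  It is not balance's code.  It accepts a connection, picks the next backend in rotation, and copies bytes in both directions until each side sends EOF.  The listen address, the ports, and the upper-casing "backend" in demo() are hypothetical stand-ins for real distcc daemons.

```python
import asyncio
from itertools import cycle

async def pipe(reader, writer):
    """Copy bytes one way until EOF, then half-close the other side."""
    while data := await reader.read(65536):
        writer.write(data)
        await writer.drain()
    if writer.can_write_eof():
        writer.write_eof()  # pass the EOF along, like shutdown(SHUT_WR)

async def handle_client(client_reader, client_writer, rotation):
    host, port = next(rotation)  # round-robin: next backend in the list
    try:
        backend_reader, backend_writer = await asyncio.open_connection(host, port)
    except OSError:
        client_writer.close()  # backend unreachable; drop the client
        return
    try:
        # Relay traffic in both directions concurrently.
        await asyncio.gather(pipe(client_reader, backend_writer),
                             pipe(backend_reader, client_writer))
    finally:
        backend_writer.close()
        client_writer.close()

async def start_proxy(listen_port, backends):
    """Listen on listen_port and spread connections over (host, port) pairs."""
    rotation = cycle(backends)
    return await asyncio.start_server(
        lambda r, w: handle_client(r, w, rotation), "127.0.0.1", listen_port)

async def demo():
    # Hypothetical stand-in for a distcc daemon: upper-cases what it receives.
    async def backend(reader, writer):
        data = await reader.read()  # read until the client half-closes
        writer.write(data.upper())
        await writer.drain()
        writer.close()

    server = await asyncio.start_server(backend, "127.0.0.1", 0)
    backend_port = server.sockets[0].getsockname()[1]
    proxy = await start_proxy(0, [("127.0.0.1", backend_port)])
    proxy_port = proxy.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", proxy_port)
    writer.write(b"hello")
    writer.write_eof()
    reply = await reader.read()
    writer.close()
    proxy.close()
    server.close()
    return reply

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A real deployment would listen on all interfaces and on distcc's port rather than on 127.0.0.1 with an ephemeral port.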

This scheme works OK as long as all of the backend machines are
homogeneous: same number of same-speed CPUs.  If not, then "balance"
will give jobs to slower backend machines when a faster one may be
idle.  It also has very limited capability to handle the situation
when several developers together want N jobs executed simultaneously
but the pool contains only N-3 backend machines.

I ended up writing a smarter proxy, mostly because I wanted the proxy
to be able to choose the fastest backend machine that is currently
idle.  I did it on company time, and I did it in Erlang, but if those
things don't dissuade you(*), contact me to tell me you're interested,
and I'll ask my boss if it's OK to release it.
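My proxy is in Erlang and hasn't been released, so the following is only a hypothetical Python sketch of the selection policy described above, not my implementation.  Each backend has an assumed relative speed and a number of job slots (e.g. one per CPU).  The proxy picks the fastest backend that still has a free slot.  When every slot is taken (the "N jobs, N-3 machines" case), it returns None, which is where a real proxy would queue the connection.  The Backend and pick_backend names and the speed numbers are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    speed: float   # hypothetical relative CPU speed (higher is faster)
    slots: int     # concurrent jobs allowed, e.g. number of CPUs
    busy: int = 0  # jobs currently running on this backend

def pick_backend(backends):
    """Return the fastest backend with a free slot, or None if all are full."""
    idle = [b for b in backends if b.busy < b.slots]
    if not idle:
        return None  # caller should queue the job until a slot frees up
    return max(idle, key=lambda b: b.speed)

pool = [Backend("slow", 1.0, 1), Backend("fast", 2.0, 2)]
order = []
for _ in range(4):
    chosen = pick_backend(pool)
    if chosen is None:
        order.append(None)
    else:
        chosen.busy += 1  # a real proxy decrements this when the session ends
        order.append(chosen.name)
print(order)
```

Unlike round-robin, the dual-CPU "fast" machine is filled first and the slow one is only used when nothing faster is free.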

-Scott

(*) It also includes features such as: on-the-fly addition & removal
of backend hosts, smart(er) handling of multiple- and single-CPU
backends, detection of dead backend machines, administrative enabling
and disabling of backend machines, statistics on the number & length of
sessions to each backend, and an HTTP interface for monitoring current
backend status.
