[distcc] My big plans for distcc

Victor Norman vtnpgh at yahoo.com
Fri Sep 3 18:15:26 GMT 2004


All,

I have been experimenting with distcc and pcons (a
derivative of cons that does parallel compilations),
and have so far achieved really exciting results: my
compilation time for our code has gone from 3 hr. 57
min. to 39 min. -- about 1/6th of the time.

But I'm the only one who has been using this new
setup so far, and we have 70 software engineers who
would like to start using it.

In order for my setup to work with that many
engineers, of whom up to, say, 20 could be compiling
simultaneously, I think I need to make some extensions
to distcc.

Our setup is this: we have some fast Solaris machines,
mostly multiprocessors, that many people use for
compiling, etc.  We also have a few multiprocessor
Linux boxes, which nobody uses right now.  And we
have a lot of underused, not-so-fast Solaris boxes and
desktop Linux machines which could be used for
compiling, I think.  We are all within Marconi, and
all machines use NFS, so compilers, home directories,
etc. are all available everywhere.

What I'd like to make available to the engineers is an
easy-to-use, fast, fair, and reliable compilation
farm.

I thought I'd share my plans with you all.  I would
love to get feedback on my ideas.

o Goals for the system:
  o to pick best servers available, whether 1 engineer is compiling
    or 20 are.
  o to degrade gracefully under heavy loads.
  o fairness when heavily loaded.
  o to work with compiles dispatched from multiple machines
    simultaneously (i.e., multiple "clients").
  o to work with other programs that automatically monitor load
    average and CPU availability.
  o to work with our heterogeneous network of servers -- some very
    fast, some multiprocessor, some very slow, some conditionally
    available.

o Plan:
  o use the existing 'hosts' file, which indicates the current server
    list, how many compilations can be run on the machine, and how to
    communicate with the server.
  o a new config file will be read by distcc.  This file contains the
    list of all servers that *may* be available as compilation
    servers.  Each line in the file contains the hostname/ipaddress,
    number of processors, the machine's current load-average, and a
    "power_index", which indicates how fast the server is at compiling
    programs.  All information in this file is static, except the
    load-average information, which is updated periodically by a
    separate daemon (see below).  The compilation farm administrator
    is the only one to add servers to this file, when new servers are
    put on the network.
  o distcc reads this file to figure out how powerful a CPU is.
  o distcc groups machines into tiers, where each tier contains
    machines with similar compiling power.  The "top" tier is the
    group of fastest machines, and the "bottom" tier is the group of
    slowest machines.
  o server selection is as follows:
    def pick_server:
        for each tier, from top to bottom:
            randomize the list of servers in the tier
            for each server in the list:
                if the server is available (i.e., it is not marked
                        by distcc as blocked or on probation, etc.):
                    return the server.
  o multiple compilations on dispatchers will all share the same hosts
    file, and use the same directory for lock files, etc.
  o distcc groups servers into tiers based on the server's:
    o power_index
    o load-average
    o status (blocked, on probation, etc.)
  o I don't know the exact algorithm for this computation, but it will
    be simple and fast.
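
A minimal Python sketch of the config file, tiering, and selection steps
above. Everything here is an assumption for illustration and not part of
distcc: the whitespace-separated line format (host, processors,
load-average, power_index), the effective_power score, the tier_width
bucketing, the status strings, and returning None when nothing is
available.

```python
import random
from dataclasses import dataclass


@dataclass
class Server:
    host: str
    cpus: int
    load_avg: float
    power_index: float
    status: str = "ok"  # hypothetical values: "ok", "blocked", "probation"


def parse_farm_config(text):
    """Parse lines of the assumed form: host cpus load_avg power_index."""
    servers = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        host, cpus, load_avg, power_index = line.split()
        servers.append(Server(host, int(cpus), float(load_avg), float(power_index)))
    return servers


def effective_power(server):
    """Assumed score: power_index scaled by the fraction of idle CPUs."""
    cpus = max(server.cpus, 1)
    idle = max(cpus - server.load_avg, 0.0)
    return server.power_index * idle / cpus


def group_into_tiers(servers, tier_width=1.0):
    """Bucket servers by score; return tiers ordered top (fastest) to bottom."""
    buckets = {}
    for s in servers:
        buckets.setdefault(int(effective_power(s) // tier_width), []).append(s)
    return [buckets[k] for k in sorted(buckets, reverse=True)]


def pick_server(tiers, rng=random):
    """Walk tiers top to bottom; return a random available server, or None."""
    for tier in tiers:
        candidates = list(tier)
        rng.shuffle(candidates)  # spreads load across equally good servers
        for server in candidates:
            if server.status == "ok":
                return server
    return None
```

Shuffling within a tier is what keeps 20 simultaneous clients from all
piling onto the first fast machine in the list, while the top-to-bottom
walk still prefers faster machines when they are free.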

  o an "availability server daemon" will run on a single machine and
    will write to the hosts file, adding machines that become
    available, and removing machines that are not available anymore.
    It will also write the load-averages into the new config file.  It
    gets this information by periodically receiving messages from each
    machine that is configured to participate in the compilation
    system.
  o All machines in the compilation farm will run a small client
    daemon to communicate with this server daemon, so that the
    machine's load-average and availability status are updated
    periodically.  A machine's availability may be determined by its
    load-average, by whether its screensaver is running, etc.
  o we have a prototype for this already -- it is basic socket
    programming, mostly.
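
To make the daemon idea concrete, here is a rough Python sketch of the
bookkeeping only, with the socket transport left out. It is not the
prototype mentioned above. The JSON message format, the 30-second
staleness timeout, the max_load threshold, and all function and class
names are assumptions for illustration.

```python
import json
import os
import socket
import time


def encode_report(host, load_avg, available):
    """Serialize one report as a JSON datagram (assumed wire format)."""
    return json.dumps({"host": host, "load": load_avg, "avail": available}).encode()


def make_report(max_load=1.0):
    """Client side: report this machine's 1-minute load average (Unix only)."""
    load_avg = os.getloadavg()[0]
    return encode_report(socket.gethostname(), load_avg, load_avg < max_load)


class AvailabilityTable:
    """Server side: latest report per host; silent hosts are treated as gone."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout
        self.reports = {}  # host -> (load_avg, available, time received)

    def update(self, datagram, now=None):
        """Record one report received from a client daemon."""
        msg = json.loads(datagram.decode())
        now = time.time() if now is None else now
        self.reports[msg["host"]] = (float(msg["load"]), bool(msg["avail"]), now)

    def live_hosts(self, now=None):
        """Hosts that reported recently and declared themselves available."""
        now = time.time() if now is None else now
        return sorted(
            host
            for host, (_, avail, seen) in self.reports.items()
            if avail and now - seen <= self.timeout
        )

    def hosts_file_text(self, now=None):
        """Text for the hosts file, one host per line (assumed layout)."""
        return "".join(host + "\n" for host in self.live_hosts(now))
```

Treating a host as unavailable once its reports stop arriving means a
crashed or powered-off machine drops out of the hosts file without
anyone having to notice and remove it by hand.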

o Separate thought:
  o I think we may need a distcc config file (.distccrc) to allow the
    user to set DISTCC_DIR, host selection algorithm, default
    arguments, etc.  As more features become available, it would make
    maintaining the system easier, I think.
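
One possible shape for such a file, sketched in Python. No .distccrc
format exists in distcc as far as this proposal goes, so the "key =
value" syntax, the host_selection key, and the rule that an environment
variable overrides the file are all assumptions for illustration.

```python
def parse_distccrc(text):
    """Parse 'key = value' lines; '#' starts a comment (assumed syntax)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected 'key = value', got: " + line)
        settings[key.strip()] = value.strip()
    return settings


def effective_settings(rc_text, environ):
    """Merge file settings with the environment; the environment wins."""
    settings = parse_distccrc(rc_text)
    if "DISTCC_DIR" in environ:
        settings["DISTCC_DIR"] = environ["DISTCC_DIR"]
    return settings
```

For example, a file containing "DISTCC_DIR = /shared/distcc" and
"host_selection = tiered" would give every engineer the same defaults,
while anyone who exports DISTCC_DIR in their shell keeps their own
setting.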


I relish your feedback.

Vic Norman



		