[distcc] Suggestion about host selection

Ihar `Philips` Filipau thephilips at gmail.com
Thu Jun 19 21:35:56 GMT 2008


On Thu, Jun 19, 2008 at 9:11 PM, Don Provan <dprovan at bivio.net> wrote:
> Another possibility, perhaps in addition, would be to add
> a weighting value. The existing number indicates how many
> slots are allowed altogether, an additional number could
> indicate the ratio of slots any given host should be
> assigned relative to other hosts in the list. I think that
> does a lot of what you're driving at, but with a more
> obvious and easier to use syntax and a simpler implementation.

In my experience, the usual compile farm setup is several hosts with
abundant resources plus some developers' workstations that sit idle
part of the time (e.g. at night).

Though weights (priorities) seem to be a pretty logical solution, I'd
say that allowing the user to group several hosts together would be a
more straightforward approach.

The setups I have worked with could be described as follows (curly
brackets mark a host group):
{ ded_host1/8 ded_host2/8 ded_host3/8 }
{ ws1/2 ws2/2 ws3/2 ws4/2 ... }
{ ows1/1 ows2/1 ... }

where "ded_host.*" were dedicated hosts from the compile farm (they come
first in the list, and as long as they have free slots they should be
used before trying any host from any other group), "ws.*" were
developer workstations and "ows.*" were old developer workstations
(with fewer resources than the "ws.*" group).
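
To make the grouping idea concrete, here is a minimal Python sketch of how
such a list could be parsed and searched. The curly-bracket syntax is only
the proposal from this mail, not something distcc understands, and the names
parse_groups and pick_host are hypothetical. Within a group the sketch simply
takes the first host with a free slot; a real implementation might spread
jobs differently.

```python
import re


def parse_groups(spec):
    """Parse '{ a/8 b/8 } { c/2 }' into [[('a', 8), ('b', 8)], [('c', 2)]].

    Hypothetical syntax: every host must carry an explicit '/slots' suffix.
    """
    groups = []
    for body in re.findall(r"\{([^{}]*)\}", spec):
        group = []
        for token in body.split():
            host, slots = token.split("/")
            group.append((host, int(slots)))
        groups.append(group)
    return groups


def pick_host(groups, in_use):
    """Return the first host with a free slot, searching groups in order.

    in_use maps host name -> number of slots currently taken.
    A later group is only consulted when every earlier group is full.
    """
    for group in groups:
        for host, slots in group:
            if in_use.get(host, 0) < slots:
                return host
    return None  # every slot in every group is taken


groups = parse_groups("{ ded_host1/8 ded_host2/8 } { ws1/2 ws2/2 } { ows1/1 }")
print(pick_host(groups, {"ded_host1": 8, "ded_host2": 8}))
```

With both dedicated hosts full, the call above falls through to the first
workstation group.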

We did something similar manually by starting and stopping the distcc
daemon on the workstations using cron. During the day, distcc on the
workstations was down and only the dedicated compile farm was used. In
the off hours, nightly builds kicked in and tried to use all available
resources. (In our case, the workstations together had more than double
the total number of slots on the dedicated distcc servers.)

To conclude: priorities are nice, but the intention is generally to
prioritize some dedicated hosts and to deprioritize some other,
alternative hosts. That is solved much more simply by maintaining,
instead of a single list of hosts/slots, several such lists in the
order specified by the user. Such a DISTCC_HOSTS would IMHO be much
easier to read and maintain. The implementation should be simpler as
well.

P.S. And if one could also specify the "working hours" of a server
group, that would further reduce the DISTCC_HOSTS maintenance
overhead. Actually, looking at the idea a second time, server
grouping, where a group is something tangible in the configuration
that lets you attach extra properties to it (e.g. "working hours"),
is a really nice idea. The next thing I would try to "attach" is a list
of available compilers/versions to facilitate distributed cross
compiling ;)
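
A rough Python sketch of what attaching "working hours" to a group might
look like. Everything here is an assumption made for illustration: the
dictionary layout, the "hours" key and the function names are hypothetical,
and distcc has no such feature. A group without hours is treated as always
available, and a window may wrap past midnight.

```python
from datetime import time


def group_is_active(hours, now):
    """Return True if 'now' falls inside the (start, end) window.

    hours is None for a group that is always available. A window whose
    start is later than its end is taken to wrap past midnight.
    """
    if hours is None:
        return True
    start, end = hours
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def active_groups(groups, now):
    """Keep only the host lists of groups that may be used at 'now'."""
    return [g["hosts"] for g in groups if group_is_active(g.get("hours"), now)]


groups = [
    {"hosts": ["ded_host1/8", "ded_host2/8"]},               # always on
    {"hosts": ["ws1/2", "ws2/2"], "hours": (time(20, 0), time(6, 0))},
]
print(active_groups(groups, time(12, 0)))
print(active_groups(groups, time(23, 0)))
```

This would replace the cron-based starting and stopping of daemons described
above with a property kept next to the host list itself.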

>
>> -----Original Message-----
>> From: distcc-bounces+dprovan=bivio.net at lists.samba.org
>> [mailto:distcc-bounces+dprovan=bivio.net at lists.samba.org]On Behalf Of
>> Thomas Schürger
>> Sent: Thursday, June 19, 2008 11:57 AM
>> To: distcc at lists.samba.org
>> Subject: Re: [distcc] Suggestion about host selection
>>
>>
>>
>> > If this was the way it was done, it'll lead to poor utilization of
>> > servers in some situations: the number of concurrent jobs accepted at
>> > the servers is 2 greater than their number of CPUs. So, the client
>> > would fill the first server with more jobs than it can handle at the
>> > same time before even considering the second server. (Remember that
>> > the slot mechanism on the client does not take into account which
>> > servers other clients have reserved.)
>>
>> It would be fine with me if the current slot selection would remain
>> the default, but it should also be possible to use the other slot
>> selection if the user wants that.
>>
>> > On the other hand, the statement 'prefers hosts towards the start of
>> > the list' is very much true in the aggregate when you have multiple
>> > concurrent clients using the servers!  Then you should consider using
>> > the --randomize flag, which probably should have been the default
>> > setting anyway.
>>
>> Where is that flag? Randomized selection sounds good. What about
>> using an exponential distribution, which prefers slots towards the
>> start of the list? Would be easy to implement.
>>
>> > The major omission in the current code, in my opinion, is that
>> > randomization does not take into account the specified host slots.
>>
>> OK, that would be something to change then.
>>
>> It would be fine if one could list a host multiple times (which would
>> emulate the behavior I was looking for). This is not possible currently.
>>
>> For example, I could choose to use
>>
>> host1/1 host1/1 host1/1 host1/1 host1/1 host1/1 host2/1 host2/1
>> host3/1 host3/1 host3/1
>>
>> which would lead to what I wanted. But with the current selection
>> algorithm, each of the hosts' slots would have the same slot number
>> (all 1), so when host1/1 is locked, distcc would try to use the
>> second host1/1 entry, which of course is also locked (same lockfile
>> name). So in practice this is really the same as "host1/1 host2/1
>> host3/1".
>>
>> The easiest way for a better selection implementation would be to
>> first expand the host/slotcount list to a list of host/slotnumber
>> pairs and then select
>>
>> a) linearly from the front
>> b) with exponential random distribution
>> c) with uniform random distribution
>> d) ... what ever else may seem appropriate
>>
>>
>> Greetings,
>> Thomas.
>>
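
For reference, the expansion Thomas describes in the quoted message above
(turning the host/slotcount list into host/slotnumber pairs and then picking
one by some strategy) could look roughly like this in Python. This is an
illustrative sketch only, not distcc code: the function names, the "rate"
parameter and the clamping of the exponential draw to the end of the list
are all assumptions.

```python
import random


def expand_slots(hosts):
    """[('host1', 2), ('host2', 1)] -> [('host1', 0), ('host1', 1), ('host2', 0)]"""
    return [(host, n) for host, count in hosts for n in range(count)]


def select_slot(free, strategy="linear", rng=random, rate=0.5):
    """Pick one (host, slotnumber) pair from the list of free slots.

    linear      - always the first free slot (prefers the front of the list)
    uniform     - every free slot is equally likely
    exponential - random, but biased towards the front of the list
    """
    if not free:
        return None
    if strategy == "linear":
        return free[0]
    if strategy == "uniform":
        return rng.choice(free)
    if strategy == "exponential":
        # Draw an index from an exponential distribution; draws past the
        # end of the list are clamped to the last slot (an assumption).
        index = int(rng.expovariate(rate))
        return free[min(index, len(free) - 1)]
    raise ValueError("unknown strategy: " + strategy)


slots = expand_slots([("host1", 6), ("host2", 2), ("host3", 3)])
print(select_slot(slots, "linear"))
print(select_slot(slots, "exponential", rng=random.Random(1)))
```

Because each pair carries its own slot number, every slot can get a distinct
lock, which avoids the identical-lockfile problem with repeated "host1/1"
entries mentioned in the quote.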