[distcc] Detecting an I/O-bound server?

Martin Pool mbp at sourcefrog.net
Wed Jan 25 03:33:44 GMT 2006


On 21 Nov 2005, Dan Kegel <dank at kegel.com> wrote:
> I've been monitoring the latency of a bunch of
> distcc servers.  One in particular always had
> a four second latency to compile "hello, world".
> Turns out it was a workstation where somebody
> was running a memory-hungry application.  The
> load average was around 1.2 -- not that high --
> but the machine was quite sluggish.
> I'd like distccd to recognize conditions like that
> (as opposed to a load average of 2 caused by running
> distcc and gcc), and refuse to accept compile jobs.
> 
> But it's not immediately obvious how to recognize
> busy machines.  One rule of thumb might be
> "if there is a non-gcc app with more than 400MB of RSS,
> the machine is busy".  Another might be
> "If firefox or thunderbird are running, the machine is busy"
> (Actually, that's almost the same as the previous rule, isn't it?)
> Another might be "if there are more than 40 pages of I/O
> per second, the machine is busy."
> 
> On Linux, at least, one can measure that last with something like
> 
> #!/bin/sh
> now=0
> while true; do
>    date
>    then=$now
>    now=`awk '/page/ {print $2 + $3}' /proc/stat`
>    io=`expr $now - $then`
>    test $io -gt 200 && echo "Machine is busy ($io pages in last 5 sec)"
>    test $io -gt 200 || echo "Machine is not busy (only $io pages in last 5 sec)"
>    sleep 5
> done
> 
> That seems a little brittle (where does the magic constant of 40 pages
> per sec come from), but can anyone suggest a better way than reading
> /proc/stat periodically?

(Very old mail; I was just thinking about this again.)

The thing is that you can have machines which can't afford to spare any
memory but which aren't doing any I/O at the moment.  That is probably
typical of large resident applications.

There are fields such as 'Active' in /proc/meminfo that might be
useful, but I'm not sure they'll really tell you what this needs,
which is how much of the memory is actually *needed*.
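
(For what it's worth, pulling that number out is simple enough; here is a
minimal C sketch, assuming the usual "Active:  <n> kB" line format in
/proc/meminfo - the helper name is made up:)

    /* Read the 'Active' field from /proc/meminfo, in kB; -1 on failure.
     * Sketch only; assumes the usual "Active:   12345 kB" line format. */
    #include <stdio.h>
    #include <string.h>

    static long read_active_kb(void)
    {
        FILE *f = fopen("/proc/meminfo", "r");
        char line[128];
        long kb = -1;

        if (!f)
            return -1;
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "Active:", 7) == 0) {
                sscanf(line + 7, "%ld", &kb);   /* value is in kB */
                break;
            }
        }
        fclose(f);
        return kb;
    }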

Perhaps what you could do in distccd is:

Malloc say 50MB, and try to touch a byte in each page of it; as you do
this keep an eye on the time.  If that takes more than say 5 seconds,
give up and don't accept the job for a while - there must be too much
dirty memory.  If the probe completes quickly, just free the memory, and
it can be handed over to the compiler as free memory.

(Those numbers may need to be tuned; you want to allocate enough memory
to contain the compiler's rss, but not so much that probing it will take
very long.)
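
Roughly, in C, it might look something like this (only a sketch of the
idea above; the 50MB and 5-second numbers are the ones suggested here,
and the function name is made up):

    /* Allocate ~50MB and touch one byte per page, watching the clock.
     * If it takes too long, the machine is presumably short of memory
     * and we should refuse jobs for a while. */
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define PROBE_BYTES    (50 * 1024 * 1024)
    #define MAX_PROBE_SECS 5

    /* Returns 1 if the probe finished quickly (accept the job), 0 otherwise. */
    static int memory_probe_ok(void)
    {
        long page = sysconf(_SC_PAGESIZE);
        char *buf = malloc(PROBE_BYTES);
        time_t deadline = time(NULL) + MAX_PROBE_SECS;
        long i;
        int ok = 1;

        if (!buf)
            return 0;                   /* can't even allocate: too busy */

        for (i = 0; i < PROBE_BYTES; i += page) {
            buf[i] = 1;                 /* fault the page in */
            if (time(NULL) > deadline) {
                ok = 0;                 /* too slow: too much memory pressure */
                break;
            }
        }

        free(buf);                      /* hand the memory back for the compiler */
        return ok;
    }

distccd could call something like this just before accepting a job and
refuse the job when it returns 0.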

If the machine is unloaded, this should be really cheap: distccd gets
some memory, frees it, and then the compiler gets it, without anything
being written to disk.  If the machine is heavily loaded it will create
a bit more memory pressure, but I think not too much.  If there is a lot
of VM in RAM but not being actively used, this will gradually create some
clean space.

(I'm not sure if this will really work, but it may be worth a try.  I'm
especially not sure how well it will work on non-Linux platforms; it may
slow things down or not give a good measure.)

-- 
Martin