[clug] Playing funny with make..

Andrew Janke a.janke at gmail.com
Mon Sep 14 05:56:42 MDT 2009


> On the silly suggestions front, though, install condor[1] and use DAGman to
> schedule the job dependencies[2].  That makes it easy to run and retry jobs
> appropriately, as well as to distribute the work over multiple machines when
> you get sick of waiting on your own machine to do it all.

Well I already use Sun Grid Engine on a daily basis for all heavy
lifting. FWIW I have always failed to see why things like dependencies
are so hard in Condor. In SGE all I do is this:

   $ qsub -hold_jid <blah> ...

Where <blah> can be either a job id or a pattern that matches job
names. Still, on its own this doesn't meet my requirements.
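To make the -hold_jid form above concrete, here is a minimal sketch of a
two-stage chain. The job names (stage1_align, stage2_resample) and the
script names are made up for illustration. QSUB defaults to a dry run
that only prints the commands, so nothing is submitted unless you set
QSUB=qsub yourself.

```shell
#!/bin/sh
# Sketch only: job names and script names below are hypothetical.
# QSUB defaults to a dry run that prints each command; set QSUB=qsub to submit.
QSUB=${QSUB:-echo qsub}

submit_chain() {
    # Stage 1 gets a name (-N) so that later jobs can refer to it.
    $QSUB -N stage1_align align.sh
    # Stage 2 is held until all jobs named stage1_align have finished.
    $QSUB -N stage2_resample -hold_jid stage1_align resample.sh
}

submit_chain
```

Holding on a job name rather than a numeric id means the second
submission does not need to parse the output of the first.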

1. batch/distributed processing with dependency tracking  (where n=100+ cores)

2. re-running of failed jobs (and all the dependencies therein)

3. robust job execution (each stage has a check component that looks
at output image quality, completeness etc -- you cannot just rely upon
the exit status of the job.)

4. ability to run gracefully both with and without a batch system.
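Points 2 and 3 can be sketched in plain sh roughly as follows. This is
an illustration under assumptions, not the scripts I actually use: the
function name run_stage, the MAX_TRIES variable and the argument order
are all invented here. The idea is that a stage counts as done only when
a separate check command passes, and it is retried otherwise.

```shell
#!/bin/sh
# run_stage NAME CHECK_CMD CMD...   (hypothetical helper, for illustration)
# Runs CMD, then decides success from CHECK_CMD rather than from CMD's
# exit status. Retries up to MAX_TRIES times (hypothetical variable).
MAX_TRIES=${MAX_TRIES:-2}

run_stage() {
    name=$1; check=$2; shift 2
    try=1
    while [ "$try" -le "$MAX_TRIES" ]; do
        # The exit status of the stage itself is deliberately ignored.
        "$@" || true
        # The check command is the only judge of success.
        if sh -c "$check"; then
            echo "$name: ok (attempt $try)"
            return 0
        fi
        echo "$name: check failed (attempt $try)" >&2
        try=$((try + 1))
    done
    return 1
}

# Example call (file and script names are placeholders):
#   run_stage resample "test -s out.mnc" ./resample.sh in.mnc out.mnc
```

A real check would look at image quality and completeness, as described
in point 3; `test -s` just stands in for it here.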

So currently I use a system of simple shell scripts, with integrated
checking stages, that call more complex Perl scripts. These wrap up
chunks of functionality and in turn call low-level atomic C programs.
I then have shell scripts that can either submit an entire stage to a
batch system or run it locally.
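The submit-or-run-locally switch could look something like the sketch
below. MODE, QSUB and dispatch are names invented for this example, and
the qsub options shown (-N, -cwd, -b y) are just one plausible choice.
As a safety net QSUB defaults to a dry run that prints the command.

```shell
#!/bin/sh
# dispatch NAME CMD...   (hypothetical helper, for illustration)
# MODE=local runs CMD directly; MODE=batch hands it to qsub.
# QSUB defaults to a dry run; set QSUB=qsub to really submit.
MODE=${MODE:-local}
QSUB=${QSUB:-echo qsub}

dispatch() {
    name=$1; shift
    case $MODE in
        local)
            # No batch system: just run the command here.
            "$@"
            ;;
        batch)
            # -b y submits the command itself rather than a job script.
            $QSUB -N "$name" -cwd -b y "$@"
            ;;
        *)
            echo "unknown MODE: $MODE" >&2
            return 2
            ;;
    esac
}

dispatch stage1 echo "running stage1"
```

The stage scripts then only ever call dispatch, and the choice between
local and cluster execution is made once, outside of them.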

Given that SGE also supports qmake (distributed make), it would seem
obvious to add a great big Makefile to the whole thing as then you can
either type make for local execution or qmake for cluster execution.
There are more devious ways to approach qmake, such as adding a qsh in
front of each command in the Makefile, but this seems hackish.

Mind you, there are tons of other libraries out there that are supposed
to handle all this (Perl libraries, SCons, Ruby, etc.) but they all
have their own subtle nuances and tend not to be as cross-platform as
make and sh.


--
Andrew Janke
(a.janke at gmail.com || http://a.janke.googlepages.com/)
Canberra->Australia    +61 (402) 700 883

