[clug] Basic work queue
Alex (Maxious) Sadleir
maxious at gmail.com
Sat Jun 18 01:59:32 MDT 2011
On Sat, Jun 18, 2011 at 5:19 PM, Carlo Hamalainen
<carlo.hamalainen at gmail.com> wrote:
> I have a long computation that I would like to run on Amazon EC2. I
> want to manually start up a few high-CPU instances (say 5 or 10) and
> have each instance request a task from some master instance, do the
> task, signal completion to the master, and then get the next available
> task.
> The problem is "embarrassingly parallel":
> * The master only needs to send a 100 line shell script to an instance
> to get it running on a new task.
> * No instances need to communicate with each other.
> What is the simplest way to make such a work queue?
> I did look at Amazon SQS but it says that the queue messages can
> live for only up to 14 days, but my queue might last for a month or
> more, depending on how many worker instances I have running.
> Is something like RabbitMQ suitable?
> Would it be easier to set up PBS and use my Linode as the master?
I haven't researched this fully yet but I have a similar requirement
(need to run a large number of lengthy geo queries in the background -
currently working in PHP but could adapt to any modern language).
I'm leaning towards a piece of software called "celery".
It runs a daemon over RabbitMQ that listens for jobs and runs them
for you, reporting success/failure as it goes. That gives you more
features than writing your own job listening service from scratch.
You would have to make an EC2 image that starts this daemon and
connects to your RabbitMQ broker automatically on startup, and that of
course includes any other libraries/dependencies your script requires.
Currently tasks are written in Python, although you could probably just
make a wrapper piece of code that takes the parameters for your shell
script and runs it through the usual Python shell functions. If it's
the whole script you need to send every time, you would have to look at
how well that works - the documentation suggests it's fine to even
send binary data as a task parameter/input.
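
To make the wrapper idea concrete, here is a minimal sketch using only
the Python standard library. The function name run_shell_task and the
shape of the returned dict are my own assumptions, not anything from
celery. It takes the whole script as text plus its arguments, runs it
with /bin/sh and hands back the exit code and output:

```python
import os
import subprocess
import tempfile


def run_shell_task(script_text, args=()):
    """Run a shell script given as text; return its exit code and output.

    Hypothetical helper: the name and return format are illustrative only.
    """
    # Write the script to a temporary file so /bin/sh can execute it.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_text)
        path = f.name
    try:
        proc = subprocess.run(
            ["/bin/sh", path, *args],
            capture_output=True,  # collect stdout/stderr instead of printing
            text=True,            # decode output as str rather than bytes
        )
        return {
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        }
    finally:
        # Clean up the temporary script whether or not the run succeeded.
        os.remove(path)
```

With celery you would presumably register a function like this as a
task so the worker daemon can call it; that step is left out here so
the sketch runs without a broker.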
You also need some piece of code to put the tasks in the queue, and
probably a database to collect the results (either using the built-in
database result store or your own code to collect the results and
store them somewhere). So that means a database server and the RabbitMQ
broker on your Linode - remembering that there are bandwidth costs
associated with traffic going from the internet into EC2.
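
If you go the "own code" route for collecting results, the bookkeeping
could look roughly like the sketch below. It uses SQLite purely as a
stand-in so it runs anywhere; for workers on EC2 you would need a
database server they can reach over the network. The table layout and
the function names (open_store, enqueue, record_result,
count_by_status) are all made up for illustration:

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the results database and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " task_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " output TEXT)"
    )
    return conn


def enqueue(conn, task_id):
    """Register a task as pending; re-adding an existing task is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO results (task_id, status) VALUES (?, 'pending')",
        (task_id,),
    )
    conn.commit()


def record_result(conn, task_id, returncode, output):
    """Store a worker's outcome; exit code 0 counts as success."""
    status = "success" if returncode == 0 else "failure"
    conn.execute(
        "UPDATE results SET status = ?, output = ? WHERE task_id = ?",
        (status, output, task_id),
    )
    conn.commit()


def count_by_status(conn):
    """Return a dict mapping each status to its number of tasks."""
    rows = conn.execute("SELECT status, COUNT(*) FROM results GROUP BY status")
    return dict(rows.fetchall())
```

Keeping pending tasks in the same table makes it easy to see after a
month-long run which tasks never reported back.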