Clusters (was CLUG meeting 23 May 2002)

Thu May 23 10:36:37 EST 2002

On Wed, May 22, 2002 at 12:43:00PM +0100, Richard Cottrill wrote:
> 
> Can anyone give me a concise description of what 'grid computing' means?
> From what I can tell it's essentially a different name for a beowolf
> cluster... I think there's supposed to be something about the nodes not
> being dedicated cluster machines in there too.
> 

    "Grid" is a new term for cluster as far as I know.  A beowulf is
(usually, these terms are all pretty flexible) a uniform commodity cluster
like bunyip.  You can also cluster Suns, SGIs, Supercomputers, ..., or a mix
of any.  Some common features of clusters are that there is trust within the
cluster, a common filespace (and often node-specific spaces), they *aren't*
just big SMP machines, they have to message-pass (Mosix is blurring this
distinction), but there is common login and authentication.  There is
usually a cluster-wide scheduler, though this can sometimes be wetware (Bob,
can I have nodes 16-31 today?).  This scheduler is often what is considered
the gridware software: (DQS -> Codine -> SGE), LSF, PBS, NQS, Globus, Mosix,
...  It would be very uncommon in a production cluster to run more than one
cluster-wide scheduler (they'd either have to know about each other, or
they'd both be scheduling jobs on the same resources and be suboptimal).

   Another interesting cluster type - the type that gets a lot of press in
the Grid splashes - is a NOW.  Network of Workstations.  The term comes from
Berkeley, and originated using Sun workstations.  This predates beowulf,
late 80's if I recall correctly.  About the time powerful scientific
workstations started competing with mainframes and supercomputers.  A group
with a dozen RS/6000's could cluster them and stomp all over a university's
central compute resources, while still using them as personal workstations.
A daemon (qidle) would watch load or keyboard/mouse activity, and only
activate that node when no one appeared to be making interactive use of it.
Maybe Sun Marketing had a falling out with Berkeley, and Grid==NOW now. I
haven't seen them mention NOW, but that's what's often described. 
Clustering workstations is an _old_ idea, not a new one, dating from the
time that workstations became cheap compared to supercomputers and within an
order of magnitude in power.  They're now being eaten by headless PC
clusters (beowulfs) that are node-for-node cheaper and comparable in
CPU/memory power.
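
   As a rough sketch of the kind of check an idle-watching daemon makes
(this is not qidle's actual code; the function names and thresholds are made
up for illustration, and I've assumed it wants both low load and a quiet
keyboard/mouse before offering the node):

```python
import os

# Hypothetical thresholds, not taken from qidle or any real scheduler.
MAX_LOAD = 0.3            # 1-minute load average below which the box looks quiet
MIN_QUIET_SECONDS = 900   # how long the keyboard/mouse must have been untouched


def node_is_idle(load_1min, seconds_since_input,
                 max_load=MAX_LOAD, min_quiet=MIN_QUIET_SECONDS):
    """Return True when nobody appears to be using the workstation."""
    return load_1min < max_load and seconds_since_input >= min_quiet


def current_load():
    """1-minute load average of this machine (Unix only)."""
    return os.getloadavg()[0]
```

   How you find the time since the last keystroke depends on the platform,
so it is left as an input here.  The daemon would poll something like
node_is_idle(current_load(), seconds_since_input) and tell the scheduler to
open or close the node accordingly.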

   About the only reasons I can think of to build NOW's are:
1) You have limitless administrator time, but can't find the loose change to
   buy hardware (see #1).  The home "cluster" used to render raytraced
   images or encode video comes to mind, while still having the workstations
   available to family/housemates.
2) You have node-locked proprietary licenses for things like MSI's
   biology suites that are useful both interactively and in batch jobs.

   Once you've gone to a heterogeneous cluster (mixed architectures and
memory capacities), there's no reason you couldn't add the odd SGI Octane to
several shelves of commodity PC's.  Jobs destined for the Octane would just
be submitted with an extra resource requirement, like
	qsub -l msi
	fire up some msi job
	^D

   While jobs destined for 4 nodes on something with an i386 arch would look
more like:
	qsub -l i386,qty.eq.4 -par MPI
	mpirun my_mpi_task
	^D

   Some jobs that use some other binary installed cluster-wide on all
architectures could easily run on multiple architectures, for instance:
	qsub -l povray,qty.gt.20
	povray +N -w 3200 -h 1800 +fp16 -i scene.pov
	^D
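
   To make the resource-request idea concrete, here is a toy version of
the matching a scheduler does behind those qsub lines.  It is only a sketch:
the node names, the resource tags and select_nodes() are all hypothetical,
and real schedulers also weigh load, queues and priorities.

```python
def select_nodes(nodes, required, count=1):
    """Pick `count` nodes that each offer every resource in `required`.

    nodes: dict mapping node name -> set of resource tags.
    Returns a sorted list of node names, or None if too few nodes match
    (a real scheduler would leave the job queued instead).
    """
    wanted = set(required)
    matches = sorted(name for name, tags in nodes.items() if wanted <= tags)
    return matches[:count] if len(matches) >= count else None


# Hypothetical cluster: four commodity PCs plus one Octane holding the
# node-locked msi licence.  povray is installed everywhere.
cluster = {
    "pc01": {"i386", "povray"},
    "pc02": {"i386", "povray"},
    "pc03": {"i386", "povray"},
    "pc04": {"i386", "povray"},
    "octane": {"mips", "povray", "msi"},
}

msi_nodes = select_nodes(cluster, ["msi"])          # only the Octane qualifies
mpi_nodes = select_nodes(cluster, ["i386"], 4)      # four i386 boxes
render_nodes = select_nodes(cluster, ["povray"], 5) # any architecture will do
```

   The povray request is the interesting one: because the tag is on every
node, the job can be spread over both architectures without the submitter
caring which is which.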