[clug] aggregating disks across multiple machines
daniel at rimspace.net
Mon Oct 26 00:49:17 MDT 2009
Michael James <michael at james.st> writes:
> I've been asked to recommend a setup for a group of high end workstations.
> They each have dual 4 core processors, 32 Gig of ram and 2 x 1.5 TB disks.
> At present each machine has a separate 1.5TB /home and /data partition.
> no redundancy, data will be lost from a single disk failure.
> files will be copied around as jobs dictate => confusion and waste.
> when partitions start to fill up, files will get put where they fit
> not where they should go => even greater confusion.
Mmmm. Some of that sounds like a user training and control issue, not a disk
layout issue, to me. Assuming y'all do provide a central server:
1. Staff should keep only working copies locally, the master should be
elsewhere, ideally in a VCS style "edit, then commit" arrangement.
Failing that, provide a central repository and have them work there.
2. Staff shouldn't have (easy) access to store stuff in the "wrong" place on
   the shared storage.
> A much better solution would be aggregate disks into a single, large,
> duplicated, data warehouse and have it accessible to all machines.
...maybe. It depends a lot on how your data is used; given the size of the
workstations I am *guessing* that your workload involves a lot of data pounding
on each workstation.
Pushing that to a central machine implies that you need to deliver N * <the
size of your group> random IOPS on that machine, instead of just N random IOPS
per workstation. (Say each box needs 200 random IOPS and you have 20 of them:
that's a central box that has to sustain 4,000 random IOPS.) That probably
isn't cost-effective for you.
> In the past I'd have moved all the disks to 2 machines, raided them into 2
> disk packs, a master and backup, NFS mounted the master on all machines, and
> set up a nightly rsync to refresh the backup.
> Nowadays would it be better to use Lustre?
> Or is there an updated distributed NFS?
GlusterFS is probably the best bet, but I suspect it will not do exactly what
you want:
> One that can maintain multiple copies of an NFS data repository and cache a
> file locally when needed, reflecting changes back to the master when necessary.
> Or should I look at a global file system?
Almost certainly not, IMO. The cost, in performance and complexity, is likely
to outweigh *any* benefit you get from a shared namespace.
I strongly suspect, given the nature of the machines, that your best bet is
some sort of VCS working process:
The user grabs a local, scratch copy of the data they need to work with.
They mutate it, or whatever.
They push the results back to the central store.
Ideally, use a real VCS, but I /bet/ none of them scale to your needs. That
means a standard, relaxed, "talk to each other" locking protocol. ;)
I would probably use RAID-1 for the OS partition, from which the users are
locked out, and RAID-0 the scratch partition they work on.
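A minimal sketch of that layout with mdadm, assuming two disks each carved
into an OS partition and a scratch partition (device names and mount point
are hypothetical; adjust to the hardware):

```shell
# RAID-1 (mirror) for the OS, so one dead disk doesn't take the box down:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# RAID-0 (stripe) for the scratch area users work on: all the space and
# better throughput, but no redundancy -- anything here must be re-creatable.
mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sda2 /dev/sdb2
mkfs.ext4 /dev/md1
mount /dev/md1 /scratch
```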
Then, ensure you have a backup solution to slow, boring, cheap disk that sucks
up those scratch disks at least once a day, deduplicates them, and keeps them
accessible for when a disk fails, or when a user fails, and local data is
lost.
(Plus, obviously, good backups for your central repository ;)
✣ Daniel Pittman ✉ daniel at rimspace.net ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons
Looking for work? Love Perl? In Melbourne, Australia? We are hiring.