[clug] aggregating disks across multiple machines

Daniel Pittman daniel at rimspace.net
Wed Oct 28 03:28:55 MDT 2009


Paul Wayper <paulway at mabula.net> writes:
> On 26/10/09 17:49, Daniel Pittman wrote:
>> Michael James<michael at james.st>  writes:
>>
>>> I've been asked to recommend a setup for a group of high end workstations.
>>> They each have dual 4 core processors, 32 Gig of ram and 2 x 1.5 TB disks.
>>> Nice.
>>>
>>> At present each machine has a separate 1.5TB /home and /data partition.
>>> Bad:
>>> 	no redundancy, data will be lost from a single disk failure.
>>> 	files will be copied around as jobs dictate =>  confusion and waste.
>>> 	when partitions start to fill up, files will get put where they fit
>>> 		not where they should go =>  even greater confusion.
>>
>> Mmmm.  Some of that sounds like a user training and control issue, not a disk
>> layout issue, to me.  Assuming y'all do provide a central server:
>
> I think Michael was talking about using both machines as some kind of
> distributed storage, rather than a 'central server'.  I want to find out about
> this too.  The key problem is that a lot of the cluster storage that works
> like this assumes that each machine is accessing the same backend store.  This
> is convenient for those that have infiniband or fiberchannel cards lying
> around and SAN units sitting in their cupboards, but for those of us with just
> standard machines I haven't found any obvious candidates.
>
> Anyone seen something that makes a bunch of disks spread across multiple
> machines act like a big communal block device?

Yeah: GlusterFS.  It does exactly this, and is almost certainly what you
want.  The alternatives tend to look like Hadoop or so: a dedicated storage
solution for a data processing system, not a filesystem.

You would probably want the unify translator[1], and perhaps the BDB-backed
store[2], for this; perhaps AFR[3] as well if you really felt enthused, but
it is still not quite where I would like a replicated storage device to be.
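
To make the combination of a single namespace and mirroring a bit more
concrete, here is a toy model in Python.  It is only a sketch of the idea:
the class names (Brick, Mirror, Unify), the round-robin placement, and the
"online" flag are all made up for illustration and are not GlusterFS APIs or
its actual placement logic.

```python
# Toy model only: none of these classes are GlusterFS APIs.

class Brick:
    """Hypothetical stand-in for one machine's local storage."""

    def __init__(self, name):
        self.name = name
        self.files = {}      # path -> bytes
        self.online = True   # flip to False to simulate a dead machine

    def has(self, path):
        return self.online and path in self.files

    def write(self, path, data):
        if self.online:
            self.files[path] = data

    def read(self, path):
        if not self.has(path):
            raise FileNotFoundError(path)
        return self.files[path]


class Mirror:
    """Mirroring: every write goes to all children, reads use any live copy."""

    def __init__(self, children):
        self.children = list(children)

    def has(self, path):
        return any(c.has(path) for c in self.children)

    def write(self, path, data):
        for child in self.children:
            child.write(path, data)

    def read(self, path):
        for child in self.children:
            if child.has(path):
                return child.read(path)
        raise FileNotFoundError(path)


class Unify:
    """Single namespace: each file lives on exactly one child."""

    def __init__(self, children):
        self.children = list(children)
        self.namespace = {}  # path -> child that holds the file
        self._next = 0       # round-robin cursor (assumed placement policy)

    def write(self, path, data):
        child = self.namespace.get(path)
        if child is None:
            # New file: place it on the next child in turn.
            child = self.children[self._next % len(self.children)]
            self._next += 1
            self.namespace[path] = child
        child.write(path, data)

    def read(self, path):
        child = self.namespace.get(path)
        if child is None:
            raise FileNotFoundError(path)
        return child.read(path)

    def listdir(self):
        return sorted(self.namespace)


# Four workstations, paired into two mirrors, unified into one volume.
bricks = [Brick(name) for name in ("ws1", "ws2", "ws3", "ws4")]
volume = Unify([Mirror(bricks[:2]), Mirror(bricks[2:])])
volume.write("/home/a.dat", b"aaa")
volume.write("/home/b.dat", b"bbb")
```

Stacking it this way means every file sits on two machines, so one machine
in a pair can die without losing data, at the cost of half the raw capacity.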

Apparently, though, the latest release hides all that behind a sane
interface.  Go, GlusterFS developers.

        Daniel

Footnotes: 
[1]  Single namespace over multiple machines.

[2]  Stores small files in a BDB spool, excellent for many small files, still
     looks like a POSIX filesystem to the client.

[3]  Mirroring, basically.

-- 
✣ Daniel Pittman            ✉ daniel at rimspace.net            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons
   Looking for work?  Love Perl?  In Melbourne, Australia?  We are hiring.

