[clug] Distributed File Systems
cottrill.david at gmail.com
Tue Jan 6 02:16:44 GMT 2009
bloody gmail not replying to list...
This really has changed topic so I suppose I should change the subject too.
---------- Forwarded message ----------
From: David Cottrill <cottrill.david at gmail.com>
Date: Tue, Jan 6, 2009 at 1:11 PM
Subject: Re: [clug] Interesting article
To: Alex Satrapa <grail at goldweb.com.au>
I did some research on redundant/distributed file systems and came across an
article that described how Google and (in a related but not identical way)
Gmail hold onto their files. The short (and I'm sure in some way erroneous)
description is that, for starters, all data is held in three or more copies.
The chunks of data comprising the files/emails/information are large blocks,
which speeds up throughput and eases referencing slightly. The chunks are
spread across disks by some method more random than linear.
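A rough sketch of the chunk-and-scatter idea, in Python. Everything in it
is an assumption made for illustration (the tiny chunk size, the disk
names, and the helpers split_into_chunks and place_chunks); it is not how
Google actually does it, just the shape of "large chunks, three copies,
placed more randomly than linearly".

```python
import random

CHUNK_SIZE = 64   # bytes, kept tiny for illustration; real chunks are far larger
REPLICAS = 3      # "three or more copies"


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Cut a byte string into fixed-size chunks (the last one may be short)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(num_chunks, disks, replicas=REPLICAS, rng=None):
    """Pick `replicas` distinct disks at random for every chunk index."""
    rng = rng or random.Random()
    if len(disks) < replicas:
        raise ValueError("need at least as many disks as replicas")
    # random.sample never repeats a disk, so the copies land on distinct disks
    return {idx: rng.sample(disks, replicas) for idx in range(num_chunks)}


if __name__ == "__main__":
    disks = ["disk%d" % n for n in range(8)]   # hypothetical disk names
    chunks = split_into_chunks(b"x" * 200)
    for idx, where in place_chunks(len(chunks), disks).items():
        print(idx, len(chunks[idx]), where)
```

Random placement is the simplest thing that avoids filling disks in order;
a real system would presumably also weigh free space and load.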
The referencing is of course the trick - each hard drive has one of two
functions: storage or reference.
The reference hard drives hold pointers to two or more chunks of data for
each logical section, and these in turn are referenced by hash sum (my
memory lets me down here).
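To make the storage/reference split a bit more concrete, here is a minimal
sketch of what a reference box might keep. The class ReferenceIndex, the
node names and the choice of SHA-256 as the "hash sum" are all assumptions
for illustration; the article's actual scheme may differ.

```python
import hashlib


class ReferenceIndex:
    """Toy reference box: knows where chunks live, stores no chunk data."""

    def __init__(self):
        self.files = {}      # file name -> ordered list of chunk ids
        self.locations = {}  # chunk id -> set of storage node names

    def add_file(self, name, chunks, nodes_for_chunk):
        """Record a file as a list of chunk hashes plus where each copy is."""
        ids = []
        for chunk, nodes in zip(chunks, nodes_for_chunk):
            chunk_id = hashlib.sha256(chunk).hexdigest()  # assumed hash
            self.locations.setdefault(chunk_id, set()).update(nodes)
            ids.append(chunk_id)
        self.files[name] = ids

    def locate(self, name):
        """Return (chunk id, sorted node names) pairs in file order."""
        return [(cid, sorted(self.locations[cid])) for cid in self.files[name]]


if __name__ == "__main__":
    index = ReferenceIndex()
    index.add_file("mail.txt", [b"aaa", b"bbb"],
                   [["n1", "n2", "n3"], ["n2", "n3", "n4"]])
    for chunk_id, nodes in index.locate("mail.txt"):
        print(chunk_id[:12], nodes)
```

A client would ask the reference box where a file's chunks are and then
fetch the data from any of the listed storage nodes.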
The entire system is built not to cope with failure but to expect it at
every moment. So instead of a RAID rebuilding onto a new disk to replace an
old one, each new machine simply announces its presence and its ability to
take data, and data is supplied to it. When an old machine dies (even a
reference box), the load on the other boxes becomes fractionally greater for
the nodes that were supported by the failed box, and those nodes are added
to spare capacity. Any data chunk that does not exist in the required 3 or
more spots has priority.
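The "expect failure" behaviour could be sketched like this. It is only a
toy under stated assumptions: the function handle_node_failure, the node
names and the "fewest copies first" ordering are mine, chosen to show
under-replicated chunks getting priority when a box dies.

```python
import random

TARGET_COPIES = 3  # "the required 3 or more spots"


def handle_node_failure(locations, failed, live_nodes, rng=None):
    """Forget a dead node, then top up chunks that fell below the target.

    `locations` maps chunk id -> set of node names and is updated in place.
    Returns the chunk ids that were repaired, most urgent first.
    """
    rng = rng or random.Random()
    for holders in locations.values():
        holders.discard(failed)

    # Chunks with the fewest surviving copies are handled first.
    under = sorted(
        (cid for cid, holders in locations.items()
         if 0 < len(holders) < TARGET_COPIES),  # 0 copies: nothing to copy from
        key=lambda cid: len(locations[cid]),
    )
    for cid in under:
        holders = locations[cid]
        spare = [n for n in live_nodes if n != failed and n not in holders]
        while len(holders) < TARGET_COPIES and spare:
            holders.add(spare.pop(rng.randrange(len(spare))))
    return under


if __name__ == "__main__":
    locations = {
        "a": {"n1", "n2", "n3"},
        "b": {"n1", "n2"},
        "c": {"n4", "n5", "n6"},
    }
    order = handle_node_failure(locations, "n1", ["n2", "n3", "n4", "n5", "n6"])
    print(order, locations)
```

A newly announced machine would simply show up in the live node list, so
it starts receiving copies the next time something needs repairing.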
In this way the data network automatically tunes to high demand nodes and
doesn't lose data.
There are of course flaws in the system somewhere and there are gaping holes
in my explanation but you get the idea...
On Tue, Jan 6, 2009 at 12:03 PM, Alex Satrapa <grail at goldweb.com.au> wrote:
> On 06/01/2009, at 11:38 , Michael Still wrote:
> This is incorrect.
> Michael's the guy that was leaning out the window telling the helicopter
> pilot exactly where he was, right?