[clug] Who has better dd foo than I?

Tue Aug 7 05:41:25 MDT 2012

> Does it really matter?

Yup. Allow me to introduce you to the habits of neuroimaging
researchers... (I should point out that I am in this same bucket)

> I mean, on the surface this comes across as BOFH-style "You dirty users
> can't have all of my shiny new disk!"

In our case all disks will asymptote to 99% full within a week, or a
month if we are lucky. This is just a way of encouraging users not to
keep everything.

> I know I am naive in these matters, but is there room for discovering what
> the actual business need is for storage and just maybe meeting it?

Sadly we are in the situation where we will never have enough storage
and have to throw away old data and intermediate results. We have so
much data that we can't even (easily) open datasets! Here, for example,
is a 12GB microCT anglerfish dataset via a home-spun mapping-style
interface (well, mapping, but it's 3D).

   http://caivm1.qern.qcif.edu.au/
   (click the 22 at localhost box down the left hand side)

The first dataset there is a "small" 1GB mouse brain dataset. Where it
gets mental is that we are also acquiring histology on these. A single
restacked histology dataset at medium resolution is just over 1TB for
a single histological stain (e.g. Nissl). Now this particular study
involved 20 brains, and each brain needs 5 stains done (5TB) + micro-MRI
(5GB) + microCT (12GB) + blockface imaging (400GB). So let's be
optimistic and say that's 6TB each. That makes 120TB for a single
study and that's just the raw data, no analysis...
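
For anyone who wants to check the sums, here is a rough sketch of the
arithmetic in Python. The sizes are the ones quoted above; the decimal
units (1TB = 1000GB) and the variable names are my own assumptions, and
since each stain is really "just over" 1TB the exact figure is a lower
bound.

```python
# Per-brain raw data sizes in GB, using the figures quoted in this post.
# Assumption: decimal units, 1 TB = 1000 GB.
GB_PER_TB = 1000

per_brain_gb = {
    "histology (5 stains at ~1TB each)": 5 * 1 * GB_PER_TB,
    "micro-MRI": 5,
    "microCT": 12,
    "blockface imaging": 400,
}

brains = 20

# Exact sum of the quoted figures for one brain.
exact_per_brain_gb = sum(per_brain_gb.values())

# The post rounds each brain up to 6 TB for planning purposes.
rounded_per_brain_tb = 6

# Study totals: exact sum versus the rounded planning figure.
exact_study_tb = brains * exact_per_brain_gb / GB_PER_TB
rounded_study_tb = brains * rounded_per_brain_tb

print(f"per brain: {exact_per_brain_gb} GB (planned as {rounded_per_brain_tb} TB)")
print(f"per study: {exact_study_tb:.2f} TB (planned as {rounded_study_tb} TB)")
```

The gap between the two study totals is the headroom that rounding up
to 6TB buys, before any analysis output is written.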

We are (well, I am) currently running 20 or more projects with data
needs along the lines above. To store all of this would require a very
hefty budget and it'd all be outdated in a month.

It's for this reason that we have been looking pretty seriously at
building a MKII Backblaze pod for storing data after it's been
processed. Building one looks like good fun and will require a fair
amount of LVM from what I can see.

a