[clug] A quest for "near-line" linux file containers/archives

Andrew Janke a.janke at gmail.com
Sun Nov 15 19:59:17 MST 2009

Hi all, here's hoping someone might be able to point me in the right
direction.

The situation:

* I deal with bazillions of DICOM medical images on a day-to-day
basis. (OK, 5.5 million of them at last count.)

* These files are self contained and generally between 100k and 600k
each and compress well (60-70% reduction).

* The problem is that I have lots of them, and I convert them to a
less unwieldy format for daily use.
    (This generally means about 300 of the little (2D) suckers convert
to 1 output 3D file.)

* Once converted I rarely go back to the originals but the requirement
is still there from time to time.

* I use rsync for backup, and while it is good, even it gets bored
with 5 million+ files.

So I have a few options.

1) Bunch them up into nice chunks (this is easy, they are already
organised hierarchically), tar and gzip them, and either keep some
sort of db file alongside them so that I know what is in there or just
use 'tar tvf'. But this would require me to change a bunch of code
that expects the files to just "be there".

2) Use a real DB. <twitch, shudder>.

3) Get "clever" with something like FUSE and mount a local
filesystem/archive in a file when the directory is stat'd. Yes, there
might be a slight delay when this happens, but I can quite happily live
with that if it makes backup less unwieldy. GNOME does it for archives,
so surely this functionality exists somewhere in a command-line tool
or daemon.

4) <insert more clever idea here>
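
For reference, option 1 can be sketched in a few lines of Python. This
is only a minimal sketch, assuming one gzipped tar per directory with a
JSON index next to it; the function names (pack_directory,
read_member), the index format and the file layout are all made up for
illustration and are not an existing tool.

```python
import json
import tarfile
from pathlib import Path


def pack_directory(src_dir, archive_path, index_path):
    """Bundle every file under src_dir into one gzipped tar.

    Also writes a JSON index mapping member name -> size in bytes, so the
    contents can be listed without opening the archive.
    Assumption: archive_path and index_path live outside src_dir.
    """
    src_dir = Path(src_dir)
    index = {}
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir to keep the hierarchy.
                name = path.relative_to(src_dir).as_posix()
                tar.add(path, arcname=name)
                index[name] = path.stat().st_size
    Path(index_path).write_text(json.dumps(index, indent=2))
    return index


def read_member(archive_path, name):
    """Return the bytes of one archived file without unpacking to disk."""
    with tarfile.open(archive_path, "r:gz") as tar:
        member = tar.extractfile(name)  # raises KeyError if name is absent
        if member is None:  # not a regular file (e.g. a directory entry)
            raise FileNotFoundError(name)
        return member.read()
```

Code that expects the files to just "be there" would still need a
small change to call something like read_member. Also note that a
gzipped tar has no random access, so fetching one file means
decompressing the stream up to that member; smaller chunks keep that
delay short.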

I can do #1 (but don't want to just re-invent wheels), and I have a
great amount of FUD regarding #2 -- there are a couple of TBs of the
things -- but #3 looks intriguing...  Anyone know of anything that
exists to do this?


Andrew Janke
(a.janke at gmail.com || http://a.janke.googlepages.com/)
Canberra->Australia    +61 (402) 700 883
