[clug] Story: Fijian Resort complex loses a single disk: business process stops for 1-2 days

Tony Lewis tony at lewistribe.com
Fri Jul 25 08:06:51 MDT 2014


On 25/07/14 23:18, Hal Ashburner wrote:
> Does anyone follow anything like the jwz prescription?
> http://www.jwz.org/doc/backups.html
(delurking)

My current backup requirements are:
  * as automated as possible
  * an off-site component
  * encrypted for some parts
  * space- and time-efficient
  * snapshots

My current solution uses bup (efficient, snapshots), encfs (encryption) 
and CrashPlan (offsite).  I also semi-proactively check for errors on my 
RAID drive.
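
For the encfs part, the day-to-day cycle is just a mount and an unmount. 
A minimal sketch, with directory names that are only examples (and the 
assumption that it's the ciphertext directory that gets backed up):

  # ciphertext lives in ~/.private-crypt, plaintext appears at ~/private;
  # the first run creates the volume and asks for a passphrase
  encfs ~/.private-crypt ~/private

  # ... work with files under ~/private ...

  # unmount when done; back up ~/.private-crypt rather than the plaintext
  fusermount -u ~/private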

The RAID is particularly scary.  A weekly cron job dd's one of the RAID 
members, and reports any read errors.  This "can I read every byte of 
the disk?" check is a proxy for "can I read back every byte as it was 
written?".

I don't think I can get closer to the latter question without removing 
the partition from the RAID, then performing a destructive read/write 
test (a la badblocks).  This is prohibitive, but I am not really 
comfortable with leaving it at just weekly read checks.  My fear is that 
the S.M.A.R.T. stuff may not detect read/write errors or bit rot until 
it's too late.
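
For completeness, the destructive version would look roughly like this 
(array and partition names are just examples, and badblocks -w wipes the 
partition, which is exactly why I don't do it):

  # fail and remove one member from the array
  mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

  # destructive read/write test of that partition -- destroys its contents
  badblocks -wsv /dev/sdb1

  # add it back and let the array resync
  mdadm /dev/md0 --add /dev/sdb1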

CrashPlan is my current sweet spot for backups.  They have a Linux 
client (though Java-based) and so far it seems to "just work".  I have 
done one sample restore of a subset of data, which is nowhere near 
enough, so I plan to set up a monthly task to pull down a random 
selection of files and do a full comparison.
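
The comparison side of that task is straightforward to script once the 
restore has been pulled down via the CrashPlan client; the paths and 
sample size here are only illustrative:

  #!/bin/sh
  # pick a random sample of files to request in the restore
  find /home/tony -type f | shuf -n 100 > /tmp/restore-sample.txt

  # ... restore those files through the CrashPlan client into /tmp/cp-restore ...

  # compare each restored file against the live copy, byte for byte
  # (files that changed since the backup ran will show up here too)
  while read -r f; do
      cmp -s "$f" "/tmp/cp-restore$f" || echo "MISMATCH: $f"
  done < /tmp/restore-sample.txt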

bup is nifty - it uses git pack-files, and is fast.  The downside is 
that you cannot prune any content from the repository.  The upside is 
that you can use FUSE to view the filesystem as it was at any snapshot, 
so it's very efficient to find and recover individual files or 
directories from any point in time.  It also offers an FTP-like 
interface for when FUSE won't work, and that is convenient too.
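
For anyone who hasn't tried bup, the basic cycle looks roughly like this 
(paths and the branch name are illustrative):

  # one-off repository setup (defaults to ~/.bup)
  bup init

  # index and save a directory as a snapshot on the "tony-home" branch
  bup index /home/tony
  bup save -n tony-home /home/tony

  # mount the repository with FUSE; every snapshot of every branch shows
  # up as an ordinary directory tree under the mount point
  mkdir -p /tmp/bup-mnt
  bup fuse /tmp/bup-mnt
  ls /tmp/bup-mnt/tony-home/    # one directory per snapshot, plus "latest"

  # the FTP-like interface mentioned above, for when FUSE isn't available
  bup ftp

  # unmount when finished
  fusermount -u /tmp/bup-mnt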

Tony

