[clug] Story: Fijian Resort complex loses a single disk: business process stops for 1-2 days

Scott Ferguson scott.ferguson.clug at gmail.com
Sat Jul 26 07:48:10 MDT 2014


On 26/07/14 20:44, Alex Satrapa wrote:
> On 26 Jul 2014, at 19:57, Scott Ferguson <scott.ferguson.clug at gmail.com> wrote:
> 
>> How does that verify the backup medium? (as opposed to verifying *what*
>> is being backed up).
> 
> I’m not sure what you’re asking here.

(Sorry - succinctness and haste rarely make good company)

Context.
[original quote]
>> It's also good practise to test backup medium themselves - to see
>> how reliable they are at "keeping" verified backups.
> 
> One simple option is to mount the backup & compare the source and
> backup.

Again - that verifies the backup, not the reliability of the medium.
Counting the bottles you store on the nature strip isn't laying down a
cellar for your future ;p

[/quote]
You snipped out all of my original post bar the sentence about verifying
the backup medium, then stated that you compare the backup to the
original. Why? (I also note that you could/should have checked the
backup using rsync with the checksum option, as sketched below - cp is
a dangerous way to do backups.)
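
A minimal sketch of what I mean (the paths are hypothetical, and
--dry-run means nothing is actually touched):

    # compare source and backup by full checksum, not just size/mtime
    rsync --archive --checksum --dry-run --itemize-changes \
        /source/ /mnt/backup/
    # any itemized output means the copies differ in content
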

The discussion (see OP) is about a failure of a backup system. One of
the most common failures is media failure, caused by not practising BMP
and actually proving the reliability of the medium (damaged array *and*
pp backup plan didn't back up critical data). One way to prove a
disk-based medium before trusting it is sketched below.
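
For a conventional disk, something along these lines (destructive, and
/dev/sdX is a placeholder for the *backup* disk - a sketch, not a
recommendation):

    # write-mode test: writes and re-reads test patterns across
    # the whole disk, reporting any blocks that don't hold data
    badblocks -wsv /dev/sdX
    # a clean run is evidence the medium can keep what you write to it
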

In your stated case you can't do much to test your medium (Flash). But
spending an entire day doing a full backup.... so your backup system
takes a long time to verify its accuracy, and that verification can
only be done if the system being backed up has not changed since the
backup (backup time + verify time = time the system *cannot* change).
Seems impractical - I'm not criticising that aspect of the scheme as I
don't know your use case. But I'd be concerned that without some sort
of checksum/signature system (see the sketch below) you cannot verify
whether the backup *remains* reliable once the original system has
changed (stand-alone verification).
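
A minimal sketch of stand-alone verification (paths hypothetical):

    # at backup time: record a checksum manifest for the backup
    cd /mnt/backup && find . -type f -print0 \
        | xargs -0 sha256sum > ~/backup.sha256
    # later, with *only* the backup medium mounted:
    cd /mnt/backup && sha256sum --check --quiet ~/backup.sha256
    # any failure means the medium has degraded, regardless of
    # what the original system looks like now
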



> What’s the point of verifying the backup medium independently of the data stored on it?


That would be a question. :)
> 
> Alex
> 


Re: the OP
 - steve wrote:
> Backups had silently failed.
> 
> This may have been “good practice” in 1995-2002, but not now.

Definitely was not "good practice" as early as the mid-80s. If the tape
didn't "eject" *and* the tar log contained failures, the backup
"failed" - no "silent" failures. (A rough sketch of that style of loud
failure is below.)

Dual backup schemes were certainly recommended industry (Red Book?)
practise. Backing up partition and array configs has likewise been BMP
for several decades (at least since the days of the PS/2 desktop and
B&V networks) - at least at the companies I've worked with that manage
archival for government data (it's still not uncommon to *not* see it
in some "enterprise" systems). Ditto for checking media. More recently
I worked on the pre-planned transfer of backups from old, outdated tape
formats to newer formats. The new tapes were quality-tested before data
transfer, and verified as lossless copies before being signed off (with
plans in place to repeat the process in the future should the archiving
legislation or client requirements be extended - there should be some
IBM process documentation on it somewhere if people are interested in
digging through the system).
EDS had/has similar procedures. I'm guessing/speculating that many
large concerns did have sensible practises - but there weren't (still
aren't) "industry-wide" practices, which is what led to the ITIL system
(sometimes referred to as 1001 exceedingly bleeding-obvious practices).

Kind regards

