On software quality and engineering

Brad Hards bhards at bigpond.net.au
Sat Nov 2 17:44:47 EST 2002


On Sat, 2 Nov 2002 04:18, Michael Bennett wrote:
> I think the difference between safety criticality and reliability has been
> missed. If the software in the intercom fails and it doesn't work, you may
> still have a safe, uneventful flight. This is reliability. If the software
> in the intercom fails and causes the radio to transmit while you are
> refueling, and a spark blows the whole thing up, then you have a problem.
> This is safety critical. In an aircraft everything is safety critical; that
> is why an aircraft won't take off until the documentation weighs the same
> as the aircraft.
Err, not so. The intercom is used for passenger evacuation, so it is an 
airworthiness requirement. Reliability is part of the safety criticality 
determination (i.e. if the intercom can trigger the radio, then the total 
contribution to the failure tree (the cumulative probability of the radio 
transmitting, the intercom failing and the fuel igniting) has to be some 
pretty small number).
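
To make the cumulative-probability point concrete, here is a rough sketch in 
Python of how the AND gate in such a fault tree multiplies out. The numbers 
are invented for illustration, not real certification figures:

    # Illustrative only: invented per-flight-hour probabilities.
    p_intercom_fault_keys_radio = 1e-5  # intercom failure that keys the radio
    p_transmit_while_refuelling = 1e-3  # transmission occurs during refuelling
    p_spark_ignites_fuel        = 1e-4  # ignition given an untimely transmission

    # AND gate: all three events must occur together (independence assumed,
    # which is itself an engineering judgement).
    p_top_event = (p_intercom_fault_keys_radio
                   * p_transmit_while_refuelling
                   * p_spark_ignites_fuel)

    print("P(top event) = %.1e" % p_top_event)  # 1.0e-12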

> > It is unrealistic to expect that complex systems will not fail. It is only
> > realistic to expect that a system fails at (or below) an acceptable level.
> > Normally the risks are defined in terms of probability of failure (or
> > partial performance) and the consequences of failure (or partial
> > performance).
>
> The reality is physical systems fail. The failure modes are well known and
> can be planned for.
This is incorrect. The problem with systems engineering (and software 
engineering as a subset) is that people assume that the decomposition 
(functional baseline to allocated baseline) is accurate. Real systems aren't 
like that - the interfaces are never that clear. 

> Software systems do exactly what you tell them to do. The problem is most
> people don't know what they want the software to do and just guess. Which
> comes down to requirements and specifications. There are formal
> specification languages that can be used to mathematically prove the
> specification. Most people don't use them as it takes too much time and
> effort when they could be programming. I know some companies now use
> them for all software projects as they can produce software with zero
> defects.
Absolutely incorrect.  You can prove that what you think you implemented is 
defect free, but you can't prove the real system is defect free.
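
A toy illustration of that gap, in Python: the property below is provable in 
the mathematical model (exact real arithmetic), but the system that actually 
runs uses IEEE 754 doubles, and the proof says nothing about that difference.

    # Proven over the reals: 0.1 + 0.2 == 0.3 exactly.
    # On the real machine, with binary floating point, it is false.
    a = 0.1 + 0.2
    print(a == 0.3)   # False
    print(a)          # 0.30000000000000004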

> Software has one failure mode. It is implemented in hardware. If the
> hardware malfunctions (or some radiation causes a bit to change values)
> then the software may not work as desired. However this can be designed
> for.
Real systems are composed of various bits of hardware and software, all of 
which have various interactions. The system should do consistent things under 
identical circumstances. But the real world isn't identical, and the 
interactions are non-trivial. Problems occur because the interactions weren't 
understood, or because different engineers made different assumptions about 
the interface (usually the bit of the interface that didn't appear in the 
interface control document).
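
A hypothetical sketch of that kind of mismatch (the function names and 
figures are invented): neither side is wrong in isolation, but the unit never 
made it into the interface control document, so each engineer assumed a 
different one.

    def fuel_quantity():                   # engineer A: fuel mass, in pounds
        return 6000.0

    def remaining_range_km(fuel_mass_kg):  # engineer B: expects kilograms
        burn_rate_kg_per_km = 3.0          # invented figure
        return fuel_mass_kg / burn_rate_kg_per_km

    # Each side "works" on its own; the composed system overstates the
    # remaining range by the lb/kg factor of about 2.2.
    print(remaining_range_km(fuel_quantity()))  # 2000.0 km, should be ~907 km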

> > If the risk is low (not much chance of things going wrong, and it doesn't
> > matter much if it does), then you don't apply as much rigour. If risk is
> > high (either things have a good chance of failing, or the consequences of
> > failure are serious), then you get people with appropriate
> > qualifications, training and experience, and you set up a rigorous
> > process environment.
> >
> > Does it really matter if your game crashes twice a week? Annoying - yes,
> > important - no.
>
> It depends to whom.
And it depends on how "really matter" is defined, too. But in terms of 
killing people, full authority digital engine controls are a bit more 
important. So you'd expect a lot more work on the FADEC than on tuxracer.

> > In the defence aviation process, the engineers get used for the up-front
> > definition of requirements (specification), the risk assessment
> > (judgement of significance) and the design review part on significant
> > designs. You don't need a design engineer to conduct a simple fastener
> > substitution.
>
> You do need a design engineer to certify the substitute part. This does
> bring up the subject of configuration management.
No, you don't. You need a design engineer to judge that the risk of the 
substitution is sufficiently low, such that even if the fastener fails, the 
aircraft keeps flying. You only need a production worker to certify that the 
new part conforms to the specification.

Configuration management is a different problem. If you know the original 
specification for the component (configuration identification), and you have 
the configuration documentation for the new part, then comparing them is 
usually trivial, and recording the authorisation to fit is a clerical 
exercise.
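
As a rough illustration of that comparison step (the field names and values 
are invented, and real configuration documentation is far richer), the check 
is essentially a field-by-field diff:

    # Hypothetical configuration records for the original and substitute part.
    original = {"thread": "UNF 1/4-28", "material": "A286",
                "finish": "passivated"}
    substitute = {"thread": "UNF 1/4-28", "material": "A286",
                  "finish": "cadmium plated"}

    differences = {}
    for field in original:
        if original[field] != substitute[field]:
            differences[field] = (original[field], substitute[field])

    print(differences)  # {'finish': ('passivated', 'cadmium plated')}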

Brad

-- 
http://linux.conf.au. 22-25Jan2003. Perth, Aust. I'm registered. Are you?