On software quality and engineering [OT]

Michael Bennett mr_b at tpg.com.au
Sun Nov 3 00:12:33 EST 2002


On Saturday 02 Nov 2002 5:44 pm, Brad Hards wrote:
> On Sat, 2 Nov 2002 04:18, Michael Bennett wrote:
> > I think the difference between safety criticality and reliability has
> > been missed. If the software in the intercom fails and it doesn't work,
> > you may still have a safe, uneventful flight. This is reliability. If the
> > software in the intercom fails and causes the radio to transmit while you
> > are refuelling, and a spark blows the whole thing up, then you have a
> > problem. This is safety critical. In an aircraft everything is safety
> > critical; that is why an aircraft won't take off until the documentation
> > weighs the same as the aircraft.
>
> Err, not so. The intercom is used for passenger evacuation, so it is an
> airworthiness requirement. Reliability is part of the safety criticality
> determination (i.e. if the intercom can trigger the radio, then the total
> contribution to the failure tree (the cumulative probability of the radio
> transmitting, the intercom failing and the fuel igniting) has to be some
> pretty small number).

OK, but the point I was trying to make was that the process for determining 
acceptable safety levels is different from the process for determining 
reliability levels.
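
For what it's worth, the fault-tree arithmetic Brad describes above is easy 
to sketch. The following is a rough illustration only: the event 
probabilities are invented, and a real analysis also has to justify treating 
the events as independent.

    # Rough sketch of an AND-gate fault tree contribution. The numbers are
    # made up for illustration; they are not real certification figures.

    def and_gate(*probabilities):
        """Probability that all independent contributing events occur together."""
        result = 1.0
        for p in probabilities:
            result *= p
        return result

    p_intercom_fault = 1e-5   # intercom fails in a mode that keys the radio (assumed)
    p_radio_transmit = 1e-3   # radio actually transmits given that fault (assumed)
    p_fuel_ignition  = 1e-4   # the transmission sparks ignition while refuelling (assumed)

    p_top_event = and_gate(p_intercom_fault, p_radio_transmit, p_fuel_ignition)
    print("Contribution to the top event: %.1e" % p_top_event)

The point is that the acceptable value for that top-event probability comes 
out of the safety process, while the individual event probabilities come out 
of reliability work.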

On a side track, this is why specifications need a high level of (good) 
detail. My understanding of what an intercom does is different to Brad's.

> > > It is unrealistic to expect that complex systems will not fail. It is
> > > only realistic that a system fails at (or below) an acceptable level.
> > > Normally the risks are defined in terms of probability of failure (or
> > > partial performance) and the consequences of failure (or partial
> > > performance).
> >
> > The reality is physical systems fail. The failure modes are well known
> > and can be planned for.
>
> This is incorrect. The problem with systems engineering (and software
> engineering as a subset) is that people assume that the decomposition
> (functional baseline to allocated baseline) is accurate. Real systems
> aren't like that - the interfaces are never that clear.

What you have described is not a problem with the systems engineering 
methodology but with people making false assumptions. Usually this is a case 
of people thinking that "can be planned for" means "has been planned for / 
considered".

All of these interfaces can be found in undergraduate textbooks. Whether 
people know about them, or bother to specify them, is a different matter.

> > Software systems do exactly what you tell them to do. The problem is that
> > most people don't know what they want the software to do and just guess,
> > which comes down to requirements and specifications. There are formal
> > specification languages that can be used to mathematically prove the
> > specification. Most people don't use them, as it takes too much time and
> > effort when they could be programming. I know some companies now use
> > them for all software projects, as they can produce software with zero
> > defects.
>
> Absolutely incorrect.  You can prove that what you think you implemented is
> defect free, but you can't prove the real system is defect free.

Why not? It might be expensive and take a lot of time, but it can be done (in 
both physical systems and software). Defect free means that the system 
operates as specified under all given constraints.

The only reason it would be impossible is if you don't know what your system 
is meant to do (in all circumstances, under all the given constraints).
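
To illustrate what I mean by "operates as specified under all given 
constraints", here is a toy sketch. The function and the constraint are 
invented purely for illustration; for realistic input domains you cannot 
brute-force the check like this, which is exactly where formal specification 
languages and proof tools come in.

    # Toy conformance check: a specification and a candidate implementation of
    # saturating 8-bit addition, checked over the entire constrained input domain.

    def spec_saturating_add(a: int, b: int) -> int:
        """Specification: the result is the true sum, clamped to the 0..255 range."""
        return min(a + b, 255)

    def impl_saturating_add(a: int, b: int) -> int:
        """Candidate implementation under test."""
        return 255 if a + b > 255 else (a + b) & 0xFF

    # Constraint: both inputs are unsigned bytes.
    for a in range(256):
        for b in range(256):
            assert impl_saturating_add(a, b) == spec_saturating_add(a, b), \
                "defect at a=%d, b=%d" % (a, b)
    print("implementation matches the specification over the whole input domain")

If the system's behaviour is pinned down like that for every circumstance 
within the constraints, then "defect free" is a meaningful, checkable claim.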

> > Software has only one failure mode: it is implemented in hardware. If the
> > hardware malfunctions (or some radiation causes a bit to change values)
> > then the software may not work as desired. However, this can be designed
> > for.
>
> Real systems are composed of various bits of hardware and software, all of
> which has various interactions. The system should do consistent things
> under identical circumstances. But the real world isn't identical, and the
> interactions are non-trivial.

Yes. If it was trivial I wouldn't get paid as much :-)

> Problems occur because the interactions
> weren't understood, or because different engineers made different
> assumptions about the interface (usually the bit of the interface that
> didn't appear in the interface control document).

What you have described is an incomplete specification. If the interaction 
were in the ICD then it would be understood. That's why it is so important to 
do the design properly.

However, it is seldom the case that the design stage is done properly. In my 
experience this is a cultural and organisational problem. As design only 
produces paper, with words and pictures that most managers can't understand, 
it is given a lower priority. It is usually declared finished while still 
incomplete, with a high level of ambiguity. The rationale is that if you 
don't get to contract/production the project will be canned, so it is better 
to have a bad design than no project. And you can always fix it later.

> > > If the risk is low (not much chance of things going wrong, and it
> > > doesn't matter much if it does), then you don't apply as much rigour.
> > > If risk is high (either things have a good chance of failing, or the
> > > consequences of failure are serious), then you get people with
> > > appropriate qualifications, training and experience, and you set up a
> > > rigorous process environment.
> > >
> > > Does it really matter if your game crashes twice a week? Annoying - yes,
> > > important - no.
> >
> > It depends on whom you ask.
>
> And it depends on what "really matter" is defined as too. But in terms of
> killing people, full authority digital engine controls are a bit more
> important. So you'd expect a lot more work on the FADEC than on tuxracer.
>
> > > In the defence aviation process, the engineers get used for the
> > > up-front definition of requirements (specification), the risk
> > > assessment (judgement of significance) and the design review part on
> > > significant designs. You don't need a design engineer to conduct a
> > > simple fastener substitution.
> >
> > You do need a design engineer to certify the substitute part. This does
> > bring up the subject of configuration management.
>
> No, you don't. You need a design engineer to judge that the risk of the
> substitution is sufficiently low, such that even if the fastener fails, the
> aircraft keeps flying. You only need a production worker to certify that
> the new part conforms to the specification.

Yes, but as you said, you still need the engineer. Isn't that certifying anyway?

> Configuration management is a different problem. If you know the original
> specification for the component (configuration identification), and you
> have the configuration documentation for the new part, then comparing them
> is usually trivial, and recording the authorisation to fit is a clerical
> exercise.
>
> Brad

Michael.


