Correctness vs. Safety

One of the examples that we regularly use in our training material is the catastrophic loss of Lufthansa Flight 2904
on September 14, 1993, when it ran off the end of the runway in Warsaw, Poland.  It is a very useful teaching
example because it illustrates some of the main themes of the system/software safety training that we regularly
provide to clients. This accident is particularly effective as an introduction to the training material because
students quickly realize that we are not simply talking about defect prevention or quality assurance.

When an Airbus A320 lands, the crew relies on the combination of brakes, ground spoilers and reverse thrusters
to slow the aircraft.  However, in the case of Flight 2904 the activation of all three of these critical systems was
delayed such that the aircraft reached the end of the runway at a speed of 72 knots and hit an embankment,
resulting in two fatalities.

The official investigation concluded that the probable cause of the accident was incorrect decisions and actions
of the flight crew. However, the most interesting aspect of this accident is the effect of certain design features
of the software on the deployment of the brakes, ground spoilers and reverse thrusters.  For example, the ground
spoiler logic was designed to prevent deployment until there was a weight of over 12 tons on each
main landing gear strut.  For reasons described in more detailed accounts of this accident, the aircraft was
banked to the right upon landing and consequently, the weight on the left landing gear strut was less than
12 tons for the first 9 seconds after touchdown.   Similarly, deployment of the reverse thrusters and brakes was
inhibited by a condition that required the wheels to be turning faster than 72 knots.  It was a rainy day
in Warsaw on September 14, 1993, with an accumulation of water on the runway, and due to hydroplaning this
condition was not initially satisfied. By the time the ground spoilers and reverse thrusters were deployed,
the aircraft had already traveled more than halfway down the runway.
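
To make the decision logic concrete, here is a minimal sketch in Python of the deployment interlocks as
described above. It is an illustration only, not the actual Airbus flight-control software: the two thresholds
come from this account of the accident, while the data structure, function names, and sample touchdown values
are hypothetical.

    from dataclasses import dataclass

    # Thresholds as described in this account (illustrative, not taken from
    # the actual A320 flight-control software).
    WEIGHT_THRESHOLD_TONS = 12.0      # required on EACH main landing gear strut
    WHEEL_SPEED_THRESHOLD_KT = 72.0   # required wheel rotation speed

    @dataclass
    class TouchdownState:
        left_strut_weight_tons: float
        right_strut_weight_tons: float
        wheel_speed_kt: float

    def ground_spoilers_enabled(s: TouchdownState) -> bool:
        # Ground spoilers are inhibited until both struts carry over 12 tons.
        return (s.left_strut_weight_tons > WEIGHT_THRESHOLD_TONS
                and s.right_strut_weight_tons > WEIGHT_THRESHOLD_TONS)

    def reversers_and_brakes_enabled(s: TouchdownState) -> bool:
        # Reverse thrust and wheel brakes are inhibited until the wheels
        # are turning faster than 72 knots.
        return s.wheel_speed_kt > WHEEL_SPEED_THRESHOLD_KT

    # Hypothetical values reflecting the Flight 2904 landing: the right bank
    # leaves the left strut lightly loaded, and hydroplaning keeps wheel
    # rotation low, so every deceleration system is inhibited even though
    # each check behaves exactly as designed.
    touchdown = TouchdownState(left_strut_weight_tons=5.0,
                               right_strut_weight_tons=14.0,
                               wheel_speed_kt=30.0)
    assert not ground_spoilers_enabled(touchdown)
    assert not reversers_and_brakes_enabled(touchdown)

The point of the sketch is that both functions return exactly what they were designed to return; the hazard
arises from the conditions under which they are evaluated, not from any coding defect.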

There always seems to be a flash of insight on the faces of students in our training courses when they realize
that there is no evidence of a software defect or lapse in quality assurance, even though the decision
logic implemented by the software made it impossible for the crew to slow the aircraft upon touchdown.  We
build on this flash of insight by explaining how it illustrates a major theme of our training material on system/software
safety, namely, that sources of safety risk in software-intensive systems are not limited to software defects or
lapses in quality assurance.  A software-intensive system could be unsafe even if it is “defect free”.

Upon hearing our explanation that the “software in the aircraft behaved exactly as designed”, students in our
training courses usually ask why such restrictions on the deployment of brakes, ground spoilers and reverse
thrusters were included in the design of the software.  Simply put, these restrictions were intended to be safety
mitigations – that is, features of the software intended to improve the safety of the aircraft.   This must
sound as paradoxical to many readers as it does to most of the students in our training courses. The explanation
of this paradox leads to a second major theme of our training material on system/software safety, namely, that
the addition of a mechanism to a system for the purpose of mitigating a particular hazard might simultaneously
become a new source of risk for a different hazard.  This theme will be taken up in a future post on
this blog that continues the discussion of what other lessons about system/software safety can be learned
from Flight 2904.