Linux is not reliable enough because ... Was: Linux Kernel Developer job

Alexander Henry plug-devel@lists.PLUG.phoenix.az.us
Fri May 14 23:41:02 2004


On Thu, 13 May 2004 18:52:21 -0700, Austin Godber <godber@uberhip.com> 
wrote:

> Ed Skinner wrote:
>>      Linux does not belong in systems where people's lives rely on it. 
>> (Nor do most other commercial operating systems, for that matter.)
>
> Hello Ed,
> 	I am in no position to agree or disagree as I don't know enough about 
> the subject.  But perhaps you will give us a few examples of what 
> abilities reliable systems have and how Linux is lacking in these areas. 
>   So we could all be better informed.
> 	I simply don't want you to get away with the "I am an expert and I say 
> no!" defense.
>

I know a bit about what he's talking about, as I programmed aviation 
systems for Honeywell's small jet autopilots.

As a whole, these systems did one thing: take data in from global 
variables (which came from internal processes, external sensors, or 
other processors), process it, and put the results back into globals, at 
EXACTLY 100ms intervals.  Any sooner or later, and all the formulas 
inside the chip would be completely off, either using wrong sensor data 
or using a random set of values from variables that came from other 
processors.  Likewise, other processors that used the values on your 
chip would be fed wrong values.  There was a lot of madness in the 
software development process around measuring the processor budget; you 
wanted it to finish calculating in under 85ms to make the window and get 
the outputs where they needed to be.
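
Roughly, a frame like that looks like the sketch below (plain POSIX C on 
my part, nothing like the real Honeywell code; read_inputs(), compute() 
and publish() are made-up stand-ins for the actual control-law steps):

    #define _POSIX_C_SOURCE 200112L
    #include <stdio.h>
    #include <time.h>

    #define FRAME_NS   100000000L   /* 100 ms frame */
    #define BUDGET_NS   85000000L   /* must finish computing in 85 ms */

    /* hypothetical stand-ins for the real per-frame work */
    static void read_inputs(void)  { /* copy sensor/IPC globals in   */ }
    static void compute(void)      { /* run the control laws         */ }
    static void publish(void)      { /* write results back to globals */ }

    static long elapsed_ns(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1000000000L
             + (b.tv_nsec - a.tv_nsec);
    }

    int main(void)
    {
        struct timespec next, start, done;
        clock_gettime(CLOCK_MONOTONIC, &next);

        for (;;) {
            clock_gettime(CLOCK_MONOTONIC, &start);
            read_inputs();
            compute();
            publish();
            clock_gettime(CLOCK_MONOTONIC, &done);

            /* blowing the budget is treated as a failure, not a hiccup */
            if (elapsed_ns(start, done) > BUDGET_NS)
                fprintf(stderr, "frame overran its 85 ms budget\n");

            /* advance the absolute deadline and sleep until it */
            next.tv_nsec += FRAME_NS;
            if (next.tv_nsec >= 1000000000L) {
                next.tv_nsec -= 1000000000L;
                next.tv_sec  += 1;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        }
    }

The real boxes ran on bare metal with a hardware frame timer rather than 
clock_nanosleep, but the shape is the same: absolute deadlines, a fixed 
budget, and an overrun counts as a failure.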

The engineering way of thinking is almost 180 degrees from the hacker 
way.  To a hacker, automation is king and redoing work is evil.  To an 
engineer, redundancy is king and single points of failure are evil.  IOW, 
more work and less automation is good.  You want it to take a LOT of 
cooperation from people and physical systems before Murphy's Law can 
kick in.  I've heard of some systems with three different processors, 
programmed by three deliberately isolated teams, each team on a different 
processor and compiler; that way all three sets of programmers, or chip 
burns, or compilers have to mess up in EXACTLY the same place for a 
failure to happen.  Upstream, there will be redundant sensors measuring 
everything, and the program will 'vote' on a sensor value based on the 
combined readings.  Downstream, the pilot has buttons that override the 
chips.
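
To illustrate the 'voting' part, the simplest form is a mid-value select 
across three redundant sensors; this is just a sketch of the idea, not 
actual flight code (a real voter also tracks miscompares, staleness, and 
channels that have been declared failed):

    #include <stdio.h>

    /* Pick the middle of three readings, so one sensor failing high
     * or low can never drive the output by itself. */
    static double mid_value_select(double a, double b, double c)
    {
        if ((a >= b && a <= c) || (a <= b && a >= c)) return a;
        if ((b >= a && b <= c) || (b <= a && b >= c)) return b;
        return c;
    }

    int main(void)
    {
        /* one airspeed sensor has failed high; the voter ignores it */
        printf("%.1f\n", mid_value_select(251.0, 250.5, 999.0)); /* 251.0 */
        return 0;
    }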

-- 

--Alexander