Risk measurement

How to make sure you fly safer.

Estimating probability of component failure using FMEA.

Ok, you’ve decided to use an auto engine in your airplane. You’re concerned about just how reliable this is going to be. You have a lot of decisions to make. Do I need dual fuel pumps? Do I convert the engine from stock EFI to a carburetor? Which engine components are reliable, and which aren’t?

In my occupation, when working with a new company, I make a point of interviewing many employees. My purpose is to get a consensus of opinion on the question “What are the main problems we have?” The results are sometimes shocking.

It’s not too difficult to obtain consensus. However, the employees usually don’t have any supporting facts. At an aluminum foundry the consensus was: “Hydrogen contamination of the molten material is the #1 problem.” All of the industry literature also focused on hydrogen. After definitive research and experiments, we found that hydrogen had virtually no effect. How can an entire industry be so wrong? Well, the point is that, left to our natural tendencies, we easily lose perspective and make the wrong decisions.

I can't emphasize this enough. Our natural method of arriving at decisions is heavily distorted by feelings and anecdotes. We retain gross distortions of risk. Seldom do we have an accurate perspective. There have been a number of studies on the subject. I'll add some links later to explain these points. 

It wasn’t until the late 1970s that U.S. auto companies started using Failure Mode and Effects Analysis (FMEA). FMEA was one of the tools used to bring U.S. auto manufacturers closer to Japanese quality. FMEA sounds pretty complicated, but like all things, once you are familiar with the process it ends up being fairly simple. Use of FMEA is described as a “discipline” because it forces you to use facts to make decisions. Even though these “facts” are actually estimates, the process ends up being quite effective.

We are going to do an FMEA. What is the goal we are trying to achieve with this process? It’s to make sure we place our efforts on the items that need them. Put another way, it’s making sure we don’t waste time and effort on insignificant items while ignoring the truly important ones.

There are only three pieces to the puzzle.

1)       If the component failed, how seriously would that affect the airplane?

2)       What is the probability of the component failing?

3)       What is the likelihood that you would notice the problem before failure?

 

You may have heard statements like “You have to replace component x on your engine before installing into an airplane because it represents a single point failure”. Meaning that if x fails, there is no backup component. That statement is not meaningful until you assess all three questions above.

I’ve trained a lot of people in the use of FMEAs. The most difficult part of this process is getting yourself to keep the three pieces separate. This is solved by making a list of components and then dealing with ONLY one of the three questions at a time. Only after you have gone through your entire list of components for question #1 do you go on to question #2.

Let’s get started. We’ll use a fuel pump failure as an example. Paraphrasing question #1: “My fuel pump just failed. How seriously would that affect the airplane?” Well, if we are airborne, we would lose power instantly. So on a scale of 1 to 10, I’d say this is pretty serious, but I still have flight control. I would rate it an 8 for my airplane, since an off-field landing in my plane would be dangerous. I would rate the same failure on an ultralight as a 5 instead of an 8, since ultralights can land off field much more safely. Remember, for this question we are only asking, “How seriously would this affect my airplane if it failed?”

We then go on to the next component. How about a spark plug? If a plug fails, it only reduces my power by 25%, and I can still maintain flight. So I would rate a plug failure as a 2. It has little effect on the aircraft. Conversely, a wing spar failure has a major, instant effect on the airplane. We are going down right now, with no control. Clearly it would be rated a 10: a very serious effect on the airplane. You get the idea. Sounds simple, eh?

Only after we have rated the EFFECT of failure for every component do we rate each one for question #2: “What is the probability of the component failing?” Often in the production world, the answer is well known. Tests are done, and the Mean Time Between Failures (MTBF) is calculated and used to answer this question. However, our airplane is a one-time deal. We don’t have any history to use. Or do we? Clearly, if a component often fails on the automobile, then we too are at risk in our airplane. I tried a simple way to get unbiased failure frequency information. I merely asked a local Subaru mechanic how often he saw a failed component during the past year. “How many Subaru vehicles were not running last year due to a fuel pump?” “One or two.” “How about failed spark plugs?” “None.” “How often have you had a vehicle shut down due to the electronic brain (ECM)?” “I have never seen one fail in my ten years.” I recorded his responses in the spreadsheet.

Here’s a tough question: how many vehicles is the mechanic exposed to? Don’t get lost in the details. We can make meaningful predictions even if we don’t know the exact number of vehicles. Figure the average person drives 300 hours a year (12,000 miles at 40 mph). The mechanic works only on failed vehicles, so I estimated he is exposed to 3,000 vehicles (even though he personally works on only 600 a year). That means he is exposed to 300 x 3,000 = 900,000 vehicle hours a year. Round it up to 1 million Subaru hours a year. I could be off by 50% in my estimate, but it doesn’t matter, since the same error applies to every component and the ranking between components is unchanged. Ok, he said two failed fuel pumps a year. 1 million hours / 2 failed pumps = one fuel pump failure every 500,000 hours. Wow.
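The arithmetic above can be sketched in a few lines of Python. This is only a minimal illustration, not the spreadsheet itself: the function names are made up for this example, and treating "no failures seen" as an infinite hours-per-failure figure is an assumption of the sketch. The numbers are the estimates from the text.

```python
# Sketch of the exposure estimate. Function names are illustrative
# assumptions; the input numbers are the estimates given in the text.

def fleet_hours_per_year(vehicles: float, hours_per_vehicle: float) -> float:
    """Total vehicle hours per year that the mechanic is exposed to."""
    return vehicles * hours_per_vehicle


def hours_per_failure(fleet_hours: float, failures_per_year: float) -> float:
    """Average operating hours between failures of one component type."""
    if failures_per_year == 0:
        # Assumption: with no observed failures there is no finite estimate.
        return float("inf")
    return fleet_hours / failures_per_year


hours_per_vehicle = 12_000 / 40                       # 12,000 miles at 40 mph
raw_fleet_hours = fleet_hours_per_year(3_000, hours_per_vehicle)
fleet_hours = 1_000_000                               # rounded up, as in the text

fuel_pump_hours = hours_per_failure(fleet_hours, 2)   # "one or two" pumps a year
spark_plug_hours = hours_per_failure(fleet_hours, 0)  # "None"

print(f"Raw exposure: {raw_fleet_hours:,.0f} vehicle hours per year")
print(f"Fuel pump: one failure per {fuel_pump_hours:,.0f} hours")
```

Because every component is divided into the same fleet-hours figure, an error in that figure scales all results equally and leaves the order of the components unchanged.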

Do these same calculations for each component. If you take a look at the attached spreadsheet, you’ll see that I adjusted the “Hours per failure” on some components since they aren’t used on all of the vehicle models (less exposure).
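One way to make the "less exposure" adjustment is to scale the fleet hours by the share of vehicles that actually carry the component. The sketch below assumes that approach; the function name and the 25% figure are hypothetical and are not taken from the spreadsheet.

```python
# Sketch of an exposure adjustment for components fitted to only part of
# the fleet. The name and the example fraction are hypothetical.

def adjusted_hours_per_failure(
    fleet_hours: float,
    failures_per_year: float,
    fraction_of_fleet: float,
) -> float:
    """Hours per failure, counting only the vehicles that have the component."""
    if not 0 < fraction_of_fleet <= 1:
        raise ValueError("fraction_of_fleet must be in the range (0, 1]")
    exposure_hours = fleet_hours * fraction_of_fleet
    if failures_per_year == 0:
        # Assumption: no observed failures means no finite estimate.
        return float("inf")
    return exposure_hours / failures_per_year


# Hypothetical case: a part used on 25% of models, one failure seen per year.
partial = adjusted_hours_per_failure(1_000_000, 1, 0.25)
# Same failure count if the part were on every vehicle.
full = adjusted_hours_per_failure(1_000_000, 1, 1.0)

print(f"Fitted to 25% of fleet: one failure per {partial:,.0f} hours")
print(f"Fitted to whole fleet:  one failure per {full:,.0f} hours")
```

The same number of observed failures looks four times worse once the smaller exposure is taken into account, which is why the adjustment matters.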

Now for the last of the three questions: “What is the likelihood I will notice the defective component BEFORE it fails?” Some items fail partially or gradually, and you are likely to notice their degradation in preflight. An overheated engine is a good example. The temperature gradually increases, and there is a decent chance you will notice before a failure. For this question, a rating of “10” means you have only a small chance of noticing the problem before it fails. A “1” means you are very likely to notice it before it fails. See the attached spreadsheet.

Now you merely multiply the three ratings you gave to each component. The result is your indicator of which components need your attention. A large value in the “Total risk” column indicates you should focus on improving that particular component. The most important part of this process is to find ways to reduce that “Total risk”. I entered my action items on the spreadsheet. I then recalculated the risk. The column titled “New Total risk” shows that recalculated value.
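As a rough sketch of the multiplication and ranking step, here is how it might look in Python. The severity ratings for the fuel pump, spark plug and wing spar come from the text above. The occurrence and detection ratings are placeholder values invented for this example, not figures from the spreadsheet, and the class and field names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    severity: int    # question 1: 10 = most serious effect on the airplane
    occurrence: int  # question 2: 10 = most likely to fail
    detection: int   # question 3: 10 = least likely to be noticed in time

    def __post_init__(self) -> None:
        # Every rating uses the same 1 to 10 scale.
        for rating in (self.severity, self.occurrence, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError(f"{self.name}: ratings must be 1 to 10")

    @property
    def total_risk(self) -> int:
        # "Total risk" is the product of the three ratings.
        return self.severity * self.occurrence * self.detection


# Severity values are from the text; occurrence and detection are placeholders.
components = [
    Component("fuel pump", severity=8, occurrence=2, detection=9),
    Component("spark plug", severity=2, occurrence=1, detection=5),
    Component("wing spar", severity=10, occurrence=1, detection=3),
]

# Highest total risk first: these are the items that deserve attention.
ranked = sorted(components, key=lambda c: c.total_risk, reverse=True)
for c in ranked:
    print(f"{c.name:<12} total risk = {c.total_risk}")
```

Note that with these placeholder ratings the wing spar, despite its severity of 10, ranks below the fuel pump. That is the kind of perspective the three separate questions are meant to provide.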

Conclusions:

The spreadsheet illustrates some important perspectives that normally go unnoticed. Compare the total for the top section of the spreadsheet to the total for the middle section. Clearly, the risk of an aircraft system failure (2185 total) is much greater than the risk of an auto system failure (849). Auto engines are extremely reliable in comparison to my aircraft. So it’s roughly 2.5 times (2185 / 849) more important for me to focus on ways to improve my aircraft system.

Now some of you are going to start looking at the attached spreadsheet and say, “What is that guy thinking? I’m going to break a prop every 700 hours? I’m going to experience vapor lock every 200 hours? He’s crazy.” Many of these numbers are based upon unique characteristics of my airplane, using information and failure history not explained here. Generate your own numbers. But if you are going to use a Subaru engine, don’t talk to a GM mechanic. Also, don’t be influenced by anecdotal information from 1972. The auto industry has made dramatic improvements in reliability even during the past 5 years.

Some readers familiar with FMEA will say: “That’s not how I learned it. You have to break down each component”. Go for it! This approach is simple, yet effective. Just don’t forget that the objective is to take action on the high risk items. And to gain a perspective.

New: Spreadsheet that allows you to see how redundancy reduces failure risk.

Download spreadsheet

 

 

Analysis of Lycoming engine systems

(Note: The Lycoming failure rate is not an apples-to-apples comparison with the Subaru rate, because the Subaru rate in aircraft is unknown.)

Source: Cozy archive search 8/99 thru 8/2004

Lycoming spreadsheet

Cozy spreadsheet