Failure modes and effects analysis (FMEA)
Failure modes and effects analysis (FMEA) is a procedure for analyzing potential failure modes within a system in order to classify them by severity or to determine the effect of each failure on the system. It is widely used in manufacturing industries at various phases of the product life cycle and is increasingly used in the service industry as well. Failure causes are any errors or defects in a process, design, or item, especially ones that affect the customer, and can be potential or actual. Effects analysis refers to studying the consequences of those failures.
Step 1: Severity
Determine all failure modes based on the functional requirements and their effects. Examples of failure modes are electrical short-circuiting, corrosion, and deformation. A failure mode in one component can lead to a failure mode in another component, so each failure mode should be listed in technical terms and by function. The ultimate effect of each failure mode then needs to be considered.
A failure effect is defined as the result of a failure mode on the function of the system as perceived by the user. It is therefore convenient to write these effects down in terms of what the user might see or experience. Examples of failure effects are degraded performance, noise, or even injury to a user.
Each effect is given a severity number (S) from 1 (no danger) to 10 (critical). These numbers help an engineer to prioritize. If the severity of an effect is 9 or 10, actions are considered to change the design by eliminating the failure mode, if possible, or by protecting the user from the effect. A severity rating of 9 or 10 is generally reserved for those effects which would cause injury to a user or otherwise result in litigation.
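As an illustration, the severity step can be sketched as a small record type. The names `FailureEffect` and `needs_design_action` are hypothetical and not part of any FMEA standard, and the severity values in the sample data are made up; only the 1 to 10 scale and the 9-or-10 rule come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureEffect:
    """One failure mode and its effect as perceived by the user (hypothetical record)."""
    failure_mode: str
    effect: str
    severity: int  # S, from 1 (no danger) to 10 (most severe)

    def __post_init__(self):
        # Reject ratings outside the 1-10 scale described in the text.
        if not 1 <= self.severity <= 10:
            raise ValueError("severity must be between 1 and 10")


def needs_design_action(item: FailureEffect) -> bool:
    """Severity 9 or 10: consider eliminating the failure mode or protecting the user."""
    return item.severity >= 9


# Illustrative ratings only; real values come from the analysis team.
effects = [
    FailureEffect("corrosion", "degraded performance", 4),
    FailureEffect("electrical short-circuiting", "injury to a user", 10),
]
flagged = [e.failure_mode for e in effects if needs_design_action(e)]
print(flagged)
```

With these sample ratings, only the short-circuiting entry is flagged for a design change.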
Step 2: Occurrence
In this step it is necessary to look at the cause of a failure and how often it occurs. This can be done by looking at similar products or processes and the failures that have been documented for them. A failure cause is regarded as a design weakness. All the potential causes for a failure mode should be identified and documented, again in technical terms. Examples of causes are erroneous algorithms, excessive voltage, and improper operating conditions.
A failure mode is given a probability number (O), again from 1 to 10. Actions need to be determined if the occurrence is high (meaning > 4 for non-safety failure modes and > 1 when the severity number from step 1 is 9 or 10). This step is called the detailed development section of the FMEA process.
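The occurrence thresholds above can be written as a single check. This is a minimal sketch: the function name `occurrence_needs_action` is hypothetical, and it assumes that a "safety" failure mode is exactly one whose severity number is 9 or 10, as the parenthetical in the text suggests.

```python
def occurrence_needs_action(severity: int, occurrence: int) -> bool:
    """Apply the thresholds from the text: O > 1 if S is 9 or 10, otherwise O > 4."""
    for name, value in (("severity", severity), ("occurrence", occurrence)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10")
    # Assumption: severity 9 or 10 marks a safety-related failure mode.
    threshold = 1 if severity >= 9 else 4
    return occurrence > threshold


print(occurrence_needs_action(severity=5, occurrence=4))   # False: not above 4
print(occurrence_needs_action(severity=5, occurrence=5))   # True: above 4
print(occurrence_needs_action(severity=10, occurrence=2))  # True: above 1 for S = 10
```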
Step 3: Detection
Once appropriate actions have been determined, it is necessary to test their effectiveness. A design verification is also needed, and the proper inspection methods need to be chosen. First, an engineer should look at the current controls of the system that prevent failure modes from occurring or that detect the failure before it reaches the customer.
Next, one should identify testing, analysis, monitoring, and other techniques that can be or have been used on similar systems to detect failures. From these controls an engineer can learn how likely it is for a failure to be identified or detected. Each combination from the previous two steps receives a detection number (D). This number represents the ability of planned tests and inspections to remove defects or detect failure modes. After these three basic steps, Risk Priority Numbers (RPN) are calculated.
Risk Priority Numbers
RPN do not play an important part in the choice of an action against failure modes; they serve more as threshold values in the evaluation of these actions. After ranking the severity, occurrence, and detectability, the RPN is calculated by multiplying these three numbers: RPN = S × O × D. This has to be done for the entire process and/or design. Once this is done, it is easy to determine the areas of greatest concern. The failure modes that have the highest RPN should be given the highest priority for corrective action. This means it is not always the failure modes with the highest severity numbers that should be treated first: there could be less severe failures that occur more often and are less detectable.
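The RPN calculation and ranking can be sketched in a few lines. The `Rating` type and all numeric ratings below are hypothetical, chosen only to illustrate the point. The sketch also assumes that the detection number uses a 1 to 10 scale on which a higher value means the failure is harder to detect; the text does not state the scale explicitly.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    failure_mode: str
    severity: int    # S
    occurrence: int  # O
    detection: int   # D (assumed: higher means harder to detect)


def rpn(r: Rating) -> int:
    """Risk Priority Number: RPN = S x O x D."""
    return r.severity * r.occurrence * r.detection


# Made-up ratings for the example failure modes named in the text.
ratings = [
    Rating("electrical short-circuiting", 8, 2, 2),
    Rating("corrosion", 5, 6, 7),
    Rating("deformation", 3, 4, 3),
]

# Highest RPN first: these get priority for corrective action.
ranked = sorted(ratings, key=rpn, reverse=True)
for r in ranked:
    print(f"{r.failure_mode}: RPN = {rpn(r)}")
# corrosion: RPN = 210
# deformation: RPN = 36
# electrical short-circuiting: RPN = 32
```

In this example the failure mode with the highest severity (short-circuiting, S = 8) ends up last, because it occurs rarely and is easy to detect, while the less severe corrosion entry ranks first.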