Using Historical Data to Perform Root Cause Failure Analysis

13 Nov 2023
Failure Analysis is the process of collecting and analyzing data to determine the cause of a failure. Determining the root cause of a machine malfunction can be difficult, and unresolved failures can damage the manufacturer-customer relationship. Read how you can accelerate your Failure Analysis process and improve your overall products...

What is Failure Analysis?

Failure Analysis (FA) is the process of collecting and analyzing data to determine the cause of a failure. This critical process is performed in some way by every company that deploys a product into the field. The results are vital to the quality department of a company in identifying liabilities and correcting malfunctions. Other parts of an organization can also leverage this information. For example, the development department can use the report to improve the product design. The sales group can use it to resolve warranty claims, and the support team can use it to better assist the end customer.

A typical Failure Analysis process involves documenting a failed or failing device that has been returned or is still in the field. These malfunctions are grouped together and then analyzed individually to determine the root cause. Depending on the volume of product shipped and the number of returns, it is possible that only a portion will be analyzed. A Pareto chart of the failure groups and the number of failures in each group is typically created to better organize the information.
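The grouping step can be sketched in a few lines of Python. This is only an illustration: the failure categories and counts are hypothetical placeholders, not data from a real machine, and the function name `pareto` is our own.

```python
from collections import Counter

def pareto(failure_groups):
    """Return (group, count, cumulative_percent) rows, largest group first."""
    counts = Counter(failure_groups)
    total = sum(counts.values())
    rows = []
    running = 0
    for group, count in counts.most_common():
        running += count
        rows.append((group, count, round(100.0 * running / total, 1)))
    return rows

# Hypothetical failure reports, one entry per returned or failing device.
reports = ["jam", "sensor fault", "jam", "leak", "jam", "sensor fault"]

for group, count, cumulative in pareto(reports):
    print(f"{group:<14}{count:>3}{cumulative:>8.1f}%")
```

The cumulative percentage column makes it easy to see how few failure groups account for most of the returns.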

Failure Analysis Challenges

Once the information from a Failure Analysis is grouped, it can be analyzed to assign priorities. In the example below, the graph displays how many times each malfunction has occurred on an automated bottle filler. Each of the faults would take a significant amount of effort to resolve. The low-count defects may not be repeatable, which makes the FA very challenging or even impossible due to a lack of information. As a result, if the cost of performing a Failure Analysis exceeds the value generated by identifying the root cause, that specific issue may not be resolved.

Warranty processing is also directly tied to Failure Analysis. The warranty processing team, typically the sales team or a field application engineer in a small to mid-sized company, must determine whether the defect and associated service cost are covered under the product warranty. Typically, warranties exclude failures associated with misuse, and misuse can be difficult to establish.

A misuse exclusion also gives the customer an incentive to be less than honest about what happened to the device, which can introduce incorrect failure causes into your Failure Analysis. Overall, these types of disagreements can hurt both sales and the quality of your data.

As a result of the above issues, accelerating the Failure Analysis process can improve the overall product, improve the manufacturer-customer relationship, and reduce the cost of the FA process.

 

Challenges associated with Failure Analysis include:

1.  Time of use

How long the product was actually used is much more important than the period between the sale of the product and the failure. To collect this information, machines should include a powered-up time indicator and should also record how long the machine was actively operating. Machines that have variable speeds should record the amount of time spent operating at each speed.

2.  Conditions during use relative to warranty

Whether the device was used outside its specification is always a concern, and it may not be information the user is willing to share. Leveraging on-machine sensors to record the use conditions can help resolve this issue with very little ambiguity, leading to a simple conversation with the customer. For example, piezoelectric and piezoresistive sensors can be used to record all shocks experienced while the machine is powered.

3.  Conditions during use over life cycle

Understanding the use conditions for the full life cycle of the product can be critical to performing FA when multiple machines have failed in a similar way. Environmental conditions, including temperature and humidity, as well as machine sensor data captured during operation, can help identify wear issues.

4. Conditions at time of failure and leading up to failure

The best predictive maintenance and predictive failure algorithms are based on models of known failures. Having as much logged data as possible from the time leading up to a failure is critical to developing models for predictive algorithms.
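One common way to keep the data leading up to a failure without logging everything locally is a fixed-size ring buffer that is frozen when a fault is detected. The sketch below is a minimal illustration of that idea; the class name `PreFailureBuffer`, the sensor name `motor_temp_c`, and the capacity are hypothetical choices, not part of any specific product.

```python
from collections import deque

class PreFailureBuffer:
    """Keep only the most recent `capacity` samples in memory."""

    def __init__(self, capacity):
        self._samples = deque(maxlen=capacity)

    def record(self, timestamp, values):
        # The oldest sample is dropped automatically once the buffer is full.
        self._samples.append((timestamp, dict(values)))

    def snapshot(self):
        """Return the buffered samples, oldest first, for storage with the failure report."""
        return list(self._samples)

buffer = PreFailureBuffer(capacity=3)
for t in range(5):
    buffer.record(t, {"motor_temp_c": 40 + t})

# On a fault, freeze the window leading up to the failure.
window = buffer.snapshot()
print(window)
```

In practice the capacity would be sized to cover whatever pre-failure period the predictive model needs.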

 

Collecting the above data on every machine shipped will improve the speed of the FA process. In many cases, the sensors are already part of the machine system and enabling this feature only requires the controls engineer to store the information. Additional sensors may be required to develop a complete record of the machine performance.
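As an illustration of the time-of-use item above, the following sketch accumulates powered-up time and operating time per speed setting. It assumes the control loop can call a function once per cycle with the elapsed time and the current speed; the class `UsageMeter` and its method names are hypothetical.

```python
from collections import defaultdict

class UsageMeter:
    """Accumulate powered-up time and operating time per speed setting."""

    def __init__(self):
        self.powered_seconds = 0.0
        self.seconds_at_speed = defaultdict(float)

    def tick(self, elapsed_seconds, speed):
        """Call once per control cycle; speed == 0 means powered but idle."""
        self.powered_seconds += elapsed_seconds
        if speed > 0:
            self.seconds_at_speed[speed] += elapsed_seconds

    @property
    def operating_seconds(self):
        return sum(self.seconds_at_speed.values())

meter = UsageMeter()
meter.tick(60.0, speed=0)     # idle for one minute
meter.tick(120.0, speed=50)   # two minutes at speed setting 50
meter.tick(30.0, speed=100)   # thirty seconds at speed setting 100
print(meter.powered_seconds, meter.operating_seconds, dict(meter.seconds_at_speed))
```

Keeping the counters separate lets the FA team distinguish a machine that sat powered but idle from one that ran continuously at high speed.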

There are multiple ways to store the data from the machine's sensors. First, the data can be collected on the machine and extracted only after a failure. This approach adds cost for on-machine storage, because the logs that accumulate between extractions can be very large. Alternatively, this information can be stored in the cloud for pennies per gigabyte. If machine data is pushed to the cloud, the machine vendor has the added benefit of being able to access that data at any time, without requiring physical access to the machine.
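A rough estimate of the raw data volume helps when choosing between the two approaches. The arithmetic below is a sketch only: the sensor count, sample rate, and sample size are hypothetical, and the result ignores compression and protocol overhead.

```python
def daily_log_bytes(sensor_count, samples_per_second, bytes_per_sample):
    """Estimate raw (uncompressed) log size for one machine over 24 hours."""
    seconds_per_day = 24 * 60 * 60
    return sensor_count * samples_per_second * bytes_per_sample * seconds_per_day

# Hypothetical machine: 20 sensors, 10 samples/s each, 8 bytes per sample.
per_day = daily_log_bytes(sensor_count=20, samples_per_second=10, bytes_per_sample=8)
print(f"{per_day / 1e6:.1f} MB per day, {per_day * 365 / 1e9:.1f} GB per year")
```

Multiplying the yearly figure by the number of machines in the field gives a first-order size for the FA database.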

A Strategy for Deploying a Historical Data Failure Analysis Support System

The first step in deploying a machine data collection system is to take a complete inventory of the data available on the current machine. Many devices in a system have multiple status registers that are not used by the control solution.

For example, most microcontrollers have an embedded temperature sensor. A Modbus TCP-enabled device typically contains a microcontroller, and the value of the temperature sensor is often mapped to a register on that device. This data can be captured by the PLC and pushed to data storage in the cloud.
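To make the register read concrete, the sketch below builds and parses the Modbus TCP frames for a "Read Holding Registers" request (function code 0x03) using only the Python standard library. The register address 100, the unit id, and the tenths-of-a-degree scaling are assumptions for illustration; the real values come from the device's register map. In a live system the request would be sent over a TCP connection to the device, normally on port 502.

```python
import struct

READ_HOLDING_REGISTERS = 0x03

def build_read_request(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus TCP 'Read Holding Registers' request frame."""
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_address, quantity)
    # MBAP header: transaction id, protocol id (always 0),
    # length of what follows (unit id + PDU), unit id.
    header = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return header + pdu

def parse_read_response(frame):
    """Return the register values from a 'Read Holding Registers' response frame."""
    function_code, byte_count = struct.unpack(">BB", frame[7:9])
    if function_code != READ_HOLDING_REGISTERS:
        raise ValueError(f"unexpected function code or exception: {function_code:#x}")
    return list(struct.unpack(f">{byte_count // 2}H", frame[9:9 + byte_count]))

# Hypothetical register 100 holding the temperature in tenths of a degree C.
request = build_read_request(transaction_id=1, unit_id=1, start_address=100, quantity=1)
print(request.hex())

# A hand-built response frame carrying the raw value 253 for that register.
response = struct.pack(">HHHBBBH", 1, 0, 5, 1, READ_HOLDING_REGISTERS, 2, 253)
raw = parse_read_response(response)[0]
print(raw / 10.0, "degrees C (assumed scaling)")
```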

Devices like the eWON Flexy can be used to transport the data if this function is beyond the capability of the PLC. The Flexy offers data access at almost no incremental cost relative to a Cosy remote access solution. As a result, any machine with a remote access requirement can be upgraded to a remote data access solution for very little incremental cost.

Any critical sensors required for FA or warranty resolution that are not supported by the current system should be added. This can be done by integrating a new sensor into the machine itself and leveraging the PLC to collect the data, or by adding an overlay sensor network that has no impact on the underlying functionality of the machine. The eWON Flexy is a good match for anyone trying to add sensors to a machine without disrupting the PLC subsystem.

All sensors on the machine should be monitored and logged over time during the final production system test. Recording the value of a sensor over time during a known test creates a set of time series data that serves as a signature for a properly operating machine. The signature of each sensor during the test is stored in the FA database.

 

Once a machine has been deployed to the field, any on-site testing signatures should be stored in the same location. Ideally, portions of the on-site field test are identical to portions of the final system test. The signatures can be compared to verify that the shipping and installation process did not impact the machine.
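One simple way to compare two signatures is the root-mean-square (RMS) deviation between the factory trace and the on-site trace, checked against a tolerance. This is a minimal sketch of one possible metric, not a prescribed method; it assumes both traces were sampled at the same points of an identical test, and the sample values and tolerance are hypothetical.

```python
import math

def rms_deviation(reference, measured):
    """Root-mean-square difference between two equally sampled signatures."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("signatures must be non-empty and the same length")
    squared = [(r - m) ** 2 for r, m in zip(reference, measured)]
    return math.sqrt(sum(squared) / len(squared))

def matches_signature(reference, measured, tolerance):
    """True when the measured trace stays within tolerance of the reference."""
    return rms_deviation(reference, measured) <= tolerance

# Hypothetical sensor traces recorded during the same test sequence.
factory = [0.0, 1.0, 2.0, 1.0, 0.0]
on_site = [0.1, 1.1, 2.1, 0.9, 0.0]
shifted = [0.0, 2.0, 3.5, 2.0, 0.0]

print(matches_signature(factory, on_site, tolerance=0.2))
print(matches_signature(factory, shifted, tolerance=0.2))
```

The same comparison can be rerun when a failing machine is retested, since the factory signature is already in the FA database.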

Once the machine is operational, the data from that machine can be streamed to the Failure Analysis database or batch uploaded to the database on a periodic basis. This data can later be used by a data scientist to perform analysis regarding how the machine is being used by each customer. This data is also useful for creating a predictive maintenance model.
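The batch-upload option can be sketched as a small buffer that hands readings to an upload function in fixed-size groups. The `upload` callable here is a stand-in for whatever transport is actually used (for example an HTTP or MQTT client); the class name `BatchUploader` and the batch size are hypothetical.

```python
class BatchUploader:
    """Buffer readings and hand them to `upload` in fixed-size batches."""

    def __init__(self, upload, batch_size):
        self._upload = upload
        self._batch_size = batch_size
        self._pending = []

    def add(self, reading):
        self._pending.append(reading)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, e.g. on a timer or at shutdown."""
        if self._pending:
            self._upload(self._pending)
            self._pending = []

# Collect uploaded batches in a list instead of sending them anywhere.
sent_batches = []
uploader = BatchUploader(upload=sent_batches.append, batch_size=3)
for i in range(7):
    uploader.add({"sample": i})
uploader.flush()
print([len(batch) for batch in sent_batches])
```

Setting the batch size to 1 turns the same class into a simple streaming sender.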

When machine failures occur, they fall into two major buckets: those that require service (a broken part) and those that can be resolved by the machine operator (jams or positioning issues). In both cases, stored time series operation data can be used to help determine the cause of the malfunction.

This data is particularly useful when the flaw is intermittent and resolvable by the machine operator. Without this data, service personnel must witness a failure, which can take a significant amount of time for an issue that occurs infrequently.

When a machine is failing on-site or has been returned to the factory, the original production system tests can be executed on the machine. Comparing sensor signatures from prior testing can quickly shed light on the failure mechanism.

In summary, a well-documented factory and on-site testing process, combined with machine sensor signatures (time series machine data), can simplify failure analysis and help development teams improve their machines' performance over time.
