Методология проведения анализа на отказ сложных технических систем и предотвращение их последствий

Aleksey G. Amosov; Vladislav A. Golikov; Ekaterina V. Mikhailova; Oleg V. Rozhdestv

doi:https://doi.org/10.34069/AI/2021.48.12.5

DOI:https://doi.org/10.34069/AI/2021.48.12.5

Methodology for failure analysis of complex technical systems and prevention of their consequences

Методология проведения анализа на отказ сложных технических систем и предотвращение их последствий

Aleksey G. Amosov¹³

, Vladislav A. Golikov¹⁴

, Ekaterina V. Mikhailova¹⁵

, Oleg V. Rozhdestvensky¹⁶

Abstract

The paper presents a study on the methodology of failures and their possible consequences analysis. Analysis of failures and their consequences is carried out for newly developed or modernized products and it is one of main activities in the reliability assurance system. The methodology is applied to the analysis of all designed systems, starting from the earliest stage of development, in order to evaluate the approach to development and compare the advantages of the design solution. The considered analysis of failures and their consequences of components is a part of the complex analysis of reliability of the whole product. Depending on the complexity of the design and the available data, a particular approach may be chosen for the analysis. In one case, it is a structural approach, in which a list of individual elements and their possible failures is compiled. In another case, it is the functional approach, which is based on the statement that each element must perform a number of functions that can be classified as solutions. The results provide a scheme for conducting the analysis and finding solutions to prevent them. The conclusions say that the level of detail determines the level at which failures are postulated.

Keywords: analysis, probability, failure, consequences, performance, complex technical system.

Аннотация

В работе представлено исследование на предмет методологии проведения анализа отказов и их вероятных последствий. Анализ отказов и их последствий проводится для вновь разрабатываемых или модернизируемых изделий и является одним из главных мероприятий в системе обеспечения надежности. Методика применяется для анализов всех проектируемых систем, начиная с самой ранней стадии разработки, с целью оценки подхода к разработке и сравнению преимуществ того или иного проектного решения Рассматриваемый анализ отказов и их последствий составных частей является частью комплексного анализа надежности всего продукта целиком. В зависимости от сложности конструкции и имеющихся данных может быть выбран определенный подход для проведения анализа. В одном случае это структурный подход, при котором составляется перечень отдельных элементов и их возможных отказов. В другом случае – функциональный подход, в основу которого положено утверждение, что каждый элемент должен выполнять ряд функций, которые могут быть классифицированы как решения. В результатах приведена схема проведения анализа и поиска решений по их предотвращению. В выводах сказано о том, что уровень детализации определяет уровень, на котором постулируются отказы.

Ключевые слова: анализ, вероятность, отказ, последствия, работоспособность, сложная техническая система.

Introduction

Historically, the analysis of the consequences of complex technical systems was applied for the first time in the 1950s and was used for aviation and space technology, later it began to be used for military technology (Military Standard MIL-STD-1629A-1984, 1980). Since 1980, the technique has been used in the automotive industry at Ford factories. Currently, similar analyzes are carried out in all industries and this process is an integral part of quality management and is used internally and externally, for example, as a condition for the supply of components.

Failure analysis of complex technical systems is a technique for assessing the reliability and safety of systems; it is a method of systematic analysis of a system to identify the types of potential failures, their causes and consequences, as well as the impact of failures on the functioning as a whole or its components and processes. Similar analyzes become more popular and have become an important part of many development processes over the years. After the analysis, it can be concluded that the consequences of the failure of one or another component are much more serious than anticipated.

The term “system” is used to describe hardware, software or process. It is recommended to conduct the analysis early in the development phase when it is most cost effective to eliminate or mitigate the impact. The analysis can be started as soon as the system can be represented in the form of a functional block diagram with an indication of its elements.

The grounds for using the methodology for analyzing complex technical systems may be as follows:

identification of failures that have undesirable consequences for the functioning of the system, such as interruption or significant degradation of performance or impact on the safety of the user;
fulfillment of the requirements specified in the contract;
improving the reliability or safety of the system (for example, through design changes or quality assurance actions);
improving the maintainability of the system by identifying areas of risk or nonconformity with respect to maintainability.

In accordance with the above, the objectives of the methodology application and the analysis may be as follows:

complete identification and assessment of all undesirable consequences within established system boundaries and sequences of events caused by each identified common cause failure mode at various levels of the functional structure of the system;
determining the criticality or priority for diagnostics and mitigation of the negative consequences of each type of failure affecting the correct functioning and parameters of the system or the corresponding process;
classification of the identified failure modes according to characteristics such as ease of detection, diagnostic capability, testability, operating and repair conditions (repair, operation, logistics, etc.);
identifying functional system failures and assessing the severity and likelihood of failure;
developing a plan to improve the design by reducing the number and consequences of failure modes.

The analysis is carried out to identify all possible failures and to reduce their impact on performance by selecting appropriate circuit solutions or by taking appropriate measures to prevent these failures.

The results of the analysis are used:

to determine the need for changes in products and to assess their impact on system reliability;
to draw up a typical list of possible system failures, which provides decision-making during the design;
to determine the effect of possible failure types on other systems;
to determine the completeness of preventive measures aimed at elimination of failures during operation of the systems;
as additional information for designers and operating services about the behavior of a product in the event of system failures;
to provide development management process related to decision making;
for objective planning of the scope and types of operational testing;
to provide better operational analysis;
to provide probabilistic analysis of subsystems and products as a whole.

The analysis of possible failures and their consequences, as an integral procedure in the product development process, should reflect and account for all changes in the accompanying documentation (Military Standard MIL-STD-1629A-1984, 1980; Alexandrova et al., 2020).

The work presents “Literature review”, which provides the theoretical basis of the research, and “Methodology for analyzing failures and their consequences”, which shows a set of analyzed data and provides a mathematical definition of failure in terms of the functioning parameters and the permissible limits of the vector of design parameters X*. In the next section, “The result of the analysis of failures and their consequences”, the result of the study is given, namely, the scheme for the analysis of failures and their consequences (Figure 1) is designed, which graphically demonstrates the method of conducting the analysis “top-down”, the use of which provides greater flexibility and makes it possible to limit the analysis at any level. “Discussion” explains the meaning and implications of the research result, it stipulates that failure modes for each agreed-upon level analyzed should be identified and described. Also in this section the typical symptoms of failures necessary to confirm the full completion of the analysis are given and the measures taken to prevent each possible type of failure are classified. The conclusion describes a list of information for each considered element of the system, compiled according to the results of the analysis with the block diagram in the Figure 1.

Literature Review

The considered topic in the modern world has a very high degree of relevance. High-tech structures of various kinds and purposes are classified as complex systems and are characterized by a multi-level hierarchical structure. Since a complex of factors acts on the systems during operation, the reliability of the system's functioning also depends on the optimal distribution of the values of reliability indicators between the elements of the system. A large number of studies have been carried out on this issue, in (Grishko, 2016), ratios are proposed for assessing the requirements for the reliability of subsystems, taking into account only gradual changes in parameters and simultaneously taking into account sudden and gradual failures. The developed methodology for determining the requirements in the study (Severtsev, 2013) is supposed to be applied at the early stages of design, when the lack of the necessary information makes it impossible to find the standards of reliability of structural elements that ensure the minimum total cost of the project. From the point of view of the value of information, metric properties characterize the informativeness of the physical values of the parameters, which is expressed through the target value, reflecting the relation of system parameters with the internal parameters of the object, which is described in detail in the article (Sadikhov et al., 2015). The study of structural stability and safety indicators of the functioning of nonlinear dynamic systems is given in (Kochegarov, 2012), they are described by nonlinear differential equations with frequency derivatives reducible by Fourier-Laplace transform. When considering the results of work (Katulev et al., 2016), we can see that the results obtained are fully consistent with the results of (Gilmore, 1981), where a potential function was used to describe the technical system, which was subsequently not used in the proposed algorithm.

The disadvantages of all of the above works are the lack of a procedure for analyzing the probability of failures and their probable consequences.

Methodology

The methodology is used to analyze all designed systems, starting from the earliest stage of development, in order to evaluate the development approach and compare the advantages of a particular design solution. At this stage and at the stage of providing of analysis there is an opportunity to timely identify the most obvious types of failures, the impact of which can be reduced with minimal modifications. As the product is developed, the analysis deepens to significantly lower levels. When the necessary changes are made to the system to eliminate or reduce the impact of the detected failure type, the analysis must be repeated to ensure that all predicted failure types are addressed in the new concept of the modified system.

The analysis of failures and their consequences is an integral part of a comprehensive reliability analysis.

The analysis of possible failures and their consequences is carried out in accordance with the general requirements of this methodology. Failure analysis work starts from the initial design stages and ends only after the product is transferred to serial production. This is used to reflect changes in the design, and the results of the analysis to guide the development (Lebedeva et al., 2018).

Main rules and assumptions must be developed and used before analysis. These rules disclose the subject matter of the analysis (i.e., what is to be analyzed: system, function or combination of them), low level of the system, definition of object failure in terms of performance criteria and allowable limitations.

The methodological assumptions used in the analysis may be as follows:

all elements are designed and manufactured in compliance with all requirements of design and technical documentation, and the structure assembled from such elements will operate normally;
simultaneous occurrence of independent failures of two or more elements is not considered in the analysis;
failure of an element, irrespective of if it is caused by reasons in the element, in the drive of this element, by loss of electric power or by loss of input (output) signals, is considered as a failure of the analyzed element;
consequences of a failure during some stage of functioning, assuming that the product was serviceable before the start of this stage;
if any function of an element is not required within some stage of functioning, and this function could not be fulfilled due to failure, the failure is characterized as not affecting the operability (Baillieul & Samad, 2015).
The analysis develops a definition of failure in terms of functioning parameters and allowable limits .

,

where X is the vector of design parameters:
- type, purpose and structure;
- technical requirements;
- principles of operation;
- design;
- operating conditions;
- results of tests and operation of analogues;
- other factors determining the features of the considering product.

Results

Analysis should begin as an integral part of the design process for the functional parts of systems early in order to account for design changes. Ongoing analysis should be an integral part of any design review from the beginning to the end of the design (Polynskaya & Enderyukova, 2015; Shibaev et al., 2019).

Depending on the complexity of the design and the data available, a particular approach may be chosen for the analysis. In one case, it is a structural approach, in which a list of individual elements and their possible failures is compiled. In another case, it is the functional approach, which is based on the statement that each element must perform a number of functions that can be classified as solutions. For the most complex systems, such as control systems, measuring equipment, etc., a combination of structural and functional analysis should be used. Both methods and their combination should start either from the top hierarchical level (top-down method) or start from the lower level of parts and elements and end with the analysis of the whole system (bottom-up method). In case the object of design is a large and complex technical system that has a developed hierarchical structure. In accordance with the system approach, which is the basis of the computer-aided design method, when solving problems of a certain hierarchical level there is no need to develop models of the whole system hierarchy and to obtain an acceptable result it is enough to consider systems two orders of magnitude lower or higher. To determine the parameters and characteristics it is necessary to develop a model of each element at the i-th level, with the models of the upper levels included in the models of the lower levels i+2. At each hierarchical level, the model represents relations (expressed by equations) describing the dependencies between parameters and characteristics (Lontsikh & Boryushkina, 2010; Kuravsky et al., 2020).

Bottom-up analysis is appropriate at an early stage of design and documentation development, when the amount of information about the product being designed is sufficient to perform a qualitative analysis.

The main principle of this method is to consistently ask the question: “What event leads to the failures of the elements of the i-th level?”.

The top-down method of analysis provides more flexibility, because it has the ability to limit the analysis at any level, as well as more in-depth elaboration of individual critical types of failures.
The basic principle of this method is to consistently ask the question: “How and for what reason can a failure occur?”.

The structural approach is used when elements can be unambiguously identified by identifying them from diagrams, drawings and other technical documents. This approach is typically used in bottom-up analysis (Bubnicki, 2005).

The functional approach is used when the elements of the system cannot be uniquely identified or when the system analysis is done in a top-down manner.

The order and sequence of the analysis is shown in the Figure 1.

Figure 1. Scheme of the analysis of failures and their consequences.

Each failure of an element of the system, the effect of which is analyzed, should be considered as a single failure in the system. When a primary element failure cannot be detected, the analysis should continue until the effect of a secondary failure, which in combination with a primary undetected failure can lead to a critical situation, is determined. Multiple failures in systems leading to critical situations should also be identified. All failures identified in the analysis should be summarized to ensure the visibility of these failures (Boran-Keshishyan, 2013; Smirnov, 2019).

The first step in performing the analysis is to define the system being analyzed. The definition of the system includes:

brief description of the purpose and operation of the system;
drawing the interface diagram of the analyzed system with other systems;
definition of the stages of the system functioning;
functional description of each stage of functioning;
coding of stages and functions.

Pairing scheme of the analyzed system with other systems is made to account for possible failures of inter-system links (Shamraeva, 2018; Kuravsky & Yuryev, 2020).

The links can be the following:

energetic (through power source: electrical, hydraulic, pneumatic, electric power, etc.);
structural (i.e. the possibility of mechanical, electrical, installation and layout connections of different systems in the same areas);
external (electromagnetic interference, electrical processes, influence of various external factors).

To illustrate the operation, interaction and interdependence between the constituent parts of the system and to enable a consistent analysis of the impact of failure at all specified levels of the system, it is necessary to build block diagrams of system operation.

Depending on the approach used, functional block diagrams or structural reliability diagrams can be constructed.

All inputs and outputs of an element of the system in the block diagram must be shown and clearly marked, i.e. each block must be assigned its own number, which reflects the order of loss of function by the system.

Depending on the objectives of the system, more than one flowchart may be required to describe alternative modes of operation of the system.

A functional block diagram illustrates the operation and interaction between the elements of the system as established in the technical documentation. A functional block diagram illustrates the sequence of functions performed by the system and each of the specified levels under analysis, and can be used for structural and functional analysis.

A reliability block diagram defines the sequential dependence of all functions of a system or functional group for each event from system start to end. The reliability block diagram provides an opportunity to identify the interdependence of functions for a system and can be used for the functional method (Carayannis & Coleman, 2005; Lesin et al., 2012).

Discussion

All foreseeable failure modes for each stipulated level to be analyzed must be defined and described. Potential types of failures are determined by examining the outputs of the functions designated in the block diagrams and schematics. Failure types of individual element functions are postulated based on the requirements established in the system description and the failure definitions included in the ground rules. Since a failure type can have more than one cause, all probable independent causes for each type must be defined and described. Causes of failures within an adjacent stipulated level must be considered. For example, causes of failures at the third level of the system are considered in the analysis of the second level. To ensure that the analysis is fully complete, each type of failure and output function must, at a minimum, be explained according to the following typical failure attributes:

premature actuation;
non-activation or delayed actuation;
complete or partial loss of designated functions;
set point deviation from set point limits;
occurrence of processes which prevent operation;
other signs of failures, if they exist, caused by characteristics of the system or conditions of its functioning.

The consequences of failures are characterized by the effect that each predicted and described type of failure has on the operation, function or state of the object of analysis (Astrom & Kumar, 2014; Meacham & van Straalen, 2018).

All phenomena, processes, events and states due to the occurrence of each predicted type of failure must be described in failure and probable consequences analysis worksheet.

Failure consequence analysis focuses on the block diagram element whose operability is affected by the type of considered failure. The type of failure may affect several levels of the system other than the one under consideration, hence “local” effect of the failure type on “next upper level” and “final” effect on the whole system under analysis should be determined.

Determination of the effect of the failure type on the “next upper level” is carried out in order to assess the consequences to which the investigated failure type leads for the next upper level in relation to the system level under consideration.

For each predicted type of failure, the measures that are taken to prevent it should be described.

These include:

design measures that entail changes in technical documentation, schematics at any specified level and compensate for the effects of the failure, such as: introduction of control switching elements that contain the occurrence or propagation of the failure or the introduction of redundancy into the system;
worked out, associated with the planning of unique tests aimed at determining the operability of system designs to identify the possibility of primary and secondary types of failures described for a given design or system.

Consequence severity classification is performed to determine a qualitative measure of the potential consequences of design errors or element failure (McCulloch et al., 2009; Albertos & Mareels, 2010).

The impact on the performance conditions of the analyzed element, caused by loss or degradation of output due to failure, is classified according to the following 4 degrees of consequence severity:

consequences of failures that can lead to complete failure of product performance;
consequences of failures which can lead to a complete cessation of task performance;
consequences of failures which can cause minor damage to the system and which can either be detected in time and lead to cancellation of work or lead to a decrease in efficiency;
consequences of failures that do not cause a system failure.

In the event that it is not possible to determine the severity of the consequences of an element failure in accordance with the above classification, similar provisions based on the loss of system input or output should be developed and included in the basic rules of analysis (Kapitonov, 2021a); Kapitonov, 2021b).

Conclusion

As we can see, the level of detail determines the level at which failures are postulated. The analysis can be performed at different levels of the system down to the details, depending on the available information and development requirements. The lower the stipulated level and the higher the level of detail, the more types of failures are considered. Less detailed analysis done in a timely manner is more valuable for reliability than more detailed analysis done late.

The analysis should be used to identify items with a high risk of failure and take steps to prevent it. The analysis can also be used to identify special tests, failure monitoring and detection activities, performance limitations and other information and activities necessary to minimize the risk of failure. All recommended actions resulting from the study should subsequently be included in the appropriate documentation, or a document should be issued that provides justification for not taking any action. The list resulting from the analysis will include the following information for each element under consideration:

name of the element and its relation to other elements;
description of design characteristics that minimize the possibility of failure of considered element;
description of tests performed to confirm the design characteristics and the tests planned at product acceptance or during inspections that could detect failure of the element;
description of planned inspections to verify that the product is built in accordance with the requirements of the accompanying documentation;
description of development and testing history of the design or counterpart;
description of measures to be taken to prevent related failures.

The overall economic benefit from the application of the methodology can be as follows: decrease in the number of changes made at the production stage and the cost of making changes, as well as the elimination of errors and related defects, and therefore, it will save from complaints, lawsuits and significant costs to eliminate defects.