\section{Toward an automated model-based approach to system validation} \label{sec:intro} The increasing complexity of interactions between functions and other equipment in modern industrial systems poses new technical challenges. Indeed, developing complex systems often raises integration problems during the final testing and verification phase of the product, and correcting these issues often generates heavy rework and is a well-known cause of cost overruns and project delays. The operation of complex systems is traditionally expressed informally in design documents as a set of modes representing functions, equipment and monitoring mechanisms, which are validated by review, i.e.\ exhaustive manual (as opposed to automatic) consistency checks and cross-reading. Such human validation activities have become practically impossible due to the high number of combinations to be checked. Finding and applying powerful computer-aided analysis techniques that help anticipate and resolve integration problems as early as possible in the design process has therefore become a primary concern for the industry. We believe that formalizing and validating the specifications through animation/simulation and model-checking has several strong advantages with respect to ``traditional'' approaches. First, modelling during the specification phase forces the designer to formalise and clarify the specifications. Second, animation/simulation is useful for validating the model against the specifications and for identifying behavioural inconsistencies based on relevant user-defined scenarios; such inconsistencies are difficult to identify in a classical, purely paper-based specification process. Last, formal verification proves that none of the possible execution scenarios violates the system properties.
Such approaches are useful not only for validating an architecture or failure detection, isolation and reconfiguration strategies once they are defined, but also for tuning their parameters during the definition phase. It is worth noting that the approach and tools described in these pages, while embracing the automated validation of properties over model-described complex systems via model checking, also extend existing tools to permit the analysis of temporal properties. Indeed, in order to have a complete vision of and control over a system, it is necessary to be able to grasp transient states and system dynamics. \subsection{Classical safety analysis techniques: FTA and FMEA} Fault Tree Analysis (FTA)~\cite{stamatelatos2002fault} and Failure Mode and Effects Analysis (FMEA)~\cite{FMEA} are well-known and widely used system analysis techniques in reliability engineering. Both are long established -- FMEA was formally introduced in the late 1940s, and FTA has been around since the 1960s -- and both have been employed in a number of different areas, including the aerospace, nuclear power, and automotive industries. They are methods for identifying potential faults in a system, so that this information can then be used to correct or prevent those faults. Fault Tree Analysis is applicable to both qualitative and quantitative analyses, and is easy to use and understand. Fault trees are graphical representations of logical combinations of failures: they show the relationship between a failure or fault and the events that cause it, and normally consist of a top event, typically a system failure, connected to one or more basic events via a system of logical gates, such as AND and OR. Basic events are usually either component failures or events expected to happen as part of the normal operation of the system. Analysis of a fault tree consists of two parts: qualitative (logical) analysis and quantitative (probabilistic) analysis.
Qualitative analysis is performed by reducing the logical expression represented by the fault tree into a set of minimal cut sets, i.e.\ the smallest possible combinations of failures required to cause the top event. Quantitative analysis is performed by calculating the probability of the top event given the probability of each of the basic events occurring. In an FMEA, the basic process consists of compiling lists of possible component failure modes (all the ways in which an entity may fail), gathered from descriptions of each part of the system, and then trying to infer the effects of those failures on the rest of the system. Usually, these effects are evaluated according to a number of criteria, such as severity, probability, and detectability, and these criteria are often combined into an overall estimate of risk. All of this data is then presented in the form of a table which allows the analyst to see what the effects of each failure mode are. Although useful, these manual techniques are hindered by the increasing complexity of systems: it is becoming more and more difficult to incorporate FTA or FMEA into the design cycle of systems that are more than relatively simple and manageable. In complex systems, manual analysis is laborious and error-prone, and a thorough assessment and interpretation of the results becomes increasingly difficult to achieve within the constraints of most projects. Furthermore, the results of the analyses are separated from the design being analysed, meaning that the effects of any changes in the system design may only become apparent after another long and costly analysis~\cite{papadopoulos2011HH}. Complex systems therefore call for specific supporting measures and tools to assist in the application of analysis techniques. One obvious approach would be to automate at least part of the process.
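As an illustration of what such an automation could look like, the following Python sketch computes the minimal cut sets and the exact top-event probability of a toy fault tree by brute-force enumeration. The tree, the event names and the probabilities are invented for the example, and real tools rely on far more efficient algorithms (e.g.\ MOCUS or BDD-based methods) rather than exhaustive enumeration:

```python
from itertools import combinations

# A toy fault tree: top event = AND(OR(a, b), c).
# Gates are nested tuples ("AND"|"OR", child, ...); leaves are
# basic-event names. All names and probabilities are invented.
TREE = ("AND", ("OR", "a", "b"), "c")

def holds(node, failed):
    """True if `node` occurs, given the set of failed basic events."""
    if isinstance(node, str):
        return node in failed
    op, *children = node
    results = [holds(c, failed) for c in children]
    return all(results) if op == "AND" else any(results)

def basic_events(node):
    """All basic-event names appearing in the tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(basic_events(c) for c in node[1:]))

def minimal_cut_sets(tree):
    """Smallest sets of basic events that trigger the top event
    (brute-force enumeration -- for illustration only)."""
    events = sorted(basic_events(tree))
    cuts = []
    for r in range(1, len(events) + 1):
        for combo in combinations(events, r):
            s = set(combo)
            # Keep s only if it triggers the top event and no already
            # found (smaller) cut set is contained in it.
            if holds(tree, s) and not any(c <= s for c in cuts):
                cuts.append(s)
    return [tuple(sorted(c)) for c in cuts]

def top_event_probability(tree, prob):
    """Exact top-event probability, assuming independent basic events,
    by summing over every combination of failures."""
    events = sorted(basic_events(tree))
    total = 0.0
    for r in range(len(events) + 1):
        for combo in combinations(events, r):
            failed = set(combo)
            if holds(tree, failed):
                p = 1.0
                for e in events:
                    p *= prob[e] if e in failed else 1.0 - prob[e]
                total += p
    return total

print(minimal_cut_sets(TREE))  # [('a', 'c'), ('b', 'c')]
print(top_event_probability(TREE, {"a": 0.1, "b": 0.2, "c": 0.05}))
```

For the toy tree, the two minimal cut sets are $\{a, c\}$ and $\{b, c\}$, and the top-event probability is $P(c)\,(1 - (1 - P(a))(1 - P(b))) = 0.014$, which the enumeration reproduces.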
This would mean that the analyses could be carried out more quickly and efficiently, leaving more time for the results to be studied and allowing more useful conclusions to be drawn. \subsection{Modern safety analysis techniques: MBSE and MBSA} New modelling techniques, such as Model-Based Systems Engineering (MBSE) or Model-Based Safety Analysis (MBSA), propose to master the combinatorial complexity at early concept phases by using abstract high-level representations of a system~\cite{Stamatis}. These views constitute a promising ground on which to implement early validation techniques for architectures. However, in order to be profitable and adopted by industry, those validation techniques must remain lightweight and well integrated in the system design process. That is to say, the modelling workload must be limited, and the analysis results (even preliminary ones) must be available at the same time as designers evaluate the possible architecture choices. These requirements have driven our research and the toolchain described in these pages. As written above, modern safety analysis techniques permit a wealth of information about the safety and reliability of a system to be gathered much more quickly and easily than ever before. This information gives designers the means to use safety and reliability as a major factor in the decisions they make during the evolution of a system design: by evaluating the effects of one or more potential design choices, e.g.\ increased reliability at the expense of greater cost or increased weight, designers are able to make informed choices. However, just as classical manual safety analyses restrict the rate at which information can be obtained about a system, manually evaluating different design choices is time-consuming and restricts the number of design candidates that can be investigated.
If this process could be automated, it would be possible to examine hundreds or thousands of potential design candidates -- a much greater portion of the total design space -- and thus hopefully provide a better foundation for the next iteration of the system design. \section{Automated Evaluation of Safety Temporal Conditions} In this report, we describe an approach that allows us to extend models developed for safety analysis in order to reason about the correctness of temporal conditions. We intend to offer the capability to study a new range of system requirements that are of major interest for functions such as failure detection, isolation and recovery. We advocate that timing properties are critical when assessing the safety of embedded and real-time systems. Indeed, temporal aspects---like network delays or computation times---can be the cause of missed failure detections or undesired reactions to (delayed) failure propagation. It is therefore necessary to be able to analyse the temporal properties of a model in order to build systems that will operate as intended in a real-world environment. We define a model-based process to simultaneously check safety and temporal conditions on systems. Our approach is based on an extension of the AltaRica language~\cite{arn00} in which timing constraints can be associated with events. This extension can then be translated into the intermediate language Fiacre~\cite{berthomieu2008fiacre}, a formal specification language that can represent both the behavioural and the timing aspects of systems. The resulting Fiacre model can be analysed with the real-time model-checker Tina~\cite{BRV04}. The results of model-checking shed light on the dysfunctional behaviour of the original model, including how the cascading effects due to failure propagation delays reveal transitory failure modes~\cite{albore2017IMBSA}.
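To illustrate, independently of the AltaRica/Fiacre/Tina toolchain itself, why such timing aspects can cause missed failure detections, the following minimal Python sketch models a transient failure effect that becomes observable only after a propagation delay and for a bounded duration; a periodic monitor with too long a sampling period misses it entirely. All names, delays and durations are invented for the example:

```python
# Discrete-time sketch of timed failure propagation and periodic
# monitoring. This is NOT the AltaRica/Fiacre/Tina toolchain itself:
# all names, delays and durations below are invented for illustration.

PROPAGATION_DELAY = 3  # time units before a failure becomes observable

def observable(fail_time, duration, t):
    """A transient effect: visible only in a bounded window that opens
    after the propagation delay."""
    start = fail_time + PROPAGATION_DELAY
    return start <= t < start + duration

def first_detection(fail_time, duration, sample_period, horizon=30):
    """Time of the first periodic sample that sees the effect, or None
    if the transitory failure mode is never observed."""
    for t in range(0, horizon, sample_period):
        if observable(fail_time, duration, t):
            return t
    return None

# A failure at t=2 is observable only during the window [5, 7).
print(first_detection(2, duration=2, sample_period=1))  # 5 -> detected
print(first_detection(2, duration=2, sample_period=4))  # None -> missed
```

A model-checker exploring the timed state space can expose exactly this kind of scenario exhaustively, rather than relying on hand-picked simulation runs.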
The approach scales up to models of industrial size, as in the validation of formation-flying satellite systems where complex equipment and instruments are distributed over several spacecraft~\cite{albore2018ERTS}. The translation developed takes the functional part of the system model and the dysfunctional viewpoint modelled by safety engineers to generate an AltaRica model of the system. The generated AltaRica model is formal and allows, among other things, the dysfunctional simulation of the system and the generation of sequences of events leading to accidents.\\
%%%%
This report is organised as follows. We start by giving a broader view of the subject by reviewing the state of the art in Chap.~\ref{soa}. We then define the AltaRica and Fiacre languages and the time model in Chap.~\ref{altarica}, providing examples of the encoding of AltaRica in Fiacre, and giving some experimental results about time failure propagation in Sect.~\ref{sec:sect4}. In Chap.~\ref{aocs} we apply the described approach to an industrial case of (automated) verification of AOCS modes in satellites, and discuss scalability issues and empirical results.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End:
%% LocalWords:  Fiacre