\section{Toward an automated model-based approach to system validation} \label{sec:intro} The increasing complexity of interactions between functions and other equipment in modern industrial systems poses new technical challenges. Indeed, developing complex systems often raises integration problems during the final testing and verification phase of the product, and correcting these issues often generates heavy rework and is a well-known cause of cost overruns and project delays. The operation of complex systems is traditionally expressed informally in design documents as a set of modes representing functions, equipment and monitoring mechanisms, which are validated by review, i.e.\ exhaustive manual (as opposed to automatic) consistency checks and cross-reading. Such human validation activities have become practically impossible due to the high number of combinations to be checked. Finding and applying powerful computer-aided analysis techniques that help anticipate and resolve integration problems as early as possible in the design process has therefore become a primary concern for the industry. We believe that formalizing and validating the specifications through animation/simulation and model-checking has several strong advantages with respect to ``traditional'' approaches. First, modelling during the specification phase forces the designer to formalise and clarify the specifications. Second, animation/simulation is useful for validating the model against the specifications and for identifying behavioural inconsistencies based on relevant user-defined scenarios; such inconsistencies are difficult to identify in a classical, purely paper-based specification process. Last, formal verification proves that none of the possible execution scenarios violates the system properties.
Such approaches are useful not only for validating an architecture or failure detection, isolation and reconfiguration strategies once they are defined, but also for tuning their parameters during the definition phase. It is worth noting that the approach and tools described in these pages, while embracing the automated validation of properties over model-described complex systems via model checking, also extend existing tools to permit the analysis of temporal properties. Indeed, in order to have a complete vision of and control over a system, it is necessary to be able to grasp transient states and system dynamics. \subsection{Classical safety analysis techniques: FTA and FMEA} Fault Tree Analysis (FTA)~\cite{stamatelatos2002fault} and Failure Mode and Effects Analysis (FMEA)~\cite{FMEA} are well-known and widely used system analysis techniques in reliability engineering. Both are long established -- FMEA was formally introduced in the late 1940s, and FTA has been around since the 1960s -- and both have been employed in a number of different areas, including the aerospace, nuclear power, and automotive industries. They are methods for identifying potential faults in a system, so that this information can then be used to correct or prevent those faults. Fault Tree Analysis is applicable to both qualitative and quantitative analyses, and is easy to use and understand. Fault trees are graphical representations of logical combinations of failures: they show the relationship between a failure or fault and the events that cause it, and normally consist of a top event, typically a system failure, connected to one or more basic events via a system of logical gates, such as AND and OR. Basic events are usually either component failures or events expected to happen as part of the normal operation of the system. Analysis of a fault tree consists of two parts: qualitative (logical) analysis and quantitative (probabilistic) analysis.
Qualitative analysis is performed by reducing the logical expression represented by the fault tree into a set of minimal cut sets, i.e.\ the smallest possible combinations of failures required to cause the top event. Quantitative analysis is performed by calculating the probability of the top event given the probability of each of the basic events occurring. In an FMEA, the basic process consists of compiling lists of possible component failure modes (all the ways in which an entity may fail), gathered from descriptions of each part of the system, and then trying to infer the effects of those failures on the rest of the system. Usually, these effects are evaluated according to a number of criteria, such as severity, probability, and detectability, and these criteria are often combined into an overall estimate of risk. All of this data is then presented in the form of a table which allows the analyst to see what the effects of each failure mode are. Although useful, these manual techniques are hindered by the increasing complexity of systems: it is becoming more and more difficult to incorporate FTA or FMEA into the design cycle of systems that are more than relatively simple and manageable. In complex systems, manual analysis is laborious and error-prone, and a thorough assessment and interpretation of the results becomes increasingly difficult to achieve within the constraints of most projects. Furthermore, the results of the analyses are separated from the design being analysed, meaning that the effects of any changes in the system design may only become apparent after another long and costly analysis~\cite{papadopoulos2011HH}. Complex systems therefore call for specific supporting measures and tools to assist in the application of analysis techniques. One obvious approach would be to automate at least part of the process.
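As an illustration of what such an automation could look like, the following Python sketch computes the minimal cut sets and the exact top-event probability of a toy fault tree by brute-force enumeration. The tree, the event names and the probabilities are invented for the example, and real tools rely on far more efficient algorithms (e.g.\ MOCUS or BDD-based methods) rather than exhaustive enumeration:

```python
from itertools import combinations

# A toy fault tree: top event = AND(OR(a, b), c).
# Gates are nested tuples ("AND"|"OR", child, ...); leaves are
# basic-event names. All names and probabilities are invented.
TREE = ("AND", ("OR", "a", "b"), "c")

def holds(node, failed):
    """True if `node` occurs, given the set of failed basic events."""
    if isinstance(node, str):
        return node in failed
    op, *children = node
    results = [holds(c, failed) for c in children]
    return all(results) if op == "AND" else any(results)

def basic_events(node):
    """All basic-event names appearing in the tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(basic_events(c) for c in node[1:]))

def minimal_cut_sets(tree):
    """Smallest sets of basic events that trigger the top event
    (brute-force enumeration -- for illustration only)."""
    events = sorted(basic_events(tree))
    cuts = []
    for r in range(1, len(events) + 1):
        for combo in combinations(events, r):
            s = set(combo)
            # Keep s only if it triggers the top event and no already
            # found (smaller) cut set is contained in it.
            if holds(tree, s) and not any(c <= s for c in cuts):
                cuts.append(s)
    return [tuple(sorted(c)) for c in cuts]

def top_event_probability(tree, prob):
    """Exact top-event probability, assuming independent basic events,
    by summing over every combination of failures."""
    events = sorted(basic_events(tree))
    total = 0.0
    for r in range(len(events) + 1):
        for combo in combinations(events, r):
            failed = set(combo)
            if holds(tree, failed):
                p = 1.0
                for e in events:
                    p *= prob[e] if e in failed else 1.0 - prob[e]
                total += p
    return total

print(minimal_cut_sets(TREE))  # [('a', 'c'), ('b', 'c')]
print(top_event_probability(TREE, {"a": 0.1, "b": 0.2, "c": 0.05}))
```

For the toy tree, the two minimal cut sets are $\{a, c\}$ and $\{b, c\}$, and the top-event probability is $P(c)\,(1 - (1 - P(a))(1 - P(b))) = 0.014$, which the enumeration reproduces.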
This would mean that the analyses could be carried out more quickly and efficiently, leaving more time for the results to be studied and allowing more useful conclusions to be drawn. \subsection{Modern safety analysis techniques: MBSE and MBSA} New modelling techniques, such as Model-Based Systems Engineering (MBSE) or Model-Based Safety Analysis (MBSA), propose to master the combinatorial complexity at early concept phases by using abstract high-level representations of a system~\cite{Stamatis}. These views constitute a promising ground on which to implement early validation techniques for architectures. However, in order to be profitable and adopted by industry, those validation techniques must remain lightweight and well integrated in the system design process. That is to say, the modelling workload must be limited, and the analysis results (even preliminary ones) must be available at the same time as designers evaluate the possible architecture choices. These requirements have driven our research and the toolchain described in these pages. As written above, modern safety analysis techniques permit a wealth of information about the safety and reliability of a system to be gathered much more quickly and easily than ever before. This information gives designers the means to use safety and reliability as a major factor in the decisions they make during the evolution of a system design: by evaluating the effects of one or more potential design choices, e.g.\ increased reliability at the expense of greater cost or increased weight, designers are able to make informed choices. However, just as classical manual safety analyses restrict the rate at which information can be obtained about a system, manually evaluating different design choices is time-consuming and restricts the number of design candidates that can be investigated.
If this process could be automated, it would be possible to examine hundreds or thousands of potential design candidates -- a much greater portion of the total design space -- and thus hopefully provide a better foundation for the next iteration of the system design. \section{Automated Evaluation of Safety Temporal Conditions} In this report, we describe an approach that allows us to extend models developed for safety analysis in order to reason about the correctness of temporal conditions. We intend to offer the capability to study a new range of system requirements that are of major interest for functions such as failure detection, isolation and recovery. We advocate that timing properties are critical when assessing the safety of embedded and real-time systems. Indeed, temporal aspects---like network delays or computation times---can be the cause of missed failure detections or undesired reactions to (delayed) failure propagation. It is therefore necessary to be able to analyse the temporal properties of a model in order to build systems that will operate as intended in a real-world environment. We define a model-based process to simultaneously check safety and temporal conditions on systems. Our approach is based on an extension of the AltaRica language~\cite{arn00} in which timing constraints can be associated with events. This extension can then be translated into the intermediate language Fiacre~\cite{berthomieu2008fiacre}, a formal specification language that can represent both the behavioural and the timing aspects of systems. The resulting Fiacre model can be analysed with the real-time model-checker Tina~\cite{BRV04}. The results of model-checking shed light on the dysfunctional behaviour of the original model, including how the cascading effects due to failure propagation delays reveal transitory failure modes~\cite{albore2017IMBSA}.
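To illustrate, independently of the AltaRica/Fiacre/Tina toolchain itself, why such timing aspects can cause missed failure detections, the following minimal Python sketch models a transient failure effect that becomes observable only after a propagation delay and for a bounded duration; a periodic monitor with too long a sampling period misses it entirely. All names, delays and durations are invented for the example:

```python
# Discrete-time sketch of timed failure propagation and periodic
# monitoring. This is NOT the AltaRica/Fiacre/Tina toolchain itself:
# all names, delays and durations below are invented for illustration.

PROPAGATION_DELAY = 3  # time units before a failure becomes observable

def observable(fail_time, duration, t):
    """A transient effect: visible only in a bounded window that opens
    after the propagation delay."""
    start = fail_time + PROPAGATION_DELAY
    return start <= t < start + duration

def first_detection(fail_time, duration, sample_period, horizon=30):
    """Time of the first periodic sample that sees the effect, or None
    if the transitory failure mode is never observed."""
    for t in range(0, horizon, sample_period):
        if observable(fail_time, duration, t):
            return t
    return None

# A failure at t=2 is observable only during the window [5, 7).
print(first_detection(2, duration=2, sample_period=1))  # 5 -> detected
print(first_detection(2, duration=2, sample_period=4))  # None -> missed
```

A model-checker exploring the timed state space can expose exactly this kind of scenario exhaustively, rather than relying on hand-picked simulation runs.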
The approach scales up to models of industrial size, as in the validation of formation-flying satellite systems where complex equipment and instruments are distributed over several spacecraft~\cite{albore2018ERTS}. The translation developed takes the functional part of the system model and the dysfunctional viewpoint modelled by safety engineers to generate an AltaRica model of the system. The generated AltaRica model is formal and allows, among other things, the dysfunctional simulation of the system and the generation of sequences of events leading to accidents.\\
%%%%
This report is organised as follows. We start by giving a broader view of the subject by reviewing the state of the art in Chap.~\ref{soa}. We then define the AltaRica and Fiacre languages and the time model in Chap.~\ref{altarica}, providing examples of the encoding of AltaRica in Fiacre, and giving some experimental results about time failure propagation in Sect.~\ref{sec:sect4}. In Chap.~\ref{aocs} we apply the described approach to an industrial case of (automated) verification of AOCS modes in satellites, and discuss scalability issues and empirical results.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End:
%% LocalWords:  Fiacre