Model Types Used to Study Availability

We will briefly describe the model types used to compute the reliability and the availability of a system.

  1. Combinatorial Model Types
  2. State-space Models
  3. Hierarchical Models
  1. Combinatorial Model Types

    The three model types that are commonly used for reliability and availability are reliability block diagrams, reliability graphs and fault trees. These model types are similar in that they capture conditions that make a system fail in terms of the structural relationships between the system components. A comparison of the power of these model types has been made in this paper.

    • Reliability block diagrams

      A series-parallel reliability block diagram represents the logical structure of a system with regard to how the reliability of its components affects the system reliability. In a block diagram model, components are combined into blocks in series, in parallel or in k-out-of-n configurations. All of these constructs can be used together in a single block diagram. Components of the same type that appear more than once in the system structure are assumed to be copies with independent, identical distribution functions. Each component may have a failure probability, a failure rate, a failure distribution function or unavailability attached to it.

      The assumption of independence and series-parallel structure allows very fast computation of reliability and availability measures. However, many system models in practice do not follow series-parallel structure. The SHARPE software package allows easy specification and solution of such models.
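
      As a rough illustration of why independence makes series-parallel block diagrams cheap to evaluate, the sketch below combines component reliabilities with the standard series, parallel and k-out-of-n formulas. The function names and the example system (a CPU, two disks, a 2-out-of-3 power block) are hypothetical and are not SHARPE syntax.

```python
from math import comb

def series(*r):
    """Series block: works only if every component works."""
    out = 1.0
    for x in r:
        out *= x
    return out

def parallel(*r):
    """Parallel block: fails only if every component fails."""
    q = 1.0
    for x in r:
        q *= (1.0 - x)
    return 1.0 - q

def k_out_of_n(k, n, r):
    """k-out-of-n block of n identical, independent components of reliability r."""
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

# Hypothetical system: CPU in series with two parallel disks
# and a 2-out-of-3 power supply block (all values are assumed).
system = series(0.99, parallel(0.9, 0.9), k_out_of_n(2, 3, 0.95))
print(system)
```

      Each block is reduced to a single number, so the cost grows only linearly with the number of blocks.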

      Example (Christophe)

    • Reliability graphs

      The reliability graph model consists of a set of nodes and directed edges (arcs), where the edges represent components that can fail or structural relationships between the components. The graph contains one node with no incoming edges, the source, and one node with no outgoing edges, the sink (also called the destination or terminal node). A system represented by a reliability graph fails when there is no path from the source to the sink. As in reliability block diagrams, the edges can be assigned failure probabilities, failure rates, failure distribution functions or unavailability values.

      A reliability graph is equivalent to a non-series-parallel reliability block diagram. In the reliability graph, the components are the arcs, while in the block diagram the components are the boxes. The non-series-parallel block diagram cannot be directly analyzed by (or even specified for) SHARPE, but the reliability graph can. The price for more generality is the increased complexity of solution.
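
      To make the "path from source to sink" criterion concrete, here is a minimal brute-force sketch that enumerates every up/down combination of the edges and sums the probability of those combinations in which the sink is reachable. It is not the algorithm SHARPE uses; the function name, the bridge network and the edge probabilities are assumptions chosen for illustration.

```python
from itertools import product

def graph_reliability(edges, source, sink):
    """edges: list of (tail, head, probability_that_edge_works).

    Brute-force enumeration over all 2**len(edges) edge states,
    assuming the edges fail independently.
    """
    total = 0.0
    for state in product([True, False], repeat=len(edges)):
        prob = 1.0
        adj = {}
        for up, (u, v, p) in zip(state, edges):
            prob *= p if up else (1.0 - p)
            if up:
                adj.setdefault(u, []).append(v)
        # Depth-first search from the source over working edges only.
        seen, stack = {source}, [source]
        while stack:
            node = stack.pop()
            for nxt in adj.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if sink in seen:
            total += prob
    return total

# Hypothetical directed bridge network (not series-parallel).
bridge = [("s", "a", 0.9), ("s", "b", 0.9), ("a", "b", 0.9),
          ("a", "t", 0.9), ("b", "t", 0.9)]
print(graph_reliability(bridge, "s", "t"))
```

      The exponential loop over edge states is the "increased complexity of solution" in its crudest form; practical tools use far better algorithms.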

      Example (Christophe)

    • Fault tree

      Fault trees represent all the sequences of individual component failures that cause the system to stop functioning, in a treelike structure. The starting point is the definition of a single, well-defined undesirable event, which is the root of the tree. In the study of reliability and availability, this undesirable event is the system failure event. In assessing the safety of the system, the undesirable event is the potentially hazardous or unsafe condition.

      The fault tree is a pictorial representation of the combination of events that can cause the occurrence of an undesirable event. An event at level "i" is reduced to a combination of lower-level events by means of logic gates. The process of reduction stops when we reach basic events that we do not wish to reduce further. Distinct basic events can be component failures, human errors, external conditions, etc. We assume that basic events are mutually independent and that a failure probability, a failure rate, a failure distribution function or an unavailability is known for each of them. The occurrence of an event is denoted by a logic 1 at that node; otherwise the logic value of the node is 0.

      Each gate has inputs and outputs. The input to a gate is either a basic event or the output of another gate. The output of an "and" gate is a logic 1 if and only if all of its inputs are logic 1. The output of an "or" gate is a logic 1 if and only if one or more of its inputs are at logic 1. The output of a "k out of n" gate is a logic 1 if and only if k or more of its inputs are at logic 1. If two gates share an input, the fault tree is said to have repeated events.
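
      For a fault tree without repeated events and with independent basic events, the gate definitions above translate directly into probability formulas. The following sketch is one possible way to write them; the gate function names and the small example tree are hypothetical.

```python
def and_gate(probs):
    """Output occurs only if every input event occurs."""
    out = 1.0
    for p in probs:
        out *= p
    return out

def or_gate(probs):
    """Output occurs if at least one input event occurs."""
    none = 1.0
    for p in probs:
        none *= (1.0 - p)
    return 1.0 - none

def k_of_n_gate(k, probs):
    """Output occurs if k or more inputs occur (inputs may differ)."""
    dist = [1.0]  # dist[j] = P(exactly j of the inputs seen so far occurred)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, d in enumerate(dist):
            nxt[j] += d * (1.0 - p)
            nxt[j + 1] += d * p
        dist = nxt
    return sum(dist[k:])

# Hypothetical tree: TOP = OR( AND(A, B), 2-out-of-3(C, D, E) )
top = or_gate([and_gate([0.1, 0.2]), k_of_n_gate(2, [0.1, 0.1, 0.1])])
print(top)
```

      These formulas are valid only because no basic event feeds more than one gate; with repeated events the gate inputs are no longer independent.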

      Three algorithms are implemented in SHARPE for fault tree analysis: the series-parallel formula (used for fault trees without repeated events), the VT algorithm (a multiple variable inversion (MVI) algorithm that obtains a sum of disjoint products (SDP) from the mincuts) and the factoring/conditioning algorithm. This last algorithm converts a fault tree with repeated events into a set of fault trees without repeated events by factoring on the repeated components, then applies the series-parallel formula to these converted fault trees.
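
      The idea behind factoring can be sketched on a tiny tree with one repeated event. Conditioning on the repeated event C fixes its value to 1 or 0, so each conditioned tree has no repeated events and the series-parallel formula applies. The tree TOP = AND(OR(A, C), OR(B, C)), the probabilities and the function names below are assumptions for illustration, not SHARPE's implementation.

```python
from itertools import product

def or_gate(*p):
    none = 1.0
    for x in p:
        none *= (1.0 - x)
    return 1.0 - none

def and_gate(*p):
    out = 1.0
    for x in p:
        out *= x
    return out

def top_series_parallel(pa, pb, pc):
    """Series-parallel formula; treats the two copies of C as independent,
    so it is exact only when pc is 0 or 1."""
    return and_gate(or_gate(pa, pc), or_gate(pb, pc))

def top_factored(pa, pb, pc):
    """Condition on the repeated event C, then apply the formula."""
    return (pc * top_series_parallel(pa, pb, 1.0)
            + (1.0 - pc) * top_series_parallel(pa, pb, 0.0))

def top_exact(pa, pb, pc):
    """Reference value by enumerating all truth assignments."""
    total = 0.0
    for a, b, c in product([0, 1], repeat=3):
        if (a or c) and (b or c):
            total += ((pa if a else 1 - pa) * (pb if b else 1 - pb)
                      * (pc if c else 1 - pc))
    return total

pa, pb, pc = 0.1, 0.2, 0.05  # assumed basic-event probabilities
print(top_series_parallel(pa, pb, pc), top_factored(pa, pb, pc), top_exact(pa, pb, pc))
```

      With these values the unconditioned formula underestimates the top-event probability, while the factored result agrees with the enumeration. Each repeated event doubles the number of conditioned trees, which is why the approach suits trees with few repeated events.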

      Normally the factoring algorithm is very fast when the number of repeated events is less than 14. With the addition of a BDD-based (binary decision diagram) algorithm, SHARPE can solve very large fault trees. The BDD algorithm is more efficient than the original VT algorithm. Reliability and unreliability can be calculated from the BDD, and mincuts can be obtained during the analysis. Each event's contribution to the system reliability (importance measures) can also be obtained.

      Example (Christophe)

  2. State-space Models

    The models we have looked at so far can be solved using fast algorithms assuming stochastic independence between system components. That is, for the availability models it was assumed that the failure or repair of a component was not affected by what was going on with any other component. If we want to model more complicated interactions between components, we must use other kinds of models, such as Markov chains. Many examples of dependencies among system components have been observed in practice and captured by Markov models.

    Modeling a system with a pure reliability/availability model can lead to incomplete or, at least, less precise results. Gracefully degrading systems may be able to survive the failure of one or more of their active components and continue to provide service at a reduced level. One of the most commonly used techniques for modeling gracefully degradable systems is the Markov reward model (MRM). Other model types may also be used: irreducible semi-Markov reward models and stochastic reward nets. SHARPE supports Stochastic Reward Nets (SRN) as a specification technique for largeness tolerance; SRN models are transformed into Markov reward models for analysis. The stochastic reward net (and the related formalism of generalized stochastic Petri nets) is the only model type in SHARPE that requires a conversion to a different model type (Markov reward model) to be solved. SHARPE supports a variety of solution methods for steady-state and transient analysis of these model types. SPNP also supports the specification and solution of stochastic reward nets.
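
    A Markov reward model attaches a reward rate to each state of a Markov chain, so that a single model yields both availability and a performance-oriented measure. The sketch below assumes a hypothetical two-processor system with per-processor failure rate lam, a single repair facility with rate mu, and reward rates equal to the number of working processors; none of these names or values come from SHARPE.

```python
def steady_state(lam, mu):
    """Steady-state probabilities of a 3-state birth-death chain.

    State = number of working processors (2, 1, 0).
    Balance equations: pi2 * 2*lam = pi1 * mu and pi1 * lam = pi0 * mu.
    """
    rho = lam / mu
    weights = {2: 1.0, 1: 2.0 * rho, 0: 2.0 * rho * rho}
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}

# Assumed reward rates: delivered capacity in each state.
reward = {2: 2.0, 1: 1.0, 0: 0.0}

pi = steady_state(lam=0.001, mu=0.1)
availability = pi[2] + pi[1]                      # system is up with >= 1 processor
expected_reward = sum(reward[s] * pi[s] for s in pi)
print(availability, expected_reward)
```

    Availability alone treats states 2 and 1 as equally good; the expected reward rate distinguishes full service from degraded service.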

    So far, we have described how to model systems for availability analysis using either non-state-space models (reliability block diagrams, fault trees) or Markov chains. The advantage of non-state-space models is that they are efficient to specify and solve. However, the solution of these models assumes that the components are independent. For instance, in a block diagram, fault tree or reliability graph, the components must be completely independent of one another in their failure and repair behavior. A failure in one component cannot affect the operation of another component, and components cannot share a repair facility. Markov models provide the ability to model systems that violate the assumptions made by the non-state-space models, but at the price of a state space explosion. A system having n components may require up to 2^n states in a Markov chain representation.
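
    The shared repair facility mentioned above is a simple case that a block diagram cannot express. The following sketch builds the full 2^n-state Markov chain for n identical components served by one repair facility and solves it for the steady-state probability that all components are down. The repair priority rule (lowest-numbered failed component first), the rates and all function names are assumptions made for this illustration.

```python
from itertools import product

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def shared_repair_unavailability(lam, mu, n=2):
    """P(all n components down) when one repair facility is shared."""
    states = list(product([True, False], repeat=n))  # 2**n states
    index = {s: i for i, s in enumerate(states)}
    size = len(states)
    q = [[0.0] * size for _ in range(size)]          # generator matrix
    for s in states:
        i = index[s]
        for c in range(n):
            if s[c]:                                  # a working component fails
                t = s[:c] + (False,) + s[c + 1:]
                q[i][index[t]] += lam
        down = [c for c in range(n) if not s[c]]
        if down:                                      # only one repair at a time
            c = down[0]
            t = s[:c] + (True,) + s[c + 1:]
            q[i][index[t]] += mu
        q[i][i] = -sum(q[i])
    # Solve pi Q = 0 with sum(pi) = 1: transpose Q, replace the last equation.
    a = [[q[j][i] for j in range(size)] for i in range(size)]
    a[-1] = [1.0] * size
    b = [0.0] * (size - 1) + [1.0]
    pi = solve(a, b)
    return pi[index[(False,) * n]]

lam, mu = 0.01, 1.0                                   # assumed rates
shared = shared_repair_unavailability(lam, mu)
independent = (lam / (lam + mu)) ** 2                 # block-diagram result
print(shared, independent)
```

    With these rates the shared-repair unavailability is roughly twice the value obtained under the independence assumption, and the state list already has 2^n entries for n components.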

    Example 1
    Example 2


  3. Hierarchical Models

    The state space explosion problem can be handled in two ways: it can be tolerated or it can be avoided. Large model tolerance must apply to the specification, storage and solution of the model. If the storage and solution problems can be solved, the specification problem can be solved by using more concise (and smaller) model specifications that can be automatically transformed into Markov models. Large models can be avoided by using hierarchical model composition.

    The ability of SHARPE to combine results from different kinds of models makes it possible to use state-space methods for those parts of a system that require them, and use non-state-space methods for the more "well-behaved" parts of the system.
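
    A minimal sketch of hierarchical composition, under assumed names and rates: a small Markov submodel supplies the availability of a duplex subsystem with a shared repair facility, and that single number is then used as one block in a top-level series block diagram. This only illustrates the idea and is not SHARPE input syntax.

```python
def duplex_availability(lam, mu):
    """Lower level: 3-state Markov chain for two units sharing one repair
    facility; the subsystem is up while at least one unit works."""
    rho = lam / mu
    p_both_down = 2.0 * rho**2 / (1.0 + 2.0 * rho + 2.0 * rho**2)
    return 1.0 - p_both_down

def series(*avail):
    """Upper level: series block diagram of independent blocks."""
    out = 1.0
    for a in avail:
        out *= a
    return out

# Assumed parameters for a hypothetical system.
a_disks = duplex_availability(lam=0.001, mu=0.1)   # state-space submodel
a_cpu = 0.1 / (0.1 + 0.0005)                       # single repairable unit: mu / (lam + mu)
system = series(a_cpu, a_disks)                    # non-state-space top level
print(system)
```

    Only the disk pair needs a state-space model here; the top level stays a cheap product of block availabilities.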

    Example (Christophe)