Model Types Used to Study Availability
We will briefly describe the model types used to compute the
reliability and the availability of a system.
- Combinatorial Model Types
- State-space Models
- Hierarchical Models
- Combinatorial Model Types
The three model types that are commonly used for reliability and availability
are reliability block diagrams, reliability graphs and fault trees. These model
types are similar in that they capture conditions that make a system fail in
terms of the structural relationships between the system components. A
comparison of the power of these model types has been made in this
paper.
- Reliability block diagrams
A series-parallel reliability block diagram represents the logical structure of
a system with regard to how the reliability of its components affects the
system reliability. In a block diagram model, components are combined into
blocks in series, in parallel or in k-out-of-n configurations. All of these
constructs can be used together in a single block diagram. Components of the
same type that appear more than once in the system structure are assumed to be
copies with independent, identical distribution functions. Each component may
have a failure probability, a failure rate, a failure distribution function or
unavailability attached to it.
The assumption of independence and series-parallel structure allows very fast
computation of reliability and availability measures. However, many system
models in practice do not follow a series-parallel structure. The SHARPE
software package allows easy specification and solution of such models.
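The series, parallel and k-out-of-n constructs described above can be sketched
in a few lines of Python. This is only an illustration of the underlying
formulas under the independence assumption, not SHARPE input syntax; the
function names and the example system (one CPU, two disks, three power
supplies) with its numbers are assumptions made up for the sketch.

```python
from math import comb, prod

def series(*r):
    """Series blocks: the system works only if every block works."""
    return prod(r)

def parallel(*r):
    """Parallel blocks: the system works if at least one block works."""
    return 1.0 - prod(1.0 - x for x in r)

def k_out_of_n(k, n, r):
    """At least k of n identical, independent blocks (each with reliability r) work."""
    return sum(comb(n, i) * r**i * (1.0 - r)**(n - i) for i in range(k, n + 1))

# Hypothetical component reliabilities (illustrative values only).
cpu, disk, psu = 0.99, 0.95, 0.90

# CPU in series with two parallel disks and a 2-out-of-3 power supply group.
system = series(cpu, parallel(disk, disk), k_out_of_n(2, 3, psu))
print(f"system reliability = {system:.6f}")
```

Because every construct reduces to a product over independent blocks, the
cost grows only linearly with the number of blocks, which is why the
series-parallel case is so fast to solve.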
Example (Christophe)
- Reliability graphs
A reliability graph consists of a set of nodes and directed edges (arcs),
where the arcs represent components that can fail or structural relationships
between the components. The graph contains one node, the source, with no
incoming arcs and one node, the sink (also called the destination or terminal
node), with no outgoing arcs. A system represented by a reliability graph
fails when there is no path from the source to the sink. As in reliability
block diagrams, the arcs can be assigned failure probabilities, failure rates,
failure distribution functions or unavailability values.
A reliability graph is equivalent to a non-series-parallel reliability block
diagram. In the reliability graph, the components are the arcs, while in the
block diagram the components are the boxes. The non-series-parallel block
diagram cannot be directly analyzed by (or even specified for) SHARPE, but the
reliability graph can. The price of this greater generality is the increased
complexity of solution.
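The failure criterion (no path from source to sink) and the cost of generality
can be illustrated with a brute-force sketch. It enumerates every up/down
combination of the arcs, so its running time is exponential in the number of
arcs; it is not the algorithm SHARPE uses. The bridge network, the node names
and the arc probabilities below are assumptions chosen for the example.

```python
from itertools import product

def reachable(arcs, source, sink):
    """Depth-first search: is there a directed path from source to sink?"""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == sink:
            return True
        for tail, head in arcs:
            if tail == node and head not in seen:
                seen.add(head)
                stack.append(head)
    return False

def graph_reliability(edges, source, sink):
    """edges maps (tail, head) -> probability that the arc works.

    Sums the probability of every arc-state combination in which the sink is
    still reachable. Assumes independent arcs and at most one arc per
    (tail, head) pair.
    """
    arcs = list(edges)
    total = 0.0
    for states in product((True, False), repeat=len(arcs)):
        prob = 1.0
        working = []
        for arc, works in zip(arcs, states):
            prob *= edges[arc] if works else 1.0 - edges[arc]
            if works:
                working.append(arc)
        if reachable(working, source, sink):
            total += prob
    return total

# Hypothetical bridge network: not series-parallel because of the a -> b arc.
bridge = {("s", "a"): 0.9, ("s", "b"): 0.9, ("a", "t"): 0.9,
          ("b", "t"): 0.9, ("a", "b"): 0.9}
bridge_reliability = graph_reliability(bridge, "s", "t")
print(f"bridge reliability = {bridge_reliability:.5f}")
```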
Example (Christophe)
- Fault trees
Fault trees represent all the sequences of individual component failures that
cause the system to stop functioning, in a treelike structure. The starting
point is the definition of a single, well-defined undesirable event, which is
the root of the tree. In the study of reliability and availability, this
undesirable event is the system failure event. In assessing the safety of a
system, the undesirable event is a potentially hazardous or unsafe condition.
The fault tree is a pictorial representation of the combination of events that
can cause the occurrence of an undesirable event. An event at level "i" is
reduced to a combination of lower-level events by means of logic gates. The
process of reduction stops when we reach basic events that we do not wish to
reduce further. Distinct basic events can be component failures, human errors,
external conditions, etc. We assume that basic events are mutually independent
and that a failure probability, a failure rate, a failure distribution
function or an unavailability is known for each of them. The occurrence of
an event is denoted by a logic 1 at that node; otherwise the logic value of
the node is 0.
Each gate has inputs and outputs. The input to a gate is either a basic event
or the output of another gate. The output of an "and" gate is a logic 1 if and
only if all of its inputs are logic 1. The output of an "or" gate is a logic 1
if and only if one or more of its inputs are at logic 1. The output of a "k
out of n" gate is a logic 1 if and only if k or more of its inputs are at
logic 1. If two gates share an input, the fault tree is said to have repeated
events.
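The gate semantics above can be written down as a small recursive evaluator.
This is a minimal sketch; the nested-tuple tree encoding, the gate labels
"and", "or" and "kofn", and the example tree are assumptions made for
illustration, not a SHARPE format.

```python
def evaluate(node, events):
    """Return 1 if the event at `node` occurs, else 0.

    A node is either a basic-event name (str) looked up in `events`,
    or a tuple: ("and", children), ("or", children) or ("kofn", k, children).
    """
    if isinstance(node, str):
        return events[node]
    gate, *args = node
    if gate == "and":
        return int(all(evaluate(child, events) for child in args[0]))
    if gate == "or":
        return int(any(evaluate(child, events) for child in args[0]))
    if gate == "kofn":
        k, children = args
        return int(sum(evaluate(child, events) for child in children) >= k)
    raise ValueError(f"unknown gate: {gate}")

# Hypothetical tree: the system fails if A and B both fail,
# or if at least 2 of C, D, E fail.
tree = ("or", [("and", ["A", "B"]), ("kofn", 2, ["C", "D", "E"])])

one_failure = {"A": 1, "B": 0, "C": 0, "D": 0, "E": 0}
two_of_three = {"A": 0, "B": 0, "C": 1, "D": 0, "E": 1}
print(evaluate(tree, one_failure), evaluate(tree, two_of_three))
```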
Three algorithms are implemented in SHARPE for fault tree analysis: the
series-parallel formula (used for fault trees without repeated events), the
VT algorithm (a multiple variable inversion (MVI) algorithm that obtains a sum
of disjoint products (SDP) from the mincuts) and the factoring/conditioning
algorithm. This last algorithm converts a fault tree with repeated events into
a set of fault trees without repeated events by factoring on the repeated
components, and then applies the series-parallel formula to these converted
fault trees.
Normally the factoring algorithm is very fast when the number of repeated
events is less than 14. With the addition of a BDD-based (Binary Decision
Diagram) algorithm, SHARPE can solve very large fault trees. The BDD algorithm
is more efficient than the original VT algorithm. Reliability and
unreliability can be calculated from the BDD, and mincuts can be obtained
during the analysis. Each event's contribution to the system reliability (its
importance measure) can also be obtained.
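The factoring/conditioning idea can be sketched as follows: condition on each
repeated event being present (probability 1) or absent (probability 0), apply
the series-parallel formula to each conditioned tree, and weight the results.
This is an illustration of the principle only, not SHARPE's implementation;
the tree encoding, function names and probabilities are assumptions. The
recursion doubles the work for every repeated event, which is consistent with
the algorithm being practical only for a small number of them.

```python
from math import prod

def sp_prob(node, p):
    """Series-parallel formula; exact only if no basic event is repeated."""
    if isinstance(node, str):
        return p[node]
    gate, children = node
    q = [sp_prob(child, p) for child in children]
    if gate == "and":
        return prod(q)
    if gate == "or":
        return 1.0 - prod(1.0 - x for x in q)
    raise ValueError(f"unknown gate: {gate}")

def factor(tree, p, repeated):
    """Condition on each repeated event, then use the series-parallel formula."""
    if not repeated:
        return sp_prob(tree, p)
    first, rest = repeated[0], repeated[1:]
    occurs = factor(tree, {**p, first: 1.0}, rest)  # event certainly occurs
    absent = factor(tree, {**p, first: 0.0}, rest)  # event certainly does not
    return p[first] * occurs + (1.0 - p[first]) * absent

# Hypothetical tree in which basic event A feeds two different "or" gates.
tree = ("and", [("or", ["A", "B"]), ("or", ["A", "C"])])
p = {"A": 0.1, "B": 0.2, "C": 0.3}

exact = factor(tree, p, ["A"])   # equals pA + (1 - pA) * pB * pC
naive = sp_prob(tree, p)         # wrongly treats the two copies of A as independent
print(f"factoring: {exact:.4f}, naive formula: {naive:.4f}")
```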
Example (Christophe)
- State-space Models
The models we have looked at so far can be solved using fast algorithms
assuming stochastic independence between system components. That is, for
the availability models it was assumed that the failure or repair of a
component was not affected by what was going on with any other component.
If we want to model more complicated interactions between components,
we must use other kinds of models, such as Markov chains. Many examples of
dependencies among system components have been observed in practice and
captured by Markov models.
Modeling any system with a pure reliability/availability model can lead
to incomplete, or, at least, less precise results. Gracefully degrading
systems may be able to survive the failure of one or more of their active
components and continue to provide service at a reduced level. One of
the most commonly used techniques for modeling gracefully degradable
systems is the Markov reward model (MRM). Other model types may also be
used: the irreducible semi-Markov reward model and stochastic reward nets.
SHARPE supports Stochastic Reward Nets (SRN) as a specification technique
for largeness tolerance; SRN models are transformed into Markov reward
models for analysis.
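As a small illustration of a Markov reward model for a gracefully degrading
system, the sketch below attaches a reward (the number of working processors)
to each state of a birth-death Markov chain and computes the steady-state
expected reward rate. The two-processor system, the single repair facility
and the rate values are assumptions chosen for the example.

```python
def birth_death_steady_state(up_rates, down_rates):
    """Steady-state probabilities of a birth-death chain with states 0..n.

    up_rates[i] is the rate from state i to i+1,
    down_rates[i] is the rate from state i+1 back to i.
    """
    weights = [1.0]
    for up, down in zip(up_rates, down_rates):
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    return [w / total for w in weights]

lam, mu = 0.001, 0.1          # hypothetical failure and repair rates per hour

# State i = number of failed processors (0, 1 or 2); one repair at a time.
pi = birth_death_steady_state(up_rates=[2 * lam, lam], down_rates=[mu, mu])

# Reward rate of each state = number of processors still delivering service.
rewards = [2, 1, 0]
expected_capacity = sum(r * prob for r, prob in zip(rewards, pi))
availability = 1.0 - pi[2]    # a pure availability model sees only up or down

print(f"expected capacity = {expected_capacity:.6f} processors")
print(f"availability      = {availability:.6f}")
```

The availability figure alone hides the time spent in the degraded
one-processor state; the reward-based measure accounts for it.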
The Stochastic Reward Net (and the related formalism of generalized stochastic
Petri nets) is the only model type in SHARPE that must be converted to a
different model type (a Markov reward model) to be solved. SHARPE supports a
variety of solution methods for steady-state and transient analysis of these
model types. SPNP also supports the specification and solution of stochastic
reward nets.
So far, we have described how systems can be modeled for availability
analysis using non-state-space models (reliability block diagrams, fault
trees) or Markov chains. The advantage of non-state-space models (like
block diagrams and fault trees) is that they are efficient to specify and
solve. However, the solution of these models
assumes the components are independent. For instance, in a block diagram,
fault-tree or reliability graph, the components must be completely
independent of one another in their failure and repair behavior. A failure
in one component cannot affect the operation of another component, and
components cannot share a repair facility. Markov models provide the ability
to model systems that violate the assumptions made by the
non-state-space models, but at the price of a state space explosion.
A system having n components may require up to 2^n states in a Markov
chain representation.
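Both points, the dependency introduced by a shared repair facility and the
2^n growth of the state space, can be seen in a small sketch. It builds the
generator matrix over all up/down combinations of n components and solves for
the steady state with plain Gaussian elimination. The repair policy (the
lowest-numbered failed component is repaired first), the function names and
the rates are assumptions made for the example; the dense solver is suitable
only for very small chains.

```python
from itertools import product

def shared_repair_generator(n, lam, mu):
    """Generator matrix for n components sharing one repair facility."""
    states = list(product((1, 0), repeat=n))        # 1 = up, 0 = down
    index = {s: i for i, s in enumerate(states)}
    Q = [[0.0] * len(states) for _ in states]
    for s in states:
        i = index[s]
        for c in range(n):
            if s[c] == 1:                            # component c fails
                Q[i][index[s[:c] + (0,) + s[c + 1:]]] += lam
        if 0 in s:                                   # repair first failed one
            c = s.index(0)
            Q[i][index[s[:c] + (1,) + s[c + 1:]]] += mu
        Q[i][i] = -sum(Q[i])
    return states, Q

def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 (Gauss-Jordan, partial pivoting)."""
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]   # transpose of Q
    A[-1] = [1.0] * n                 # replace one equation by normalization
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]

lam, mu = 0.001, 0.1                  # hypothetical rates per hour
states, Q = shared_repair_generator(2, lam, mu)
pi = steady_state(Q)

unavail_shared = pi[states.index((0, 0))]          # both components down
unavail_independent = (lam / (lam + mu)) ** 2      # block-diagram assumption
print(len(states), unavail_shared, unavail_independent)
```

With these illustrative rates, the independence assumption underestimates the
unavailability of the two-component parallel system by roughly a factor of
two, because a second failure has to wait for the single repair facility.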
Example 1
Example 2
- Hierarchical Models
The state space explosion problem can be handled in two ways:
it can be tolerated or it can be avoided. Large model
tolerance must apply to the specification, storage and solution
of the model. If the storage and solution problems can
be solved, the specification problem can be solved by using
more concise (and smaller) model specifications that can be
automatically transformed into Markov models. Large models
can be avoided by using hierarchical model composition.
The ability of SHARPE to combine results from different
kinds of models makes it possible to use state-space
methods for those parts of a system that require them,
and use non-state-space methods for the more
"well-behaved" parts of the system.
Example (Christophe)
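A minimal sketch of hierarchical composition, under assumed numbers: a small
Markov submodel is solved for the one subsystem whose components share a
repair facility, and its availability is then used as a single block in a
top-level series block diagram alongside independent components. The system
layout, function names and rates are all hypothetical.

```python
def two_state_availability(lam, mu):
    """Steady-state availability of one independently repaired component."""
    return mu / (lam + mu)

def shared_repair_pair_availability(lam, mu):
    """Markov submodel: two parallel components, one shared repair facility.

    Closed-form steady state of the birth-death chain on the number of
    failed components (0, 1, 2); the pair is down only in state 2.
    """
    weights = [1.0, 2 * lam / mu, 2 * lam**2 / mu**2]
    return 1.0 - weights[2] / sum(weights)

def series(*availabilities):
    """Top-level series block diagram over independent blocks."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

# State-space method only where the dependency exists (the disk pair) ...
disks = shared_repair_pair_availability(lam=0.001, mu=0.1)
# ... and non-state-space formulas for the "well-behaved" parts.
cpu = two_state_availability(lam=0.0005, mu=0.2)
network = two_state_availability(lam=0.0002, mu=0.5)

system = series(cpu, disks, network)
print(f"system availability = {system:.6f}")
```

The top-level model never sees the states of the submodel, only its result,
so the overall state space stays small.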