|
If you have Adobe Reader you can also download a .pdf version. 
Applying the latest standard for Functional Safety - IEC 61511
A G Foord, W G Gulland, C R Howard, T Kellacher, W H Smith (4-sight Consulting)
Synopsis
This paper focuses on a technique for risk assessment, Layer of Protection Analysis (LOPA), that is
relatively new to Europe and compares it with two established techniques: Quantitative Risk Assessment
(QRA) and Risk Graphs. It describes our experience in applying the latest standard for functional safety “BS
IEC 61511:2003 Functional Safety - Safety Instrumented Systems (SIS) for the Process Industry Sector”1.
The main lessons learned are illustrated by real examples, changed to preserve confidentiality but still illustrating relevant issues.
Acknowledgements
Section 3.5 was first published by Springer2 and we are grateful to them and the Safety Critical Systems
Club for permission to include it. The other definitions in Section 3 and Section 9 are quoted from IEC 615111.
Introduction
“Functional” and “safety” are words that have been used for centuries and, although using “functional safety” to describe
the action of a protection system is a relatively recent innovation, the meaning is clear enough. Other terms used are
less obvious and are defined in the standard and repeated below for those not familiar with them. This paper discusses
the application of three popular methods of determining Safety Integrity Level (SIL) requirements – Quantitative Risk
Assessment (QRA), risk graph methods and Layer Of Protection Analysis (LOPA) – to process industry installations. It
identifies some of the advantages and limitations of each method and suggests criteria for identifying which of these methods is appropriate in specific situations.
Definitions
IEC 61511 covers the whole lifecycle as shown in Figure 1, but this paper is concerned only with phases 1 through 3,
leading to the “Safety Requirements Specification for the Safety Instrumented System”.
Figure 1 - Lifecycle from IEC 61511
Layers of Protection
The introduction of the layers of protection concept shown in Figure 2 originates from the American approach to Safety
Instrumented Systems (SIS) in ANSI ISA-SP 84.01-19963. This American standard has been the major influence in the differences between IEC 615084 and IEC 61511 and the importance of independence between layers and the
implications of common cause issues between layers is emphasised. The allocation of safety functions to specific
layers or systems (for example a hazard may be protected by a combination of relief valves, physical barriers and bunds
and a SIS); and the contribution required of each element to the overall risk reduction should be specified as part of the
transfer of information from the risk analysis to those responsible for the design and engineering.
Figure 2 - Typical risk reduction methods found in process plants from IEC 61511-1 Figure 9
BPCS
The Basic Process Control System (BPCS) is a key layer of protection “which responds to input signals from the
process, its associated equipment, other programmable systems and/or operator and generates output signals causing
the process and its associated equipment to operate in the desired manner but which does not perform any safety
instrumented functions with a claimed SIL ≥ 1”. Note that in IEC 61508 the BPCS is part of the definition of Equipment Under Control (EUC).
SIF
A Safety Instrumented Function (SIF) is a safety function with a specified SIL which is necessary to achieve functional
safety. IEC 61511 also includes a note to explain that this normally refers to protection systems and that if it is applied
to control systems then “further detailed analysis may be required to demonstrate that the system is capable of achieving the safety requirements”.
SIS
A Safety Instrumented System (SIS) is used to implement one or more SIFs. A SIS is composed of any combination of
sensor(s), logic solver(s), and final element(s).
SIL
The two standards (IEC 61508 and IEC 61511) define Safety Integrity as “probability” of success and then define the
Safety Integrity Level (SIL) as four discrete levels (1 to 4) such that “level 4 has the highest safety integrity”. Although the
standards concentrate on “Safety” and “SIL”, the principles that they address can also be applied to protection against
environmental and financial risks; “EIL” and “FIL” can be applied analogously with “SIL”, and Integrity Level (IL) used as a
term applying to all three protection functions.
The definition of SIL is clear for those SIFs that are only called upon at a low frequency / have a low demand rate
Elsewhere the same two standards recognise that safety functions can be required to operate in quite different ways (for
example, continuously) and SIL is defined as a failure rate (in units of failures/hour). These two different uses of the
same SIL terminology have caused considerable confusion. This section of this paper was first presented at the Safety-Critical Systems Symposium in February 2004 (SSS04)2 and attempts to clarify the definition of SIL. Consider a car;
examples of low demand functions are:
- Anti-lock braking (ABS). (It depends on the driver, of course!).
- Secondary restraint system (air bags).
On the other hand there are functions that are in frequent or continuous use; examples of such functions are:
The fundamental question is how frequently will failures of either type of function lead to accidents. The answer is
different for the 2 types:
For functions with a low demand rate, the accident rate is a combination of 2 parameters – i) the frequency
of demands, and ii) the Probability the function Fails on Demand (PFD). In this case, therefore, the
appropriate measure of performance of the function is PFD, or its reciprocal, Risk Reduction Factor (RRF).
For functions that have a high demand rate or operate continuously, the accident rate is the failure rate, λ,
which is the appropriate measure of performance. An alternative measure is Mean Time To Failure (MTTF)
of the function. Provided failures are exponentially distributed, MTTF is the reciprocal of λ.
These performance measures are, of course, related. At its simplest, provided the function can be proof-tested at a
frequency that is greater than the demand rate, the relationship can be expressed as:
- PFD = λT/2 or = T/(2 x MTTF), or
- RRF = 2/(λT) or = (2 x MTTF)/T
where T is the proof-test interval. (Note that to significantly reduce the accident rate below the failure rate of the function,
the test frequency, 1/T, should be at least 2 and preferably ≥ 5 times the demand frequency.) They are, however,
different quantities. PFD is a probability – dimensionless; λ is a rate – dimension t-1. The standards, however, use the
same term – SIL – for both these measures, with the following definitions:
In low demand mode, SIL is a proxy for PFD; in high demand / continuous mode, SIL is a proxy for failure rate. (The
boundary between low demand mode and high demand /continuous mode is in essence set in the standards at one
demand per year. This is consistent with proof-test intervals of 3 to 6 months, which in many cases will be the shortest feasible interval.)
Now consider a function which protects against 2 different hazards, one of which occurs at a rate of 1 every 2 weeks, or
25 times per year, i.e. a high demand rate, and the other at a rate of 1 in 10 years, i.e. a low demand rate. If the MTTF
of the function is 50 years, it would qualify as achieving SIL1 for the high demand rate hazard. The high demands
effectively proof-test the function against the low demand rate hazard. All else being equal, the effective achieved SIL for the second hazard is given by:
- PFD = 0.04/(2 x 50) = 4 x 10-4 ≡ SIL3
So what is the SIL achieved by the function? Clearly it is not unique, but depends on the hazard and in particular
whether the demand rate for the hazard implies low or high demand mode.
In the first case, the achievable SIL is intrinsic to the equipment; in the second case, although the intrinsic quality of the
equipment is important, the achievable SIL is also affected by the testing regime. This is important in the process
industry sector, where achievable SILs are liable to be dominated by the reliability of field equipment – process
measurement instruments and, particularly, final elements such as shutdown valves – which need to be regularly tested to achieve required SILs.
The difference between these two definitions of SIL often leads to misunderstandings.
Concepts of Residual Risk, Risk Reduction and Required SIL
Both IEC 61508 & 61511 imply that the only action of a SIS is to reduce the frequency or likelihood of a hazard. Thus
the model of risk (reproduced in Figure 3) is one-dimensional. All the methods of determining SIL are based on similar principles:
Step 1 Identify the “process risk” from the process and the BPCS.
Step 2 Identify the “tolerable risk” for the particular process.
Step 3 If the process risk exceeds the tolerable risk, then calculate the necessary risk reduction and whether
the protection layers will operate in continuous or demand mode
Step 4 Identify the risk reduction factors achieved by other protection layers
Step 5 Calculate the remaining risk reduction factor (RRF) or the failure rate that should be achieved by the
SIS and thus from Table 1 or 2 the required SIL
Figure 3 - Risk Reduction Model from IEC 61511
|
SIL
|
Range of Average PFD
|
Range of RRF*
|
|
4
|
10-5 ≤ PFD < 10-4
|
100,000 ≥ RRF > 10,000
|
|
3
|
10-4 ≤ PFD < 10-3
|
10,000 ≥ RRF > 1,000
|
|
2
|
10-3 ≤ PFD < 10-2
|
1,000 ≥ RRF > 100
|
|
1
|
10-2 ≤ PFD < 10-1
|
100 ≥ RRF > 10
|
|
*This column is not part of the standards, but RRF is often a more tractable parameter than PFD.
Table 1 - Definitions of SILs for Demand Mode of Operation from IEC 61511-1 (Table 3)
|
SIL
|
Range of λ (failures per hour)
|
~ Range of MTTF (years)*
|
|
4
|
10-9 ≤ λ < 10-8
|
100,000 ≥ MTTF > 10,000
|
|
3
|
10-8 ≤ λ < 10-7
|
10,000 ≥ MTTF > 1,000
|
|
2
|
10-7 ≤ λ < 10-6
|
1,000 ≥ MTTF > 100
|
|
1
|
10-6 ≤ λ < 10-5
|
100 ≥ MTTF > 10
|
|
*This column is not part of the standards, but the authors have found these approximate MTTF values to be useful in the process industry
sector, where time tends to be measured in years rather than hours.
Table 2 - Definitions of SILs for Continuous Mode of Operation from IEC 61511-1 (Table 4)
The residual risk is the process risk reduced by all the risk reduction factors and will normally be less than the tolerable
risk. Identifying the tolerable risk is a major issue that is discussed in Reducing Risks, Protecting People (R2P2)5 and
is beyond the scope of this paper. Identifying the frequencies of all initiating causes (or the demand rates used in Steps
1 & 3 above) is also difficult unless excellent records of all incidents are available.
Some Methods of Determining SIL Requirements
- IEC 61508 offers 3 methods of determining SIL requirements:
- Quantitative method.
- Risk graph, described in the standard as a qualitative method.
- Hazardous event severity matrix, also described as a qualitative method.
- IEC 61511 offers:
- Semi-quantitative method (incorporating the use of fault and event trees).
- Safety layer matrix method, described as a semi-qualitative method.
- Calibrated risk graph, described in the standard as a semi-qualitative method, but by some practitioners as a
semi-quantitative method.
- Risk graph, described as a qualitative method.
- Layer of protection analysis (LOPA). (Although the standard does not assign this method a position on the
qualitative / quantitative scale, it is weighted toward the quantitative end.)
These are developments and extensions of the methods originally outlined in IEC 61508-5. They have all been used by
various organisations in the determination of SILs, but with varying degrees of success and acceptability; and do not
provide an exhaustive list of all the possible methods of risk assessment. All of these methods require some degree of
tailoring to meet the requirements of an individual company, together with training of the personnel who will apply them,
before they can be used successfully. QRA, risk graphs and LOPA are established methods for determining SIL
requirements, particularly in the process industry sector, but LOPA is less well known in the UK and is the focus of this paper.
Typical Results
|
SIL
|
Number of Functions
|
% of Total
|
|
4
|
0
|
0%
|
|
3
|
0
|
0%
|
|
2
|
1
|
0.3%
|
|
1
|
18
|
6.0%
|
|
None
|
281
|
93.7%
|
|
Total
|
300
|
100%
|
|
Table 3 - Typical Results of SIL Assessment
As one would expect, there is wide variation from installation to installation in the numbers of functions that are
assessed as requiring SIL ratings, but the numbers in Table 3 were assessed for a reasonably typical offshore gas
platform. Typically in the process sector there might be a single SIL3 requirement in an application of this size, while
identification of SIL4 requirements is very rare. If a SIL3 or SIL4 requirement is identified it is reasonable to investigate
the use made of the basic process design and other protection layers in risk reduction and whether undue reliance is
being placed on the SIS; and indicates a serious need for redesign.
After-the-Event Protection
Some functions on process plants are invoked “after-the-event”, i.e. after a loss of containment, after a fire has started or
an explosion has occurred. Fire and gas detection and emergency shutdown are the principal examples of such
functions. Assessment of the required SILs of such functions presents specific problems:
- Because they operate after the event, there may already have been consequences that they can do nothing to
prevent or mitigate. The initial consequences must be separated from the later consequences
- The event may develop and escalate to a number of different eventual outcomes with a range of consequence
severity, depending on a number of intermediate events. Analysis of the likelihood of each outcome is a specialist task, often based on event trees (Figure 4).
Figure 4 - Event Tree for After the Event Protection
QRA
Quantitative Risk Assessment is usually done with Fault Trees and Event Trees or Reliability Block Diagrams (RBDs).
Some people refer to a combination of Fault and Event Tree as a Cause-Consequence Diagram. Figure 4 shows an
example of an Event Tree and Figures 5 and 6 show a Fault Tree and a RBD.
Figure 5 - Fault Tree for Overpressure at Compressor Outlet
Figure 6 - Reliability Bock Diagram of Compressor Outlet Pressure
Normally the “Top Event” will be a particular hazard and provided that:
- appropriate failure models are chosen for each basic event or block;
- accurate data is available for the particular environment for each of the failure modes, repairs and tests; and
- all the relationships are correctly modelled; then
the frequency at which the hazard occurs, and hence the risk can be calculated (see textbooks, for example6). The
successful outcome of a QRA is highly dependent on the assumptions that are made, the detail of the model developed
to represent the hazardous event and the data that is used. However well a QRA has been done it does not provide an
absolute indication of the residual risk. A sensitivity analysis of the data and assumptions is a fundamental element of any QRA.
Risk Graph Methods
Figure 7 shows a typical risk graph. The risk graph method is described in both IEC 61508 & 61511 and is an excellent
means of quickly assessing and screening a large number of safety functions so as to allow effort to be focused on the
small percentage of critical functions. The advantages and disadvantages and range of applicability of risk graphs are
the main topic of a previous paper by W G Gulland at SSS042. The results of that paper are given in the conclusions
below. In use the risk graph needs calibration to align with a company’s corporate risk criteria.
Figure 7 - Typical Risk Graph
A serious limitation of the risk graph method is that it does not lend itself at all well to assessing “after the event”
outcomes:
Demand rates would be expected to be very low, e.g. 1 in 1,000 to 10,000 years. This is off the scale of
most of the risk graphs used.
The range of outcomes from function to function may be very large, from a single injured person to major loss
of life. The outcomes are also potentially random depending on a wide range of circumstances. Where
large-scale consequences are possible, use of such a coarse tool such as the risk graph method can hardly
be considered “suitable” and “sufficient”.
The QRA and the LOPA methods do not have these limitations, particularly if the LOPA method is applied quantitatively
and, as such, are more suited to analysing “after the event” outcomes.
Layer of Protection Analysis (LOPA)
The LOPA method was developed by the American Institute of Chemical Engineers as a method of assessing the SIL
requirements of SIFs (see textbooks, for example7).
The method starts with a list of all the process hazards on an installation as identified by Hazard And Operability Studies
(HAZOPs) or other hazard identification techniques. The hazards are analysed in terms of:
- Consequence description (“Impact Event Description”)
- Estimate of consequence severity (“Severity Level”)
- Description of all causes which could lead to the Impact Event (“Initiating Causes”)
- Estimate of frequency of all Initiating Causes (“Initiation Likelihood”)
The Severity Level may be expressed in semi-quantitative terms, linked to target Mitigated Event Likelihoods
expressed as target frequency ranges (analogous to tolerable risk levels), as shown in Table 4; or it may be expressed
as a specific quantitative estimate of harm, which can be referenced to F-N curves.
|
Severity Level
|
Consequence
|
Target Mitigated Event Likelihood
|
|
Minor
|
Serious injury at worst
|
No specific requirement
|
|
Serious
|
Serious permanent injury or up to 3 fatalities
|
< 3E-6 per year, or 1 in > 330,000 years
|
|
Extensive
|
4 or 5 fatalities
|
< 2E-6 per year, or 1 in > 500,000 years
|
|
Catastrophic
|
> 5 fatalities
|
Use F-N curve
|
|
Table 4 - Example Definitions of Severity Levels and Mitigated Event Target Frequencies
Similarly, the Initiation Likelihood may be expressed semi-quantitatively, as shown in Table 5; or it may be expressed as
a specific quantitative estimate.
|
Initiation Likelihood
|
Frequency Range
|
|
Low
|
< 1 in 10,000 years
|
|
Medium
|
1 in > 100 to 10,000 years
|
|
High
|
1 in ≤ 100 years
|
|
Table 5 - Example Definitions of Initiation Likelihood
The strength of the method is that it recognises that in the process industries there are usually several layers of
protection against an Initiating Cause leading to an Impact Event. Specifically, it identifies:
General Process Design. There may, for example, be aspects of the design that reduce the probability of
loss of containment, or of ignition if containment is lost, so reducing the probability of a fire or explosion event.
Basic Process Control System (BPCS). Failure of a process control loop is likely to be one of the main
Initiating Causes. However, there may be another independent control loop that could prevent the Impact
Event, and so reduce the frequency of that event.
Alarms. Provided there is an alarm that is independent of the BPCS, sufficient time for an operator to
respond, and an effective action to take (a “handle” to “pull”), credit can be taken for alarms to reduce the
probability of the Impact Event up to a RRF of 10.
Additional Mitigation, Restricted Access. Even if the Impact Event occurs, there may be limits on the
occupation of the hazardous area (equivalent to the F parameter in the risk graph method), or effective
means of escape from the hazardous area (equivalent to the P parameter in the risk graph method), which reduce the Severity Level of the event.
Independent Protection Layers (IPLs). A number of criteria must be satisfied by an IPL to be assured that it
is genuinely independent of other protective layers and achieves RRF ≥ 10. Relief valves and bursting disks usually qualify for RRF ≥ 100.
Based on the Initiating Likelihood (frequency) and the PFDs of all the protection layers listed above, an Intermediate
Event Likelihood (frequency) for the Impact Event and the Initiating Event can be calculated. The process must be
completed for all Initiating Events, to determine a total Intermediate Event Likelihood for all Initiating Events. This can
then be compared with the target Mitigated Event Likelihood (frequency). So far no credit has been taken for any SIF. The ratio:
(Intermediate Event Likelihood) / (Mitigated Event Likelihood)
gives the required RRF (or 1/PFD) of the SIF, and can be converted to a required SIL using Table 1. Alternatively the
inverse ratio
(Mitigated Event Likelihood) / (Intermediate Event Likelihood)
gives the required PFD of the SIF that can be converted to a required SIL using Table 1.
Examples of LOPA
Compressor (as shown in Figure 8)
Figure 8 - Example of overpressure protection for a compressor driven by a gas turbine
Pipeline
The Pipeline studied contained a liquid that would evaporate if released and had Passive Fire Protection (PFP). Two
of the impact events considered were Jet Fires and a Boiling Liquid Expanding Vapour Explosion (BLEVE). Some of
the results of the LOPA are shown in Table 7.
Discussion of all three methods
QRA
Page 31 of R2P25 states that “The use of numerical estimates of risk by themselves can, for several reasons ..., be
misleading and lead to decisions which do not meet adequate levels of safety. In general, qualitative learning and
numerical estimates from QRA should be combined with other information from engineering and operational analyses in making an overall decision.”
Fault Trees, Event Trees and RBDs are very valuable in showing relationships between different parts of the process,
the BPCS and the protection systems. However, there are difficulties in obtaining good data for all the relevant failure
modes as many business sector reliability databases have not been maintained. Therefore numerical estimates of risk
will take the form of a range and judgement will be required to assess a realistic figure.
The problems with the data also apply if LOPA is used for quantitative assessments.
Risk Graphs
The implications of the issues highlighted by W G Gulland at SSS042 are:
Risk graphs are very useful but imprecise tools for assessing SIL requirements. (It is inevitable that a
method with 5 parameters – C, F, P, W and SIL – each with a range of an order of magnitude, will produce a
result with a range of 5 orders of magnitude.)
They must be calibrated on a conservative basis to avoid the danger that they under-estimate the
unprotected risk and the amount of risk reduction / protection required.
Their use is most appropriate when a number of functions protect against different hazards, which are
themselves only a small proportion of the overall total hazards, so that it is very likely that under-estimates
and over-estimates of residual risk will average out when they are aggregated. Only in these circumstances
can the method be realistically described as providing a “suitable” and “sufficient”, and therefore legal, risk assessment.
Higher SIL requirements (SIL2+) incur significant capital costs (for redundancy and rigorous engineering
requirements) and operating costs (for applying rigorous maintenance procedures to more equipment, and
for proof-testing more equipment at higher frequencies, and to rigorously gather and analyse performance
data). They should therefore be re-assessed using a more refined method.
LOPA
The LOPA method has the following advantages:
It can be used semi-quantitatively or quantitatively.
Used semi-quantitatively it has many of the same advantages as risk graph methods.
Used quantitatively the logic of the analysis can still be developed as a team exercise, with the detail
developed “off-line” by specialists.
It explicitly accounts for risk mitigating factors, such as alarms and relief valves, which have to be
incorporated as adjustments into risk graph methods (e.g. by reducing the W value to take credit for alarms,
by reducing the SIL to take credit for relief valves).
A semi-quantitative analysis of a high SIL function can be promoted to a quantitative analysis without
changing the format.
It can assist in all the team members obtaining and sharing a full appreciation of the issues and uncertainties
associated with the hazardous event(s).
Conclusions
To summarise, the relative advantages and disadvantages of these methods are shown in Table 8, and as can be seen
from Table 8 there is no ideal candidate to cover all requirements - an assessment has to be made as to the most
appropriate method for a specific requirement. Should the total number of functions requiring assessment be small (<
10) and acceptable reliability data available then our experience would be to apply LOPA in a semi-quantitative manner.
However on new installations the number of functions identified in the HAZOP as requiring a SIF can be very large
requiring the involvement of critical people in a team activity over a considerable period of time. Sufficient time for this
is a rare commodity these days and, in such a situation, we would recommend the use of risk graphs initially for all
required functions (approx. 25 functions assessed per day on average) and then repeat the assessment using LOPA for
those functions assessed as ≥ IL2 (approx. 5 functions assessed per day on average).
Whatever process of analysis is applied they all require a corporate risk policy defining what risk level is deemed
acceptable from both individual and societal perspectives – a politically sensitive decision has to be agreed within any
business organisation, with an acute awareness of the perception of risk held by the general public.
Whilst the standards IEC 61508/61511 only relate to Safety of people there is little doubt that the Environmental
agencies will require businesses focus to improve the environment whilst stake-holders will require similar attention to commercial performance.
References
2003, BS IEC 61511 Functional safety - Safety instrumented systems for the process industry sector.
Gulland, W. G., 2004, Methods of Determining Safety Integrity Level (SIL) Requirements - Pros and Cons,
Proceedings of the Safety-Critical Systems Symposium - February 2004.
1996, Application of Safety Instrumented Systems for the Process Industries, Instrument Society of America
Standards and Practices, ANSI / ISA-SP 84.01-1996.
1998 - 2000, BS IEC 61508, Functional safety of electrical / electronic / programmable electronic safety
-related systems
2001, Reducing risks, protecting people – HSE’s decision making process, (R2P2), HSE Books, ISBN 0
-7176-2151-0
Smith, D. J., 2001, Reliability, Maintainability and Risk, 6th Edition, ISBN 0-7506-5168-7
2001, Layer of Protection Analysis – Simplified Process Risk Assessment, American Institute of Chemical
Engineers, ISBN 9-780816-908110
|