
Preventing cyber incidents in organizations: What can we learn from 100 years of safety research?

When a cyber incident occurs in an organization, an external attacker is often involved. However, many causes lie within the organization itself, e.g. insufficiently secure processes and systems or insecure behavior. There are direct parallels here between cybersecurity and the field of safety. The difference, however, is that safety and safety research are over a hundred years old and have more experience in investigating the causes of incidents. Cybersecurity often draws on safety concepts: US President Biden, for example, established the Cyber Safety Review Board (CSRB) on the model of the National Transportation Safety Board. Following the Log4J attacks, the CSRB investigated the causes and summarized them in a report (2).

From Acts of God to Building Resilient Socio-Technical Systems

In the Middle Ages, it was assumed that accidents were “acts of God” (3). According to this logic, humans had little influence on accidents and their effects. At the beginning of the 20th century, accidents and their causes began to be investigated more systematically. Initially, the obvious causes were considered: sharp edges on machines or the mistakes of individual employees. It was assumed that some employees were simply more prone to making mistakes because of their characteristics (“accident-prone persons”), and organizations ideally wanted to avoid these “bad apples” (e.g., (4)). There was also the assumption that rules had to be defined and enforced in order to prevent the erroneous behavior of individual actors. However, these rules were often based on “work as imagined” rather than on actual working practices.

We are still familiar with these ways of thinking in cybersecurity today. When, for example, coverage of the Taurus espionage affair in Germany points to “individual application errors” in communication software (5), it implicitly suggests that there are no structural security flaws (e.g. secure communication systems that are difficult to use). In the field of safety, it was later assumed that accidents were due to a chain of unfortunate events with a single root cause (6). This assumption can also be found in cybersecurity: the MITRE ATT&CK framework, the Cyber Kill Chain, or root cause analysis.

As it became more widely accepted that unexpected incidents cannot be completely prevented, but that their consequences can be mitigated or averted, the view of the role of humans in incidents also changed. People began to accept that humans are not perfect and that human error is unavoidable. The Swiss Federal Roads Office (FEDRO) illustrates the consequences for urban bicycle traffic: instead of error-promoting infrastructure, increasingly error-forgiving traffic infrastructure is being built today (7). It aims to prevent fatal consequences for cyclists who make mistakes, e.g. by separating lanes or adding cushioning elements. Error-forgiving infrastructure also exists in cybersecurity: endpoint security solutions, for example, can prevent end users from executing malicious software.

In the area of safety, the focus also shifted more and more away from individual accident factors towards a systemic understanding of accidents. In connection with the Three Mile Island nuclear power plant incident and other serious disasters, for example, man-made disaster theory was established (8). This theory assumes that, in certain complex socio-technical systems, accidents will occur sooner or later. In addition to the obvious causes, there are less obvious, latent causes that may precede the actual accident by a long time (e.g. a lack of safety culture). Latent factors in cybersecurity could include, for example, a lack of multi-factor authentication (MFA), insufficient IT budgets and qualifications, or low risk awareness among management.

In the area of safety, however, researchers examined not only major man-made disasters but also organizations in which particularly few accidents occurred (9). Attempts were made to transfer characteristics of these “high-reliability organizations” to other organizations. Characteristics such as redundancy, risk awareness, training, effective communication, and safety culture are also being discussed in cybersecurity today.

From Simple Metrics to Measuring Resilience

As early as the beginning of the 20th century, insurance data was used to investigate whether major accidents could be predicted on the basis of minor ones. The assumption was, for example, that one major accident could be expected for every 300 minor accidents. Based on this assumption, attempts were made to prevent minor accidents, to use them as safety metrics, and to predict major accidents (8). Similarly, small incidents (e.g. prevented malware executions) are now used as metrics in security to assess an organization's security posture. In the area of safety, however, there is no clear evidence that small incidents really predict larger ones or that both types of incidents have comparable causes.

Using incidents as metrics for safety or security leads to another problem, however: what if no incidents happen? Is the organization then automatically particularly safe or secure? To account for this, alternative metrics such as safety margins, training, communication processes, or “positive” metrics have been used. The latter aim to measure what is going well, e.g. particularly safe employee behavior.
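As a minimal, hypothetical sketch of what such “positive” metrics could look like in cybersecurity, the following Python snippet combines a classic lagging indicator (confirmed incidents) with leading indicators that measure what is going well: blocked malware executions and the rate at which employees actively report simulated phishing mails. All field names and numbers are invented for illustration.

    from dataclasses import dataclass

    @dataclass
    class SecurityMetrics:
        """Hypothetical monthly figures for one organization (all values invented)."""
        phishing_mails_simulated: int    # simulated phishing mails sent to employees
        phishing_mails_reported: int     # mails actively reported to the security team
        malware_executions_blocked: int  # "small incidents" caught by endpoint security
        confirmed_incidents: int         # classic (lagging) incident count

    def metrics_overview(m: SecurityMetrics) -> dict[str, float]:
        """Combine a lagging indicator with "positive" leading indicators."""
        reporting_rate = (
            m.phishing_mails_reported / m.phishing_mails_simulated
            if m.phishing_mails_simulated else 0.0
        )
        return {
            "phishing_reporting_rate": round(reporting_rate, 2),  # desired behavior
            "blocked_malware_executions": float(m.malware_executions_blocked),
            "confirmed_incidents": float(m.confirmed_incidents),  # failures after the fact
        }

    if __name__ == "__main__":
        march = SecurityMetrics(
            phishing_mails_simulated=200,
            phishing_mails_reported=124,
            malware_executions_blocked=17,
            confirmed_incidents=0,  # zero incidents alone says little about security
        )
        print(metrics_overview(march))
        # {'phishing_reporting_rate': 0.62, 'blocked_malware_executions': 17.0, 'confirmed_incidents': 0.0}

The point is not the specific numbers but the combination: a month with zero confirmed incidents says little on its own, whereas a stable or rising reporting rate captures behavior that is going well.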

From Isolated to Industry-wide Learning

Another problem that the safety and security sectors share is that learning was initially largely limited to individual organizations. However, individual organizations have limited opportunities to learn from incidents, not least because major incidents within a single organization can be very rare. For this reason, reporting systems have been created in aviation, for example, through which incidents can be shared across the entire industry confidentially, anonymously, and without penalty (12). Such systems are also conceivable in cybersecurity.

From Mono- to Interdisciplinary Work

Cybersecurity is increasingly recognizing that many disciplines need to work together in order to positively influence security in an organization. In safety, too, work was initially monodisciplinary, whereas today various disciplines (e.g. engineers, psychologists, sociologists) work together to solve complex issues (13).

Summary: What Can Cybersecurity Learn from Safety?

  1. Accepting that cyberattacks are unavoidable, but that incidents can be prevented when their causes are understood
  2. Thinking in socio-technical systems (e.g. security margins, control loops, security culture) rather than only in simple, obvious explanations for incidents («ransomware», «bad click»)
  3. Designing technology and processes to support humans instead of focusing on human weakness (e.g., helping humans to report unknown phishing mails)
  4. Reflecting on security KPIs (often easy to obtain but bad proxies for «security»)
  5. Creating zones of trust and possibilities for industry-wide learning from occurrences (not only severe security incidents)
  6. Understanding security as an interdisciplinary team effort of technical and non-technical disciplines (e.g., engineers, social scientists, psychologists)

Related Article:

  1. Ebert, N., Schaltegger, T., Ambuehl, B., Schöni, L., Zimmermann, V., & Knieps, M. (2023). Learning from safety science: A way forward for studying cybersecurity incidents in organizations. Computers & Security, 103435. https://www.sciencedirect.com/science/article/pii/S0167404823003450

Literature

  2. https://www.cisa.gov/sites/default/files/publications/CSRB-Report-on-Log4-July-11-2022_508.pdf
  3. Loimer, H., & Guarnieri, M. (1996). Accidents and acts of God: A history of the terms. American Journal of Public Health, 86(1), 101-107.
  4. Marbe, K. (1926). Praktische Psychologie der Unfälle und Betriebsschäden. München: R. Oldenbourg.
  5. https://www.heise.de/news/Offizier-in-Singapur-hatte-bei-Taurus-Leak-ungesicherte-Verbindung-genutzt-9646217.html
  6. Heinrich, H. W. (1950). Industrial Accident Prevention: A Scientific Approach.
  7. https://www.astra.admin.ch/dam/astra/de/dokumente/langsamverkehr/handbuch-veloverkehr-kreuzungen.pdf.download.pdf
  8. Dekker, S. (2019). Foundations of Safety Science: A Century of Understanding Accidents and Disasters (1st ed.). Routledge, p. 219.
  9. Dekker, S. (2019). Foundations of Safety Science: A Century of Understanding Accidents and Disasters (1st ed.). Routledge, p. 289.
  10. https://www.hfacs.com/hfacs-framework.html
  11. https://www.nzz.ch/technologie/kriminelle-hacker-greifen-die-nzz-an-und-erpressen-sie-cyberangriff-ransomware-ld.1778725
  12. https://asrs.arc.nasa.gov/
  13. Dekker, S. (2019). Foundations of Safety Science: A Century of Understanding Accidents and Disasters (1st ed.). Routledge, p. 1.