Incident Psychology — Trivia & Interesting Facts
Surprising, historical, and little-known facts about the psychology of incidents and human factors in operations.
The "Swiss cheese model" of accidents was developed by James Reason in 1990¶
James Reason, a psychology professor at the University of Manchester, proposed that accidents occur when weaknesses in multiple defensive layers line up, like the holes in stacked slices of Swiss cheese. Each layer (training, procedures, automation, monitoring) has holes, and an incident happens only when the holes in every layer align at once. This model fundamentally changed how aviation, medicine, and later tech think about system failures.
Hindsight bias makes every incident look preventable after the fact
Hindsight bias, documented extensively by psychologist Baruch Fischhoff in 1975, means that once you know what happened, it seems obvious and preventable. In incident reviews, this manifests as "how could they not see that?" — but the engineer making the decision didn't have the information you have now. Sidney Dekker's work on "just culture" argues that fighting hindsight bias is the single most important aspect of fair incident analysis.
Tunnel vision during incidents is a documented neurological phenomenon
Under stress, the brain's prefrontal cortex (responsible for creative problem-solving) becomes less active, while the amygdala (fight-or-flight) becomes more active. This produces literal cognitive tunneling: engineers fixate on a single hypothesis and ignore contradicting evidence. Research by Gary Klein on naturalistic decision-making shows that this effect is strongest in the first 5-10 minutes of a high-stress incident — exactly when clear thinking matters most.
Sleep deprivation at 24 hours impairs cognition as much as a 0.10% BAC
Research by Dawson and Reid (1997) found that being awake for 24 hours produces cognitive impairment equivalent to a blood alcohol concentration of 0.10% — above the legal driving limit in every US state. This is directly relevant to on-call rotations: an engineer paged at 3 AM after a full day of work is literally impaired. This research drove the adoption of follow-the-sun on-call models and maximum on-call shift lengths.
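A minimal sketch of the follow-the-sun idea mentioned above, assuming three illustrative regions that each cover an eight-hour UTC window; the region names and hours are hypothetical, not a real schedule:

```python
# Hypothetical follow-the-sun routing: page the region whose local daytime
# covers the current UTC hour, so nobody is woken at 3 AM local time.
from datetime import datetime, timezone

# Each region covers an 8-hour window, expressed as UTC hours [start, end).
REGION_COVERAGE = {
    "apac": (0, 8),    # e.g. Sydney/Singapore daytime
    "emea": (8, 16),   # e.g. Dublin/Berlin daytime
    "amer": (16, 24),  # e.g. US daytime
}

def on_call_region(now: datetime) -> str:
    """Return the region whose coverage window contains the current UTC hour."""
    hour = now.astimezone(timezone.utc).hour
    for region, (start, end) in REGION_COVERAGE.items():
        if start <= hour < end:
            return region
    raise ValueError("coverage windows should span all 24 hours")

if __name__ == "__main__":
    print(on_call_region(datetime.now(timezone.utc)))
```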
The "normalization of deviance" explains why known risks are ignored until disaster¶
Sociologist Diane Vaughan coined this term analyzing the 1986 Challenger shuttle disaster. She found that NASA engineers had documented O-ring erosion on previous flights but gradually accepted it as normal. In tech, normalization of deviance manifests as "oh, that alert fires all the time, just ignore it" — until the one time the alert is real. This concept is directly cited in Google's SRE book as a key risk in operations.
Human error is a symptom, not a cause — and this distinction matters enormously
Sidney Dekker's "Field Guide to Understanding Human Error" (2006) argues that "human error" is never a root cause but always a symptom of deeper systemic problems: poor tooling, unclear procedures, time pressure, or inadequate training. Labeling something as "human error" and stopping the investigation is the single most common failure mode in incident postmortems. The question should always be: "why did this action make sense to the person at the time?"
Alert fatigue kills — literally, in medicine, and metaphorically in ops
A 2014 study found that ICU nurses receive an average of 187 alarms per patient per day, and 72-99% are false alarms. The result: clinicians learn to ignore alarms, a phenomenon called "cry wolf" or alert fatigue. In ops, the parallel is exact. PagerDuty's data shows that teams with more than 40 alerts per week per person have 2-3x longer incident response times because engineers stop trusting the alerts.
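A minimal sketch of tracking per-person alert load against the roughly 40-alerts-per-week figure cited above; the record format, field names, and threshold are assumptions for illustration, not any vendor's API:

```python
# Count alerts per responder over a one-week window and flag anyone over budget.
from collections import Counter

WEEKLY_ALERT_BUDGET = 40  # per person, per the figure cited above

def weekly_alert_load(alerts: list[dict]) -> dict[str, int]:
    """Count how many alerts each responder handled in the week."""
    return Counter(a["responder"] for a in alerts)

def over_budget(alerts: list[dict]) -> list[str]:
    """Return responders whose weekly alert count exceeds the budget."""
    return [person for person, n in weekly_alert_load(alerts).items()
            if n > WEEKLY_ALERT_BUDGET]

# Example: two responders, one clearly past the threshold.
alerts = [{"responder": "alice"}] * 55 + [{"responder": "bob"}] * 12
print(over_budget(alerts))  # ['alice']
```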
The "just world" fallacy makes people blame victims of system failures¶
The just world fallacy — the cognitive bias that bad outcomes must be deserved — drives blame-oriented incident culture. When an engineer's change causes an outage, observers unconsciously assume the engineer must have been careless or incompetent. Research by Melvin Lerner (1980) shows this bias is deeply ingrained and requires active, deliberate effort to counteract. Blameless postmortem culture is essentially an organizational intervention against this cognitive bias.
Stress inoculation training (SIT) works for on-call engineers, not just soldiers
Stress inoculation training, developed by psychologist Donald Meichenbaum in 1985, involves controlled exposure to stressors before facing them for real. Game Days and chaos engineering exercises serve exactly this function for on-call engineers. Research shows that people who have practiced responding to simulated incidents experience less anxiety, make better decisions, and recover faster during real incidents.
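A minimal, hypothetical sketch of a Game Day drill in the stress-inoculation spirit: surprise the responder with a scenario, time the walkthrough, and keep the result so practice runs can be compared over time. The scenario names are illustrative and not tied to any real tooling:

```python
# Tiny game-day drill runner: pick a random scenario and time the response.
import random
import time

SCENARIOS = [
    "primary database failover",
    "expired TLS certificate on the load balancer",
    "message queue backlog growing without bound",
    "bad deploy serving 500s to 10% of traffic",
]

def run_drill() -> dict:
    scenario = random.choice(SCENARIOS)
    print(f"GAME DAY: respond to -> {scenario}")
    start = time.monotonic()
    input("Press Enter once you've walked through your response...")
    elapsed = time.monotonic() - start
    return {"scenario": scenario, "seconds_to_respond": round(elapsed, 1)}

if __name__ == "__main__":
    print(run_drill())
```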
The peak-end rule means people remember incidents by their worst moment and their ending
Daniel Kahneman's peak-end rule (1993) states that people judge experiences primarily by their most intense moment and how they ended, not by the total duration. In incident management, this means a 4-hour incident with a calm, organized resolution feels better to participants than a 1-hour incident with a chaotic, stressful ending. This is why graceful incident closure procedures — confirming resolution, thanking participants, scheduling a postmortem — matter disproportionately.
Automation complacency is a documented risk factor in every safety-critical industry
When systems are highly automated, human operators lose situational awareness and skill proficiency — a phenomenon called "automation complacency" or the "ironies of automation" (documented by Lisanne Bainbridge in 1983). The more reliable the automation, the less prepared the human is when it fails. In ops, this manifests as teams that cannot debug systems manually because they've never had to — until the automation itself is what's broken.