
5 Whys: Where the Chain Actually Breaks

By Martin Brandel · Last updated: April 2026

A packaging line was throwing a "no-bottle-detected" alarm three or four times per shift. The maintenance team had replaced the photoelectric sensor twice. When the alarm came back a third time, somebody finally asked me to look at it. Within twenty minutes we knew it wasn't the sensor: it was a slow air leak in a pneumatic actuator that intermittently let a bottle drift two millimetres outside the sensor's detection window. The leak had been audible for weeks. Nobody connected it to the alarms because nobody asked "why" enough times.

That is 5 Whys. The whole methodology. Strip away the Toyota history, the lean-manufacturing branding, the workshop templates, and what's left is the discipline that every competent troubleshooter has used since the first machine broke down: don't accept the first plausible cause. Keep going. The first cause is almost always a symptom of a deeper one, and the deeper one is what you actually need to fix if you don't want to be back here tomorrow.

I have spent thirty-five years connecting machines that were never designed to be connected, and a large fraction of that work is debugging — figuring out why something is doing what it shouldn't. The 5 Whys discipline is the most useful single technique I have, and it is also one of the most consistently misapplied in the field. This article is what I have learned about where the chain actually works, where it reliably breaks, and what makes the difference between investigation and guided storytelling.

What 5 Whys is, in one honest definition

5 Whys is a sequential causal-analysis technique in which an investigator (or a small team) starts with an observed problem and repeatedly asks "why did that happen?" — using each answer as the input to the next question — until the chain terminates at a cause that, if eliminated, would prevent the original problem from recurring. The number five is a heuristic, not a rule. Sometimes the chain terminates at three. Sometimes it needs eight. The discipline is in not stopping at the first answer that sounds satisfying.

Sakichi Toyoda is credited with originating the technique, which later became part of the Toyota Production System. The version that gets taught in Six Sigma Black Belt programmes is more rigorous than the version most engineers actually use. The version most engineers actually use is closer to common sense applied with discipline, and the failure mode is almost always the discipline part, not the technique itself.

A worked chain, the way it actually goes

Let me walk through the bottle-detection example properly, with the data I had at each step. This is what an honest 5 Whys looks like — not the polished version that ends up in a quality report, but the actual sequence of questions, answers, and verifications:

› the chain
P: "no-bottle-detected" alarm 3–4× per shift
W1: why? — sensor occasionally doesn't see the bottle
verified: alarm log + sensor I/O signal in production data
W2: why? — bottle position varies cycle-to-cycle by ≈2 mm
verified: 30-cycle position measurement with mechanical gauge
W3: why? — upstream pneumatic gripper releases at inconsistent angle
verified: high-speed camera, 50 cycles, gripper end-position scatter
W4: why? — gripper actuator pressure drops during release stroke
verified: pressure logging on actuator supply line, 200 ms window
W5: why? — small leak in seal of actuator cylinder
verified: soap-bubble test on cylinder, audible hiss confirmed
→ root cause: failed actuator seal. fix: replace seal (€8 part).
→ recurrence: zero in 6 months of monitoring.

The sensor replacements that the maintenance team had done weren't wrong actions in isolation — the sensor really was returning intermittent readings. They were wrong actions because the chain stopped at W1. If you stop at W1, you replace sensors forever. If you go to W5, you replace one €8 seal and the problem stops.

The other thing worth noticing: each step in the chain has a verification line. That is the part most 5 Whys exercises skip. An unverified answer is not a step in an investigation — it is a guess that becomes the basis for the next guess, which becomes the basis for the one after that. By W3 the whole chain is fiction, and the team feels good because they got to a "root cause" that sounded plausible.
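The verification rule can be written down as a small data structure. This is a minimal sketch, not a tool I am claiming anyone uses: the names WhyStep and first_unverified are hypothetical, and the example chain deliberately leaves W3 without evidence to show what the check reports.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WhyStep:
    """One link in the chain: the answer given and how it was verified."""
    answer: str
    evidence: Optional[str] = None  # None or blank means the answer is a guess

    @property
    def verified(self) -> bool:
        return bool(self.evidence and self.evidence.strip())


def first_unverified(chain: List[WhyStep]) -> Optional[int]:
    """Return the 1-based position of the first guess, or None if all are verified."""
    for position, step in enumerate(chain, start=1):
        if not step.verified:
            return position
    return None


# The bottle-detection chain as it would look if W3 had been accepted on faith
# (hypothetical variant for illustration; in the real chain W3 was verified).
chain = [
    WhyStep("sensor occasionally doesn't see the bottle",
            "alarm log + sensor I/O signal in production data"),
    WhyStep("bottle position varies cycle-to-cycle by about 2 mm",
            "30-cycle position measurement with mechanical gauge"),
    WhyStep("upstream pneumatic gripper releases at inconsistent angle"),
    WhyStep("gripper actuator pressure drops during release stroke",
            "pressure logging on actuator supply line"),
]

weak_link = first_unverified(chain)
if weak_link is not None:
    print(f"W{weak_link} is a guess; every later step rests on it.")
```

The point of returning only the first unverified step is that evidence further down the chain does not rescue it: a verified W4 built on a guessed W3 may still be answering the wrong question.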

The five places where the chain reliably breaks

In the 5 Whys exercises I have run or reviewed in the field, the failure almost always falls into one of these five modes, listed in roughly descending order of frequency:

  1. Stopping at the first plausible answer. Most common by a wide margin. The first "why" produces an answer that *could* be the cause, and the team accepts it because it ends the conversation. The actual cause is usually one to three layers deeper. Counter-discipline: when you reach an answer that feels satisfying, that's the moment to ask one more "why," not the moment to stop.
  2. Terminating at the human. "Operator error." "Maintenance forgot." "Wrong setting entered." These are categories, not causes. They terminate the investigation in a way that absolves the system of responsibility and puts it on a person. A correctly performed 5 Whys exercise almost never ends at a human. It ends at the missing standard, the unclear instruction, the absent verification step, the design that allowed the human action to cause the problem. If your chain ends at the operator, it isn't finished.
  3. Branching that gets collapsed. Real causal chains often branch — a single problem has two contributing causes that interact, and at some "why" you need to pursue both branches in parallel. The 5 Whys format is linear, which tempts investigators to pick whichever branch feels more important and drop the other. The dropped branch is usually where the next failure comes from. When this happens, abandon strict 5 Whys and switch to an Ishikawa diagram or fault tree — the technique has reached its limit, and the investigation hasn't.
  4. Asking "why" without data to answer it. This is the failure mode I see most often in workshop-style 5 Whys done in conference rooms instead of on the shopfloor. The team gets to W2 or W3 and starts answering with conjectures because nobody on the team actually knows. The answers sound right, the chain reaches a tidy conclusion, and the corrective actions target a cause that may or may not be real. The fix: do not write down a "why" answer that nobody on the team can verify with current data, current measurement, or a quick experiment. If you can't verify, the next step is to go get the data, not to guess.
  5. Confusing "earliest event" with "root cause." Sometimes investigators trace a chain back in time until they reach an event that happened years ago — an original equipment specification, a procurement decision, a hiring choice — and stop there because they have run out of "whys" they can answer. That earliest event is the historical origin, but the actionable root cause is usually something more recent and more proximate. The test: a root cause is something whose elimination would have prevented this specific recurrence, and which can actually be eliminated. Everything earlier is context.
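Two of the checks in the list above, the human terminator (item 2) and the root-cause test (item 5), are mechanical enough to sketch. The function name review_chain_end and the phrase list are hypothetical, and the two boolean inputs are judgments the investigator still has to make from evidence; the code only keeps them from being skipped.

```python
from typing import List

# Illustrative phrases only; a real review needs a person reading the chain.
HUMAN_TERMINATORS = ("operator error", "maintenance forgot", "wrong setting")


def review_chain_end(final_answer: str,
                     prevents_recurrence: bool,
                     can_be_eliminated: bool) -> List[str]:
    """Return a list of reasons the final answer is not yet a root cause."""
    findings = []
    text = final_answer.lower()
    if any(phrase in text for phrase in HUMAN_TERMINATORS):
        findings.append("ends at a person: ask what allowed the action to cause the problem")
    if not prevents_recurrence:
        findings.append("eliminating this would not have prevented the recurrence")
    if not can_be_eliminated:
        findings.append("cannot be eliminated: treat it as context, not root cause")
    return findings


# A chain that stops at the operator and at something nobody can change.
unfinished = review_chain_end("Operator error during changeover", True, False)

# The actuator seal from the worked example passes all three checks.
finished = review_chain_end("failed actuator seal", True, True)
```

An empty list does not prove the chain is right. It only means the chain did not stop in one of the two places where stopping is known to be premature.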

When 5 Whys is the wrong tool

This is the part that most articles on the topic skip. 5 Whys is excellent for relatively simple, linear, single-cause failures — the bottle-detection alarm above is a textbook example. It is poor for problems that are statistical (this defect occurs in 2% of cycles with no obvious pattern), problems with multiple interacting causes, problems that emerge only under specific environmental conditions, and problems where the "cause" is a slow drift rather than an event.

For statistical problems, the right tool is SPC and capability analysis — you cannot ask "why did this defect occur" if the defect is a tail of a distribution rather than a discrete event. For multi-cause problems, Ishikawa or fault-tree analysis. For environmental problems, controlled testing under varied conditions. For drift problems, trend analysis on historical data. The right answer to "should we do a 5 Whys here?" is sometimes "no, we should do something else." Engineers who reach for 5 Whys regardless of problem type usually produce a clean-looking analysis that addresses the wrong thing.
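The mapping in the two paragraphs above fits in a lookup table. This is only a restatement of the text as a sketch; ProblemType and suggest_tool are hypothetical names, and classifying a real problem into one of these types is the hard part that no table does for you.

```python
from enum import Enum


class ProblemType(Enum):
    LINEAR_SINGLE_CAUSE = "relatively simple, linear, single-cause failure"
    STATISTICAL = "defect occurs in a small share of cycles with no obvious pattern"
    MULTI_CAUSE = "multiple interacting causes"
    ENVIRONMENTAL = "appears only under specific environmental conditions"
    DRIFT = "slow drift rather than an event"


# Problem type -> technique, exactly as listed in the text above.
SUGGESTED_TOOL = {
    ProblemType.LINEAR_SINGLE_CAUSE: "5 Whys",
    ProblemType.STATISTICAL: "SPC and capability analysis",
    ProblemType.MULTI_CAUSE: "Ishikawa diagram or fault-tree analysis",
    ProblemType.ENVIRONMENTAL: "controlled testing under varied conditions",
    ProblemType.DRIFT: "trend analysis on historical data",
}


def suggest_tool(problem: ProblemType) -> str:
    """Look up the technique suggested for a given problem type."""
    return SUGGESTED_TOOL[problem]


for problem in ProblemType:
    print(f"{problem.value}: {suggest_tool(problem)}")
```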

What this looks like in a brownfield environment

Most of the 5 Whys exercises I run aren't on the production process itself; they're on the data infrastructure I am trying to build on top of it. Why isn't this signal where it should be? Why is this OPC tag returning a value that doesn't match the operator panel? Why is the cycle counter incrementing at half the rate I expected? The technique is exactly the same as on the production-process side, but the verifications are different: multimeter readings, network captures, PLC traces, OPC client logs.
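Take the cycle-counter question as an example of what verification means on the data side. Before asking "why is it counting at half the rate," confirm from logged samples that it really is. This is a minimal sketch under stated assumptions: the samples are (timestamp in seconds, counter value) pairs from some logger, the nominal cycle time is known, and the function name and numbers are invented for illustration.

```python
from typing import List, Tuple


def observed_rate_ratio(samples: List[Tuple[float, int]],
                        expected_cycle_s: float) -> float:
    """Ratio of counted cycles to expected cycles over the sampled interval.

    samples: (timestamp_seconds, counter_value) pairs, ordered by time.
    A result near 1.0 means the counter keeps pace; near 0.5 means half rate.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t_first, c_first), (t_last, c_last) = samples[0], samples[-1]
    elapsed = t_last - t_first
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    expected_cycles = elapsed / expected_cycle_s
    return (c_last - c_first) / expected_cycles


# Hypothetical log: two minutes of data on a machine with a 3-second cycle.
samples = [(0.0, 100), (60.0, 110), (120.0, 120)]
ratio = observed_rate_ratio(samples, expected_cycle_s=3.0)
print(f"counter runs at {ratio:.2f} of the expected rate")
```

A ratio that sits at a clean fraction is itself evidence for the next "why": it points toward something systematic rather than random signal loss, but that reading is a hypothesis to verify, not a conclusion.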

What I have learned over thirty-five years of doing this kind of debugging is that the technique scales down to a single engineer working alone on a single problem, and it scales up to a cross-functional team working on a recurring quality issue. The discipline is the same at both ends. The thing that determines whether the chain reaches a real cause or a comfortable fiction is not the size of the team or the formality of the documentation. It is whether each "why" gets answered by someone who actually knows, or by someone who is guessing plausibly.

In my experience, when 5 Whys works in environments where SYMESTIC has been deployed and breaks down in environments where it hasn't, the reason is almost always the verification problem from item 4 in the list above. When the production data exists (per-cycle process data in Process Data, time-stamped event history in Alarms, real KPI history in Production Metrics), each "why" in the chain can be verified against actual measurement at the actual time. When it doesn't, every "why" past the second one is a team consensus about what probably happened. Both produce documents. Only one produces fixes that hold. None of this replaces the discipline of the technique itself; it just removes the constraint that pushes well-intentioned investigations into guessing instead of answering.


Related problem-solving and quality topics: OEE · MES · 8D Report · SCAR · Ishikawa diagram · Fault tree analysis · DMAIC · PDCA cycle · A3 report · Statistical Process Control · Kaizen · Lean production.

About the author
Martin Brandel
MES Consultant at SYMESTIC. 35+ years in industrial automation: Simatic S5/S7/TIA, OPC UA, IoT gateway integration, brownfield connectivity. PLC engineering and on-site commissioning across DACH, Eastern Europe, and China since 1991. Dipl.-Ing. Communications Engineering.