MES Software: Vendors, Features & Costs Compared 2026
MES software compared: vendors, functions per VDI 5600, costs (cloud vs. on-premise) and implementation. Honest market overview 2026.
A packaging line was throwing a "no-bottle-detected" alarm three or four times per shift. The maintenance team had replaced the photoelectric sensor twice. The third time it happened, somebody finally asked me to look at it. Within twenty minutes we knew it wasn't the sensor — it was a slow air leak in a pneumatic actuator that, once per cycle, let a bottle drift two millimetres outside the sensor's detection window. The leak had been audible for weeks. Nobody connected it to the alarms because nobody asked "why" enough times.
That is 5 Whys. The whole methodology. Strip away the Toyota history, the lean-manufacturing branding, the workshop templates, and what's left is the discipline that every competent troubleshooter has used since the first machine broke down: don't accept the first plausible cause. Keep going. The first cause is almost always a symptom of a deeper one, and the deeper one is what you actually need to fix if you don't want to be back here tomorrow.
I have spent thirty-five years connecting machines that were never designed to be connected, and a large fraction of that work is debugging — figuring out why something is doing what it shouldn't. The 5 Whys discipline is the most useful single technique I have, and it is also one of the most consistently misapplied in the field. This article is what I have learned about where the chain actually works, where it reliably breaks, and what makes the difference between investigation and guided storytelling.
5 Whys is a sequential causal-analysis technique in which an investigator (or a small team) starts with an observed problem and repeatedly asks "why did that happen?" — using each answer as the input to the next question — until the chain terminates at a cause that, if eliminated, would prevent the original problem from recurring. The number five is a heuristic, not a rule. Sometimes the chain terminates at three. Sometimes it needs eight. The discipline is in not stopping at the first answer that sounds satisfying.
Sakichi Toyoda formalised the technique inside what became the Toyota Production System. The version that gets taught in Six Sigma Black Belt programmes is more rigorous than the version most engineers actually use. The version most engineers actually use is closer to common sense applied with discipline — and the failure mode is almost always the discipline part, not the technique itself.
Let me walk through the bottle-detection example properly, with the data I had at each step. This is what an honest 5 Whys looks like — not the polished version that ends up in a quality report, but the actual sequence of questions, answers, and verifications:
The sensor replacements that the maintenance team had done weren't wrong actions in isolation — the sensor really was returning intermittent readings. They were wrong actions because the chain stopped at W1. If you stop at W1, you replace sensors forever. If you go to W5, you replace one €8 seal and the problem stops.
The other thing worth noticing: each step in the chain has a verification line. That is the part most 5 Whys exercises skip. An unverified answer is not a step in an investigation — it is a guess that becomes the basis for the next guess, which becomes the basis for the one after that. By W3 the whole chain is fiction, and the team feels good because they got to a "root cause" that sounded plausible.
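One way to make the verification requirement concrete is to record each step together with its evidence and count how far the chain is actually backed by measurement. The following is a minimal sketch, not a prescribed template: the `WhyStep` structure, the `verified_depth` function, and the wording of the example steps are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WhyStep:
    question: str            # the "why" being asked
    answer: str              # the claimed cause
    evidence: Optional[str]  # how the answer was checked; None means it is a guess


def verified_depth(chain: List[WhyStep]) -> int:
    """Return how many steps, counted from the top, are backed by evidence.

    Counting stops at the first unverified answer, because every later
    step is built on top of that guess.
    """
    depth = 0
    for step in chain:
        if not step.evidence:
            break
        depth += 1
    return depth


# Illustrative chain only: the step wording and evidence names are hypothetical.
chain = [
    WhyStep("Why did the alarm fire?",
            "Sensor reading was intermittent", "sensor signal trace"),
    WhyStep("Why was the reading intermittent?",
            "Bottle sat outside the detection window", None),
    WhyStep("Why was the bottle out of position?",
            "Actuator lost pressure", "pressure log"),
]

print(verified_depth(chain))  # 1: the third step has evidence but rests on a guess
```

The design choice worth noting is that evidence further down the chain does not rescue it: once one answer is a guess, everything after it is only as good as that guess.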
In the 5 Whys exercises I have run or reviewed in the field, the failure mode is one of these five, in roughly descending order of frequency:
This is the part that most articles on the topic skip. 5 Whys is excellent for relatively simple, linear, single-cause failures — the bottle-detection alarm above is a textbook example. It is poor for problems that are statistical (this defect occurs in 2% of cycles with no obvious pattern), problems with multiple interacting causes, problems that emerge only under specific environmental conditions, and problems where the "cause" is a slow drift rather than an event.
For statistical problems, the right tool is SPC and capability analysis — you cannot ask "why did this defect occur" if the defect is a tail of a distribution rather than a discrete event. For multi-cause problems, Ishikawa or fault-tree analysis. For environmental problems, controlled testing under varied conditions. For drift problems, trend analysis on historical data. The right answer to "should we do a 5 Whys here?" is sometimes "no, we should do something else." Engineers who reach for 5 Whys regardless of problem type usually produce a clean-looking analysis that addresses the wrong thing.
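The mapping from problem type to method described above can be written down as a small lookup, which is sometimes useful as a checklist at the start of an investigation. This is only a sketch: the `ProblemType` names and the `recommend` function are hypothetical, and the pairings simply restate the text.

```python
from enum import Enum


class ProblemType(Enum):
    SINGLE_CAUSE_EVENT = "simple, linear, single-cause failure"
    STATISTICAL = "defect is the tail of a distribution"
    MULTI_CAUSE = "multiple interacting causes"
    ENVIRONMENTAL = "appears only under specific conditions"
    DRIFT = "slow drift rather than an event"


# Pairings taken from the paragraphs above; the names are illustrative.
RECOMMENDED_METHOD = {
    ProblemType.SINGLE_CAUSE_EVENT: "5 Whys",
    ProblemType.STATISTICAL: "SPC and capability analysis",
    ProblemType.MULTI_CAUSE: "Ishikawa or fault-tree analysis",
    ProblemType.ENVIRONMENTAL: "controlled testing under varied conditions",
    ProblemType.DRIFT: "trend analysis on historical data",
}


def recommend(problem: ProblemType) -> str:
    """Return the method suggested for the given problem type."""
    return RECOMMENDED_METHOD[problem]


print(recommend(ProblemType.STATISTICAL))
```

Classifying the problem is the hard part and still takes judgment; the lookup only makes the decision explicit before anyone starts asking "why".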
Most of the 5 Whys exercises I run aren't on the production process itself; they're on the data infrastructure I am trying to build on top of it. Why isn't this signal where it should be? Why is this OPC tag returning a value that doesn't match the operator panel? Why is the cycle counter incrementing at half the rate I expected? The technique is exactly the same as on the production-process side, but the verifications are different: multimeter readings, network captures, PLC traces, OPC client logs.
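For the cycle-counter question, the first verification is usually to put a number on "half the rate" before asking why. The sketch below assumes logged `(timestamp_seconds, counter_value)` samples and a known nominal cycle time; the function name, the sample data, and the 3-second cycle time are invented for illustration, and counter rollover is deliberately ignored.

```python
from typing import List, Tuple


def counter_rate_ratio(samples: List[Tuple[float, int]],
                       expected_cycle_s: float) -> float:
    """Ratio of observed counter increments to the increments expected
    from elapsed time and the nominal cycle time.

    samples are (timestamp_seconds, counter_value), oldest first.
    Assumes the counter does not roll over within the sampled span.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if expected_cycle_s <= 0:
        raise ValueError("expected_cycle_s must be positive")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    expected_increments = elapsed / expected_cycle_s
    return (c1 - c0) / expected_increments


# Hypothetical log: 20 counts in 120 s on a line with a nominal 3 s cycle.
samples = [(0.0, 100), (60.0, 110), (120.0, 120)]
ratio = counter_rate_ratio(samples, expected_cycle_s=3.0)
print(ratio)  # 0.5
```

A ratio close to 0.5 rather than some arbitrary fraction is itself evidence: it points toward something systematic, such as every second pulse being missed, rather than random signal loss.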
What I have learned over thirty-five years of doing this kind of debugging is that the technique scales down to a single engineer working alone on a single problem, and it scales up to a cross-functional team working on a recurring quality issue. The discipline is the same at both ends. The thing that determines whether the chain reaches a real cause or a comfortable fiction is not the size of the team or the formality of the documentation. It is whether each "why" gets answered by someone who actually knows, or someone who is plausibly guessing.
The reason 5 Whys works in environments where SYMESTIC has been deployed and breaks down in environments where it hasn't is, in my experience, almost always the verification problem from the list above. When the production data exists — per-cycle process data in Process Data, time-stamped event history in Alarms, real KPI history in Production Metrics — each "why" in the chain can be verified against actual measurement at the actual time. When it doesn't, every "why" past the second one is a team consensus about what probably happened. Both produce documents. Only one produces fixes that hold. None of this replaces the discipline of the technique itself; it just removes the constraint that pushes well-intentioned investigations into guessing instead of answering.
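Verifying a "why" against measurement at the actual time can be as simple as checking whether each alarm coincides with the suspected condition in the process data. The following is a hedged sketch using plain Python lists rather than any particular product's API: the function name, the pressure values, the threshold, and the time window are all assumptions made up for the example.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def fraction_of_alarms_explained(alarm_times: List[float],
                                 samples: List[Tuple[float, float]],
                                 threshold: float,
                                 window_s: float) -> float:
    """Fraction of alarms that have at least one sample below `threshold`
    within `window_s` seconds before (or at) the alarm time.

    samples are (timestamp_seconds, value) and must be sorted by timestamp.
    """
    if not alarm_times:
        return 0.0
    times = [t for t, _ in samples]
    hits = 0
    for alarm in alarm_times:
        lo = bisect_left(times, alarm - window_s)
        hi = bisect_right(times, alarm)
        if any(value < threshold for _, value in samples[lo:hi]):
            hits += 1
    return hits / len(alarm_times)


# Hypothetical data: one pressure reading per second (bar) and three alarms.
pressure = [(0.0, 6.0), (1.0, 6.0), (2.0, 5.2), (3.0, 6.0), (4.0, 6.0),
            (5.0, 6.0), (6.0, 5.1), (7.0, 6.0), (8.0, 6.0), (9.0, 6.0)]
alarms = [2.5, 6.5, 9.5]

explained = fraction_of_alarms_explained(alarms, pressure,
                                         threshold=5.5, window_s=1.0)
print(round(explained, 2))  # 0.67: two of the three alarms follow a pressure dip
```

A fraction well below 1.0 is as informative as a high one: it says the proposed cause does not account for every occurrence, so the chain either has a second branch or the answer is wrong.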
Related problem-solving and quality topics: OEE · MES · 8D Report · SCAR · Ishikawa diagram · Fault tree analysis · DMAIC · PDCA cycle · A3 report · Statistical Process Control · Kaizen · Lean production.