MES Software: Vendors, Features & Costs Compared 2026
MES software compared: vendors, functions per VDI 5600, costs (cloud vs. on-premise) and implementation. Honest market overview 2026.
TL;DR: Downtime is any period during planned production time when an asset is not producing good parts at target rate. It splits into planned (changeovers, PM, meetings) and unplanned (breakdowns, material starvation, quality holds, micro-stops), and its honest measurement is the single largest lever on OEE. The arithmetic is trivial. The hard part is that almost every plant that measures downtime for the first time through an MES discovers the real number is two to three times what the old paper-based estimate suggested — and that 30 to 50 percent of the losses hide in micro-stops shorter than two minutes that manual logging never captures at all.
Downtime is any interval within scheduled production time during which a machine, line or asset is not producing good parts at the specified rate. It is the headline loss category in every production system that is measured seriously, and it is the direct enemy of availability — the first of the three OEE factors. ISO 22400 defines the relevant time frames (planned busy time, actual production time, down time) and VDI 3423 provides the German-language vocabulary used across most mid-market plants in the DACH region. Both frameworks agree on the essentials; they disagree, productively, on edge cases that every plant has to resolve for itself.
What makes downtime different from every other production KPI is how unevenly it is distributed. A small number of stop reasons — typically three to five — cause the majority of the lost minutes. The rest is a long tail of one- and two-minute micro-stops that feel insignificant individually and add up to the single largest hidden loss in most discrete manufacturing operations. Anyone who has spent time on a shop floor has seen the pattern. The supervisor knows the two machines that fail every week. Nobody knows about the 14-second feeder jams that happen 80 times per shift.
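Here is the feeder-jam example in numbers. This minimal Python sketch converts a count of short stops into lost minutes per shift. The 480-minute shift length is an assumption for illustration, not a figure from the plants discussed here.

```python
SHIFT_MINUTES = 480  # assumption: 8-hour shift, no breaks deducted


def micro_stop_loss_minutes(stops_per_shift: int, seconds_per_stop: float) -> float:
    """Minutes lost per shift to a repeated short stop."""
    return stops_per_shift * seconds_per_stop / 60.0


loss = micro_stop_loss_minutes(80, 14)  # 80 feeder jams of 14 s each
share = loss / SHIFT_MINUTES
print(f"{loss:.1f} min per shift, {share:.1%} of the shift")
```

That comes to roughly 18.7 minutes per shift, close to four percent of an eight-hour shift, from a stop nobody writes down.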
The classification most plants work with is simpler than the standards suggest and stricter than most operators initially like. Two axes, four cells, and a clear rule for the one category that always causes arguments.
| Category | Typical stop reasons | Counted against OEE? |
|---|---|---|
| Planned, not scheduled | Weekend shutdowns, holidays, no demand | No: excluded from planned production time |
| Planned, scheduled | Changeovers, PM, shift meetings, scheduled breaks | Yes (the honest view) |
| Unplanned, availability loss | Breakdowns, material starvation, tool changes mid-run, operator absence | Yes |
| Unplanned, performance loss (micro-stops) | Feeder jams, idle between orders, speed drops, chute blockages < 2 min | Yes, usually against performance rather than availability |
The category that always causes arguments is "planned, scheduled" — specifically, whether changeovers and planned maintenance should count against OEE. The theoretical answer, and the one that produces numbers useful for continuous improvement, is yes. If changeovers are excluded from planned production time, the OEE number looks better and every SMED improvement becomes invisible. If they are included, the number reflects reality and the improvement potential is legible. I have had this argument in more plant meetings than I can count. The plants that include changeovers in the denominator are the ones that end up reducing them.
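This minimal sketch shows the denominator argument with made-up shift figures: 480 planned minutes, 45 minutes of breakdowns, and a changeover cut from 60 to 30 minutes by a hypothetical SMED project. The function names are illustrative and not part of any MES API.

```python
def availability(planned_min: float, downtime_min: float) -> float:
    """Availability = (planned production time - downtime) / planned production time."""
    return (planned_min - downtime_min) / planned_min


def shift_availability(shift_min: float, changeover_min: float,
                       breakdown_min: float, include_changeovers: bool) -> float:
    if include_changeovers:
        # Changeovers stay in planned production time and count as downtime.
        return availability(shift_min, changeover_min + breakdown_min)
    # Changeovers are cut from the denominator, so they never appear as a loss.
    return availability(shift_min - changeover_min, breakdown_min)


for changeover in (60, 30):  # before and after a hypothetical SMED project
    inc = shift_availability(480, changeover, 45, include_changeovers=True)
    exc = shift_availability(480, changeover, 45, include_changeovers=False)
    print(f"changeover {changeover} min: included {inc:.1%}, excluded {exc:.1%}")
```

With these invented numbers, halving the changeover lifts the included view by roughly six points and the excluded view by less than one. That is the invisibility problem in miniature.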
The classical TPM framework — still the cleanest way to think about where downtime and performance losses come from — identifies six categories. Memorising them is worth the fifteen minutes it takes, because every real-world stop reason maps cleanly to one of them.
The losses that show up on a supervisor's radar are almost exclusively categories 1 and 2. The ones that kill OEE in a line that already handles breakdowns well are categories 3, 4 and 6 — precisely the ones that humans cannot reliably count without automated measurement.
This is the pattern I have watched in several hundred plants across three decades, and it has never failed to appear. A plant estimates its downtime at eight to ten percent of planned production time, based on manual logs filled in at shift end. Automatic measurement goes live. The first honest number comes back at 22 to 28 percent. There is a meeting. Someone accuses the MES of miscounting. The MES is not miscounting. The plant was.
The mechanism is not fraud and it is not incompetence. It is the interaction of three honest phenomena that compound one another:
The honest rule we have arrived at after 15,000+ machine connections: the first automated downtime baseline is always 2 to 3 times the number the plant believed. That is not a failure of the new measurement. It is the first honest baseline the plant has ever had, and real improvement begins from there. In the Meleghy rollout — six plants across four countries — the same pattern showed up in every site, and the plants that accepted the new number rather than arguing with it delivered 10 percent fewer stoppages within six months.
The capture method matters far more than the analytics layered on top of it. Three patterns dominate, and the quality of every subsequent decision depends on which one a plant is using.
Fully automatic capture. Machine state is read from the PLC via OPC UA, or from digital I/O signals wired to a gateway on older equipment. Every state transition produces a timestamped event in the MES. The operator's only job is to classify the reason, ideally from a short, deliberately constrained list of 8 to 12 codes rather than a 40-item drop-down nobody reads. This is the only capture method that picks up micro-stops at all, and it is the pattern that dominates the SYMESTIC installed base for exactly that reason.
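At its core, automatic capture turns a stream of timestamped state transitions into stop intervals that an operator can then classify. The minimal sketch below assumes the gateway delivers sorted `(timestamp_in_seconds, state)` pairs with hypothetical states `"RUN"` and `"STOP"`. Real PLC and OPC UA signal models differ by machine and vendor.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    start: float  # seconds since start of the observation window
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def stops_from_transitions(events: list[tuple[float, str]], window_end: float) -> list[Stop]:
    """Convert sorted (timestamp, state) transitions into stop intervals.

    Repeated STOP or RUN events without a state change are ignored.
    A stop still open at window_end is clipped to the window.
    """
    stops: list[Stop] = []
    stop_start = None
    for ts, state in events:
        if state == "STOP" and stop_start is None:
            stop_start = ts
        elif state == "RUN" and stop_start is not None:
            stops.append(Stop(stop_start, ts))
            stop_start = None
    if stop_start is not None:
        stops.append(Stop(stop_start, window_end))
    return stops


# Invented 8-hour window (28,800 s) with a 14 s jam, a 10 min stop and an open stop at the end
events = [(0, "RUN"), (100, "STOP"), (114, "RUN"), (900, "STOP"), (1500, "RUN"), (28700, "STOP")]
for s in stops_from_transitions(events, window_end=28800):
    print(s.start, s.end, s.duration)
```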
Semi-automatic capture. Duration and timing come from the machine; reason codes come from operator input at a shop-floor terminal. Acceptable for most discrete manufacturing, and often the pragmatic compromise on older equipment where a full PLC tap is impractical. The data quality sits roughly 80 percent of the way from paper to fully automatic — good enough for meaningful OEE, not quite good enough for root-cause work on the long tail.
Manual / paper-based capture. Operator writes stop events on a shift sheet; supervisor transcribes into a system the following morning. The resulting data is useful for broad trend awareness and nothing else. Any plant still working this way has an unknown real downtime figure, and the first investment worth making is not better analytics — it is the gateway that turns the machine into a source of truth.
The useful ranges below reflect what actually shows up in discrete manufacturing when downtime is measured automatically against true planned production time — including changeovers, including micro-stops. Ranges for plants that use cosmetic denominators will always look better and mean less.
| Plant maturity | Unplanned downtime (share of planned production time) | Corresponding availability |
|---|---|---|
| Reactive | 25–40 % | 60–75 % |
| Transitional | 15–25 % | 75–85 % |
| Mature | 7–15 % | 85–93 % |
| World-class | < 7 % | > 93 % |
Any figure below 5 percent in a genuinely discrete manufacturing environment should be audited. It is not impossible, but in a population of several hundred plants it is rare enough that the first assumption should be measurement-system flattery rather than genuine excellence.
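This small helper maps a measured unplanned downtime share onto the maturity bands in the table and raises the audit flag for figures below 5 percent. Where the table's ranges touch, the band boundaries are an assumption. Here, a value of exactly 15 percent falls in the transitional band.

```python
def assess(unplanned_share: float) -> tuple[str, bool]:
    """Return (maturity band, audit flag) for an unplanned downtime share in 0..1."""
    if unplanned_share < 0.07:
        band = "World-class"
    elif unplanned_share < 0.15:
        band = "Mature"
    elif unplanned_share < 0.25:
        band = "Transitional"
    else:
        band = "Reactive"  # the table's reactive band tops out at 40 %; higher values land here too
    # Below 5 % in discrete manufacturing: suspect the measurement system first.
    needs_audit = unplanned_share < 0.05
    return band, needs_audit


for share in (0.30, 0.12, 0.04):
    print(share, assess(share))
```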
The sequence below is the one that works in the field, not the textbook version that starts with "implement a culture of continuous improvement." Culture matters, but culture on top of bad data produces confidently wrong decisions.
A realistic expectation for a plant that commits honestly to this sequence: 20 to 35 percent reduction in unplanned downtime within 12 months, with the biggest gains in the first 90 days coming from the Pareto work on the top three stop reasons alone.
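Pareto work starts by aggregating lost minutes per stop reason and sorting them. Here is a minimal sketch on an invented stop log. The reasons and minutes are illustrative only.

```python
from collections import Counter


def pareto(stops: list[tuple[str, float]]) -> list[tuple[str, float, float]]:
    """Aggregate (reason, minutes) pairs into rows of (reason, minutes, cumulative share)."""
    totals: Counter = Counter()
    for reason, minutes in stops:
        totals[reason] += minutes
    grand_total = sum(totals.values())
    rows, cumulative = [], 0.0
    for reason, minutes in totals.most_common():  # largest loss first
        cumulative += minutes
        rows.append((reason, minutes, cumulative / grand_total))
    return rows


# Invented stop log for illustration: (reason code, lost minutes)
stop_log = [
    ("Tool break", 35), ("Material starvation", 20), ("Feeder jam", 4),
    ("Tool break", 40), ("Quality hold", 15), ("Sensor fault", 5),
    ("Feeder jam", 3), ("Material starvation", 25), ("Idle between orders", 6),
    ("Feeder jam", 2),
]
for reason, minutes, cum in pareto(stop_log):
    print(f"{reason:<22}{minutes:>5} min  {cum:6.1%}")
```

In this invented log, the top three reasons account for about 87 percent of the lost minutes. That is where the first 90 days of effort would go.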
In the SYMESTIC deployment pattern, machine states flow into production KPIs directly from the controller or from an I/O gateway on brownfield equipment. Every state transition produces a timestamped event with sub-second resolution, which is what makes micro-stop detection possible at all. The alarms module structures stop-reason codes into a deliberately short, operator-friendly list; the process data module supplies the parameter context at the moment of each stop. The authoritative vocabulary comes from ISO 22400 (manufacturing KPIs), VDI 3423 (availability) and SEMI E10 (equipment reliability and availability) — the documents are easy to find by name and worth reading in full by anyone designing a measurement system from scratch.
What is the difference between downtime and idle time?
Downtime is a period during planned production time when an asset should be producing and is not. Idle time, in the precise sense, is time outside planned production time — the asset is not scheduled to run. Idle time does not count against OEE; downtime does. The terminology is used loosely in everyday shop-floor conversation, which is why written stop-reason taxonomies matter.
Do planned maintenance and changeovers count as downtime?
Yes, if they occur during planned production time. The temptation to exclude them — because they are "planned" — is strong and counterproductive. A plant that excludes changeovers from its denominator will never measure SMED improvements and will never have an honest OEE. Include them, measure them, attack them.
What is a micro-stop?
Conventionally, any stop shorter than two to five minutes (definitions vary). Micro-stops are functionally invisible to manual logging and are typically the single largest category of hidden loss in automated lines. ISO 22400 treats them as performance losses rather than availability losses, which changes where they appear in OEE but not how much capacity they consume. Practically, they are where the interesting improvement potential lives once the big breakdowns are under control.
How is downtime connected to OEE?
Directly. Availability — one of the three OEE factors — is operating time divided by planned production time, which is equivalent to (planned production time minus downtime) divided by planned production time. Reducing unplanned downtime raises availability one-for-one. See OEE and Availability for the full treatment.
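Here is the same relationship as a worked equation, using PPT for planned production time, OT for operating time, DT for downtime and ΔDT for a reduction in downtime. The shift figures (480 planned minutes, 60 minutes of downtime, a 24-minute reduction) are assumptions for illustration.

```latex
\begin{aligned}
A &= \frac{\mathrm{OT}}{\mathrm{PPT}} = \frac{\mathrm{PPT} - \mathrm{DT}}{\mathrm{PPT}} = 1 - \frac{\mathrm{DT}}{\mathrm{PPT}} \\
A &= \frac{480 - 60}{480} = \frac{420}{480} = 0.875 \\
\Delta A &= \frac{\Delta \mathrm{DT}}{\mathrm{PPT}} = \frac{24}{480} = 0.05
\end{aligned}
```

Because DT enters linearly, every minute of downtime removed adds 1/PPT to availability. Cutting 24 minutes from a 480-minute shift adds five percentage points.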
Why does our OEE drop when we install an MES?
Because the previous number was wrong. Automated capture picks up the micro-stops and speed losses that manual logging missed. The 2–3× rise in reported downtime during the first 90 days of MES-based capture is the industry norm across 15,000+ machine connections. The honest baseline is the starting point of real improvement; the flattering old number was not.
How short is "too short" for a stop to count?
Any stop long enough to interrupt cycle time counts in principle. In practice, most plants set a threshold at 10 to 30 seconds for micro-stop classification, below which events are treated as within-cycle variation rather than stops. The threshold should be written down and stable over time. Moving it is a common way to accidentally make the number look better.
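This minimal sketch writes down a threshold rule. The 10-second lower threshold sits inside the 10 to 30 second range above. The 120-second upper bound follows the two-minute micro-stop convention. Both values are assumptions that each plant would fix for itself. The second function shows how moving the lower threshold changes the counted loss on the same invented data.

```python
def classify_stop(duration_s: float, ignore_below_s: float = 10.0,
                  micro_stop_max_s: float = 120.0) -> str:
    """Classify a stop by duration using fixed, written-down thresholds."""
    if duration_s < ignore_below_s:
        return "cycle variation"  # treated as normal within-cycle variation
    if duration_s < micro_stop_max_s:
        return "micro-stop"  # typically booked against performance
    return "stop"  # typically booked against availability


def counted_minutes(durations_s: list[float], ignore_below_s: float = 10.0) -> float:
    """Minutes counted as losses (micro-stops plus stops) for a given lower threshold."""
    return sum(d for d in durations_s if d >= ignore_below_s) / 60.0


# Invented shift: 80 feeder jams of 14 s plus two longer stops
durations = [14.0] * 80 + [600.0, 900.0]
print(classify_stop(8), classify_stop(14), classify_stop(600))
print(f"{counted_minutes(durations, 10):.1f} min vs {counted_minutes(durations, 15):.1f} min")
```

On this data, raising the threshold from 10 to 15 seconds drops the 14-second jams out of the count. The reported loss falls from about 43.7 to 25 minutes without a single machine running better.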
Is zero downtime a realistic target?
No, and the plants that get closest are not the ones that set it as a target. The realistic targets are planned downtime reduced to what is necessary for maintenance and changeover, and unplanned downtime reduced to a level consistent with equipment reliability and supply stability. World-class discrete manufacturing sits below 7 percent unplanned downtime; below that, the marginal cost of each additional percentage point rises steeply, and the investment is often better spent elsewhere.
Related: OEE · Availability · MTBF · MTTR · Machine Data Acquisition · Planned Maintenance Percentage · Preventive Maintenance · SMED · Six Big Losses · Alarms.