MES Software: Vendors, Features & Costs Compared 2026
MES software compared: vendors, functions per VDI 5600, costs (cloud vs. on-premise) and implementation. Honest market overview 2026.
Machine downtime is any interval in which a production asset is not producing good parts while it was scheduled to. The definition sounds trivial, and that is exactly why most plants measure it wrong. The word "downtime" is used interchangeably for three very different things — planned stops, unplanned failures and micro-stops below the radar — and each has a different root cause, a different owner and a different fix.
In OEE terms, machine downtime is the biggest single contributor to the Availability factor. In practical terms, it is usually the largest hidden cost in a manufacturing plant. I have spent 30 years connecting machines to higher-level systems, starting with Simatic S5 in 1991 and arriving at OPC UA and IoT gateways today. The constant over all those years has been this: operators and plant managers systematically underestimate how much their machines actually stand still, because the numbers they work with come from paper, memory or ERP back-flushing — never from the machine itself.
The first hour after a real-time machine connection goes live at a new customer is always the same. The dashboard shows more stops than anyone expected, shorter runs than the work instructions assume, and micro-stops that nobody knew existed. That hour is what this article is really about.
Downtime therefore falls into three categories. Most plants manage the first competently, tolerate the second, and are completely blind to the third.
| Category | Trigger | Typical duration | Usually visible? |
|---|---|---|---|
| Planned downtime | Changeovers, preventive maintenance, scheduled breaks | Minutes to hours | Yes — scheduled in advance |
| Unplanned downtime | Breakdowns, tool failures, material shortages, quality holds | Minutes to days | Partially — large events only |
| Micro-stops (idling) | Jams, sensor triggers, minor adjustments, waiting for operator | Seconds to a few minutes | Almost never without automatic capture |
The third category is the one that decides whether a plant's OEE improvement programme succeeds. In typical assembly and packaging operations, micro-stops account for 15–30 % of all availability loss — and in almost every project I have run, the plant's baseline number for micro-stops was either "zero" or "negligible". Both were wrong by an order of magnitude.
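Nakajima's Six Big Losses framework maps cleanly onto downtime categories. The first four losses are listed below; the remaining two, process defects and reduced yield (start-up losses), are quality losses rather than downtime. The breakdown is useful because it turns a vague "we have too much downtime" into a structured problem with specific owners.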
| Loss | Downtime type | Primary countermeasure |
|---|---|---|
| 1. Equipment breakdowns | Unplanned | TPM, predictive maintenance |
| 2. Setup & adjustments | Planned | SMED, standardised changeovers |
| 3. Minor stops / idling | Micro-stops | Automatic capture, root-cause analysis |
| 4. Reduced speed | Hidden losses (not classical downtime, but performance loss) | Process engineering, condition monitoring |
The key insight: categories 1 and 2 are visible, category 3 almost never is. Plants that reach world-class OEE are the ones that stopped fighting only the visible losses.
Two metrics translate machine downtime from an operational nuisance into a quantified engineering problem. Both are derived directly from the timestamps of start-stop-start cycles — which is exactly what a PLC or an IoT gateway records automatically.
MTBF = Total operating time ÷ Number of failures
Mean Time Between Failures — reliability indicator
MTTR = Total downtime ÷ Number of failures
Mean Time To Repair — maintainability indicator
A rising MTBF means the machine runs longer between stops — maintenance and equipment condition are improving. A falling MTTR means each stop is resolved faster — response time, spare-part availability and operator competence are improving. You need both trends moving in the right direction. Improving MTBF while MTTR stagnates means you are buying availability with maintenance overtime; improving MTTR while MTBF deteriorates means you are getting good at firefighting a burning building.
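Both metrics fall out of the same stop log. The following is a minimal Python sketch, assuming a hypothetical `Stop` record with start and end timestamps for failure-related stops only, and taking operating time as the observation window minus the summed stop time:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Stop:
    """One failure-related stop (hypothetical record shape from a PLC or gateway log)."""
    start: datetime
    end: datetime


def mtbf_mttr(stops: list[Stop], window_start: datetime,
              window_end: datetime) -> tuple[timedelta, timedelta]:
    """Return (MTBF, MTTR) for the failure stops inside one observation window.

    Assumes the stops do not overlap and lie fully inside the window.
    Operating time = window length minus the summed stop durations.
    """
    if not stops:
        raise ValueError("no failures in window: MTBF and MTTR are undefined")
    downtime = sum((s.end - s.start for s in stops), timedelta())
    operating = (window_end - window_start) - downtime
    n = len(stops)
    return operating / n, downtime / n  # MTBF, MTTR


# Example: an 8-hour window with two failures of 10 and 20 minutes.
t0 = datetime(2026, 3, 2, 6, 0)
stops = [
    Stop(t0 + timedelta(hours=1), t0 + timedelta(hours=1, minutes=10)),
    Stop(t0 + timedelta(hours=4), t0 + timedelta(hours=4, minutes=20)),
]
mtbf, mttr = mtbf_mttr(stops, t0, t0 + timedelta(hours=8))
print(mtbf, mttr)  # 3:45:00 0:15:00
```

Changeovers and other planned stops should be filtered out before the call, since they are not failures and would distort both numbers.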
The canonical list of "causes of downtime" in textbooks is correct but unhelpful. In 30 years of brownfield connectivity projects, the real causes cluster differently:
| Cause cluster | What it looks like in practice | Typical share |
|---|---|---|
| Upstream / downstream starvation | Machine runs fine but waits for material, parts, or the next station | 20–35 % |
| Changeovers & setups | Tool changes, format changes, recipe changes | 10–25 % |
| Mechanical failures | Breakdowns, tool wear, tolerance issues | 15–25 % |
| Quality holds & rework | Line stopped for inspection or correction of defects | 5–15 % |
| Operator-related | Breaks, shift handover, manual intervention, missing operator | 10–20 % |
| IT / control system | PLC faults, network issues, software bugs | 2–8 % |
The surprise for most plant managers is the first row. People assume downtime is dominated by breakdowns, because those are loud. In practice, starvation and upstream/downstream issues usually win — and they are the ones that never end up in the maintenance log.
Paper-based reason codes and end-of-shift logs miss three things systematically. This is the single biggest reason why OEE numbers built on manual records, even in plants that say "our SCADA already has it", are usually 10–20 percentage points too optimistic.
| Blind spot | Why paper can't catch it |
|---|---|
| Micro-stops under 5 minutes | Not worth writing down — but 200 of them per shift kill the day |
| Chronic short failures with the same root cause | Logged separately, never correlated — the pattern is invisible |
| Reason codes attributed post hoc | End-of-shift reconstruction is shaped by memory and politics, not data |
In one packaging plant where we installed an IoT gateway on a line that was "well understood" by its maintenance team, the first week revealed 412 stops below 90 seconds — none of which appeared in any existing report. The sum of those micro-stops was larger than the recorded "big" downtime for the same period.
The right connection method depends on the machine's age and control system. After hundreds of brownfield integrations, the pattern is clear: there is a suitable option for every machine, no matter how old. "Our machines can't deliver data" is almost never true in 2026.
| Machine type | Connection method | Typical effort per machine |
|---|---|---|
| Modern PLC (S7-1500, TIA, Beckhoff, Rockwell) | OPC UA server, read-only access to alarm and state tags | 2–4 hours |
| Older PLC (S7-300/400, S5) | Edge gateway with protocol adapter, no PLC change | 2–4 hours |
| No PLC / relay-logic machines from the 1980s–90s | Digital-I/O gateway tapping cycle signals and status lamps | 2–4 hours |
| Standalone machines (no network infrastructure) | IoT gateway with GSM/4G uplink, no LAN needed | 2–4 hours |
In every case: no PLC reprogramming, no CE re-certification, no production interruption. That is the non-negotiable rule when connecting brownfield equipment — touch the machine's logic and you inherit it, which no plant wants.
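For the digital-I/O route, stop detection can be as simple as watching the gap between cycle-complete pulses. The sketch below assumes the gateway timestamps each pulse in seconds and that the ideal cycle time is known; `detect_stops` and the 2× tolerance factor are illustrative choices, not a product API:

```python
def detect_stops(pulse_times_s: list[float], ideal_cycle_s: float,
                 factor: float = 2.0) -> list[tuple[float, float]]:
    """Derive stop intervals from timestamps of cycle-complete pulses.

    A gap longer than factor * ideal_cycle_s counts as a stop. The machine is
    treated as idle from the last completed cycle until the next cycle began,
    i.e. until the next pulse minus one ideal cycle time.
    """
    stops = []
    for prev, nxt in zip(pulse_times_s, pulse_times_s[1:]):
        if nxt - prev > factor * ideal_cycle_s:
            stops.append((prev, nxt - ideal_cycle_s))
    return stops


# Example: 10 s ideal cycle, one 65 s gap between the pulses at 30 s and 95 s.
pulses = [0.0, 10.0, 20.0, 30.0, 95.0, 105.0]
print(detect_stops(pulses, ideal_cycle_s=10.0))  # [(30.0, 85.0)]
```

The factor keeps normal cycle jitter from being counted as a stop; lowering it toward 1.0 makes detection more sensitive, at the price of more false positives. Nothing here writes to the machine: the gateway only reads the signal.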
| Step | Action | Typical reduction in stop time |
|---|---|---|
| 1 | Automatic capture of every stop with timestamp — no interpretation yet | Baseline only, but the baseline is finally real |
| 2 | Classify stops via PLC alarms, not operator input | Reveals Pareto of true causes, often different from what people believed |
| 3 | Attack the top three causes with dedicated Kaizen teams | 20–40 % in 8–12 weeks |
| 4 | Introduce autonomous maintenance on the cleanest line | 10–20 % additional |
| 5 | SMED for the worst changeovers | 30–60 % on targeted changeovers |
| 6 | Predictive maintenance on bottleneck equipment | 5–15 % additional on critical machines |
The order matters. Steps 3–6 without step 1 produce improvement theatre: you optimise the wrong thing, because the baseline was never real. Step 1 alone, without steps 3–6, produces dashboards and no change.
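Step 2 amounts to a join and a sort. The following is a minimal sketch, assuming stops as (start, end) pairs in seconds and PLC alarms as (timestamp, code) pairs. Each stop takes the alarm closest to its onset within a small window, and the Pareto ranks causes by total stop time rather than by count. The window sizes, alarm codes and the `classify_stops`/`pareto` names are hypothetical:

```python
from collections import defaultdict


def classify_stops(stops, alarms, lookback_s=5.0, lookahead_s=2.0):
    """Attach to each stop the PLC alarm raised closest to its onset.

    stops:  list of (start_s, end_s)
    alarms: list of (timestamp_s, alarm_code)
    Only alarms between start - lookback_s and start + lookahead_s qualify.
    Returns a list of (alarm_code, duration_s); "unclassified" if none qualifies.
    """
    classified = []
    for start, end in stops:
        candidates = [(abs(t - start), code) for t, code in alarms
                      if start - lookback_s <= t <= start + lookahead_s]
        code = min(candidates)[1] if candidates else "unclassified"
        classified.append((code, end - start))
    return classified


def pareto(classified):
    """Sum stop time per cause, largest first, with the cumulative share of the total."""
    totals = defaultdict(float)
    for code, duration in classified:
        totals[code] += duration
    grand_total = sum(totals.values())
    rows, running = [], 0.0
    for code, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        running += total
        rows.append((code, total, running / grand_total))
    return rows


stops = [(100, 160), (300, 320), (500, 800), (900, 910)]
alarms = [(99, "E12 jam"), (301, "E12 jam"), (498, "E40 no material")]
for code, total_s, cum_share in pareto(classify_stops(stops, alarms)):
    print(f"{code:16} {total_s:6.0f} s  {cum_share:6.1%}")
```

Ranking by duration rather than by frequency matters: a cause with few but long stops can outrank a frequent nuisance alarm. The share of "unclassified" time also shows how complete the alarm mapping is.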
| Without MES | With SYMESTIC MES |
|---|---|
| Operators write reason codes at shift end | Stops detected automatically, PLC alarm attached to each event |
| Micro-stops invisible | Every stop > 2 seconds captured and categorised |
| MTBF / MTTR calculated manually, monthly | Live per machine, per failure mode, with trend |
| Alarm correlation impossible | Alarms tied to downtime events and quality defects |
| Maintenance informed after the stop | Notification within seconds, mobile alert with machine state |
The Neoperl reference case is a representative example: PLC-triggered stop detection, machines documenting their own technical downtime without operator intervention, and correlation of specific alarms with quality defects. Result: 10 % fewer stops, 8 % higher availability, 15 % less scrap. Those numbers are not unusual — they are what a realistic downtime programme produces in the first year, once the measurement is honest.
What counts as "downtime" in OEE?
In the strict ISO 22400 and Nakajima definitions, downtime is any loss of scheduled run time due to stops. This explicitly includes planned stops like changeovers and preventive maintenance: they count against Availability even though they are scheduled. Reduced-speed running does not count as downtime; it is a Performance loss. In practice, the boundary is simple: if the machine is not producing good parts while it was supposed to, it is some form of loss, and downtime is the subset where the machine is actually stopped. Plants that exclude "planned" stops from their downtime number are not wrong, but they should stop calling the result "OEE"; it is closer to a Technical Efficiency metric.
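The difference between the two conventions is easy to see in numbers. Here is a sketch with a hypothetical 480-minute shift, 60 minutes of changeovers and 36 minutes of unplanned stops; the function name and the "technical" label are illustrative:

```python
def availability(scheduled_min: float, planned_stop_min: float,
                 unplanned_stop_min: float) -> tuple[float, float]:
    """Return (oee_availability, technical_availability) as fractions.

    oee_availability: planned stops such as changeovers count against
    scheduled time, as in the strict OEE definition.
    technical_availability: planned stops are removed from the base first.
    """
    run_min = scheduled_min - planned_stop_min - unplanned_stop_min
    oee_availability = run_min / scheduled_min
    technical_availability = run_min / (scheduled_min - planned_stop_min)
    return oee_availability, technical_availability


# Hypothetical shift: 480 min scheduled, 60 min changeovers, 36 min unplanned stops.
oee_av, tech_av = availability(480, 60, 36)
print(f"{oee_av:.1%} vs {tech_av:.1%}")  # 80.0% vs 91.4%
```

Same shift, same stops, roughly eleven points apart, which is why the label on the number matters.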
How small a stop is a "micro-stop"?
Convention varies, but the working definition in TPM circles is any stop below 5 minutes. Some plants tighten that to below 2 minutes. The threshold matters less than the principle: below the threshold, operators are not expected to document the stop manually, which means any reporting depends entirely on automatic capture. In lines with high cycle frequency (packaging, assembly), individual micro-stops of 10–30 seconds can aggregate to 2–3 hours per shift without anyone noticing. The first time a line's true micro-stop total is displayed on a dashboard is usually the moment the improvement programme becomes real.
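Once every stop is captured automatically, applying the threshold is trivial. The sketch below assumes stop durations in seconds and uses the 5-minute convention as the default (pass 120 for the tighter 2-minute variant); the function name is hypothetical:

```python
def split_by_threshold(durations_s: list[float],
                       micro_threshold_s: float = 300.0) -> dict[str, float]:
    """Split stop durations at the micro-stop threshold and total each side.

    Default 300 s follows the 5-minute convention; pass 120 for 2 minutes.
    """
    micro = [d for d in durations_s if d < micro_threshold_s]
    large = [d for d in durations_s if d >= micro_threshold_s]
    return {
        "micro_count": len(micro),
        "micro_total_s": sum(micro),
        "large_count": len(large),
        "large_total_s": sum(large),
    }


# Hypothetical shift: 400 micro-stops of 20 s and two 30-minute breakdowns.
summary = split_by_threshold([20.0] * 400 + [1800.0, 1800.0])
print(summary)
```

In this example the 400 short stops add up to 8,000 seconds, about 2.2 hours, more than twice the time lost to the two breakdowns combined.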
Do I need to re-certify the machine if I add a downtime gateway?
No — and this is the single biggest fear that blocks brownfield connectivity projects. A gateway that reads signals passively (OPC UA in read-only mode, digital I/O via tap, MQTT subscriber) does not modify the machine's control logic, does not change its safety behaviour and does not alter its CE-relevant functions. CE re-certification is only required when you modify the machine in a way that affects its conformity assessment — which read-only data capture does not. The gateway sits in the OT network as an observer, not an actor. In the hundreds of integrations I have done, no CE case has ever been triggered by downtime capture.
How fast can real-time downtime monitoring go live?
For a first line with a modern PLC, counting from the kick-off workshop: dashboards with live cycle-time, stop detection and OEE are realistic in one to two weeks. For a brownfield line with mixed vintages, plan two to four weeks per line, with the bulk of the time going into signal definition and stop classification, not hardware installation. The Klocke rollout — all packaging lines at Weingarten connected via digital-I/O gateways in three weeks — is representative of what is achievable when the approach is right. The barrier is almost never the technology; it is the organisational decision to start.
What is the ROI of automatic downtime capture?
The honest answer: it depends almost entirely on baseline OEE. A plant running at 80 % OEE has less headroom than one at 55 %. Across the customer base I have worked with, typical first-year results are 5–10 % availability gain, of which roughly half comes from eliminating previously invisible micro-stops and half from faster response to large stops. At a typical machine with €50–150 per hour of production value, the payback on the gateway and MES subscription is usually three to six months — before any deeper TPM or SMED work is even started. The business case is not made by saving maintenance hours; it is made by producing more good parts in the same calendar time.
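The payback arithmetic fits in a few lines. The sketch treats the availability gain as percentage points of planned time and uses placeholder figures for production hours, one-time cost and subscription fee; none of these are quoted prices:

```python
import math


def payback_months(value_per_hour_eur: float, planned_hours_per_month: float,
                   availability_gain: float, one_time_cost_eur: float,
                   monthly_fee_eur: float) -> float:
    """Months until the net monthly gain has paid back the one-time cost.

    availability_gain is in percentage points of planned time, as a fraction
    (0.05 = 5 points). Assumes every recovered hour yields saleable good parts.
    Returns math.inf if the monthly fee consumes the whole gain.
    """
    extra_value_eur = value_per_hour_eur * planned_hours_per_month * availability_gain
    net_monthly_eur = extra_value_eur - monthly_fee_eur
    if net_monthly_eur <= 0:
        return math.inf
    return one_time_cost_eur / net_monthly_eur


# Placeholder inputs, not quoted prices.
months = payback_months(value_per_hour_eur=100, planned_hours_per_month=320,
                        availability_gain=0.05, one_time_cost_eur=5600,
                        monthly_fee_eur=200)
print(f"{months:.1f} months")  # 4.0 months
```

With €100 per hour, 320 planned hours a month and a 5-point gain, the line earns €1,600 more per month; after a €200 fee, a €5,600 one-time cost is recovered in four months.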