MES Software: Vendors, Features & Costs Compared 2026
MES software compared: vendors, functions per VDI 5600, costs (cloud vs. on-premise) and implementation. Honest market overview 2026.
MTBF (Mean Time Between Failures) is the average operating time between two consecutive unplanned failures of a repairable system. If a press line runs for 400 hours total in a month and fails 4 times, the MTBF is 100 hours — meaning, on average, you can expect 100 hours of operation before the next failure. MTBF is the single most important metric for equipment reliability in manufacturing. It directly drives the Availability factor of OEE, determines maintenance intervals, and separates reactive firefighting from data-driven maintenance management. MTBF does not tell you how long a repair takes (that is MTTR) — it tells you how often you need to repair.
The formula is straightforward. The difficulty is never the math — it is getting accurate data for the inputs.
MTBF = Total Operating Time / Number of Failures
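Where Total Operating Time is the planned production time minus planned and unplanned downtime, and Number of Failures counts unplanned failures only. A worked example: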
| Worked example | Value |
|---|---|
| Planned production time (1 month, 3 shifts) | 480 hours |
| Planned downtime (maintenance, changeovers) | 40 hours |
| Unplanned downtime (all failures combined) | 40 hours |
| Total operating time (480 − 40 planned − 40 unplanned) | 400 hours |
| Number of unplanned failures | 8 |
| MTBF (400 / 8) | 50 hours |
| MTTR (40 hours unplanned downtime / 8 failures) | 5 hours |
| Availability (MTBF / (MTBF + MTTR)) = 50 / 55 | 90.9 % |
The critical insight: Availability is a function of MTBF and MTTR together. You can improve Availability by increasing MTBF (failing less often — reliability improvement) or decreasing MTTR (repairing faster — maintenance efficiency). The best plants attack both simultaneously.
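The worked example above can be reproduced in a few lines. This is a minimal sketch, assuming monthly totals in hours are already available; the function name `reliability_kpis` and its parameter names are illustrative and not taken from any particular MES.

```python
def reliability_kpis(planned_h, planned_down_h, unplanned_down_h, failures):
    """Return (MTBF, MTTR, availability) from period totals given in hours."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute MTBF and MTTR")
    # Operating time = planned production time minus all downtime.
    operating_h = planned_h - planned_down_h - unplanned_down_h
    mtbf = operating_h / failures
    mttr = unplanned_down_h / failures
    availability = mtbf / (mtbf + mttr)
    return mtbf, mttr, availability


# Values from the worked example table.
mtbf, mttr, availability = reliability_kpis(480, 40, 40, 8)
print(f"MTBF={mtbf:.1f} h, MTTR={mttr:.1f} h, Availability={availability:.1%}")
```

With the table values this gives an MTBF of 50 hours, an MTTR of 5 hours and an Availability of about 90.9 %.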
| Metric | Full name | Applies to | What it measures | Formula |
|---|---|---|---|---|
| MTBF | Mean Time Between Failures | Repairable systems (machines, lines) | Average running time between two consecutive failures | Total operating time / Number of failures |
| MTTR | Mean Time To Repair | Repairable systems | Average time from failure to restored operation | Total repair time / Number of failures |
| MTTF | Mean Time To Failure | Non-repairable components (bearings, seals, light bulbs) | Average time until the component fails and is replaced | Total operating time / Number of units that failed |
The distinction matters: MTBF is for machines you repair and put back into service. MTTF is for components you discard and replace. A press has an MTBF. The bearing inside the press has an MTTF. When the bearing fails, the press fails — so the bearing's MTTF directly influences the press's MTBF. Predictive maintenance uses individual component MTTF data to prevent the machine-level MTBF from degrading.
The bathtub curve describes how failure rate changes over the life of a machine or component. It has three phases:
| Phase | Failure rate | What happens | MTBF implication |
|---|---|---|---|
| Infant mortality (early life) | High, decreasing | Manufacturing defects, installation errors, incorrect settings. A new machine or component fails more often in the first weeks. | MTBF is low initially. If you calculate MTBF only during commissioning, you get a misleadingly pessimistic number. |
| Useful life (constant) | Low, stable | Random failures at a low, constant rate. This is the normal operating phase. | MTBF is at its highest and most stable. This is the phase where MTBF is a meaningful reliability indicator. |
| Wear-out (end of life) | High, increasing | Fatigue, corrosion, wear. Components reach the end of their design life. | MTBF drops. If you see MTBF declining over months, the machine is entering wear-out phase — and preventive replacement (not more repair) is the correct response. |
The practical lesson: monitoring MTBF trends over time — not just the current value — tells you which phase a machine is in. An MES that tracks MTBF per machine over months and years is the only reliable way to detect the transition from useful life to wear-out before it causes cascading failures.
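One simple way to turn "watch the trend" into a number is a least-squares slope over weekly MTBF values. The sketch below is an illustration under stated assumptions: the weekly series is invented, and the alert threshold of 2 hours per week is a hypothetical value that would need tuning per machine.

```python
def mtbf_slope(weekly_mtbf):
    """Least-squares slope (hours of MTBF per week) for equally spaced weeks."""
    n = len(weekly_mtbf)
    if n < 2:
        raise ValueError("need at least two weekly values to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(weekly_mtbf) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(weekly_mtbf))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical weekly MTBF values (hours) for one machine over 8 weeks.
weekly = [120, 118, 110, 101, 92, 84, 73, 65]
slope = mtbf_slope(weekly)

# Hypothetical rule: flag the machine if MTBF falls by more than 2 h per week.
declining = slope < -2.0
print(f"slope={slope:.2f} h/week, declining={declining}")
```

A sustained negative slope is only a prompt for investigation. It does not by itself prove that the machine has entered the wear-out phase.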
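MTBF requires two inputs: operating time and failure count. Both are trivially simple in theory and chronically inaccurate in practice because of how they are collected. Manual logging, for example, tends to miss short stops, which lowers the failure count and makes MTBF look better than it is.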
The SYMESTIC alarms module captures every PLC alarm with timestamp, duration and alarm code — the raw data from which MTBF is calculated automatically, per machine, per shift, per week, per month. The production metrics module turns that data into the MTBF trend chart that maintenance managers need: "Machine 5 MTBF dropped from 120 hours to 65 hours over the last 8 weeks — the hydraulic unit is entering wear-out phase."
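To make the step from raw alarm records to MTBF concrete, here is a minimal sketch. The record layout (start time, downtime duration, alarm code) and all values are assumptions for illustration; this is not the SYMESTIC data model or API. It also assumes that every record represents one unplanned failure.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical alarm records: (start time, downtime duration, alarm code).
ALARMS = [
    (datetime(2026, 1, 5, 8, 15), timedelta(hours=2), 3012),
    (datetime(2026, 1, 9, 14, 0), timedelta(hours=1, minutes=30), 3012),
    (datetime(2026, 1, 16, 3, 40), timedelta(minutes=30), 4101),
    (datetime(2026, 1, 23, 19, 5), timedelta(hours=4), 3012),
]


def mtbf_from_alarms(alarms, planned_h, planned_down_h):
    """Return (MTBF, MTTR) in hours for one machine over one reporting period."""
    if not alarms:
        raise ValueError("no failures recorded in this period")
    downtime_h = sum(d.total_seconds() for _, d, _ in alarms) / 3600
    operating_h = planned_h - planned_down_h - downtime_h
    return operating_h / len(alarms), downtime_h / len(alarms)


# Hypothetical period: 120 h planned production, 12 h planned downtime.
mtbf_h, mttr_h = mtbf_from_alarms(ALARMS, planned_h=120, planned_down_h=12)

# The most frequent alarm code points to the first root cause to investigate.
top_code, top_count = Counter(code for _, _, code in ALARMS).most_common(1)[0]
print(f"MTBF={mtbf_h:.1f} h, MTTR={mttr_h:.1f} h, top alarm #{top_code} ({top_count}x)")
```

Running the same function per shift, week or month only requires filtering the records by start time before passing them in.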
| Maintenance strategy | How it uses MTBF | Limitation | MES role |
|---|---|---|---|
| Reactive (run to failure) | Does not use MTBF — waits until something breaks | Maximises unplanned downtime, highest repair cost | MES reveals the true cost of reactive maintenance by quantifying MTTR per failure type |
| Preventive (time-based) | Sets maintenance interval at a fraction of MTBF (e.g., maintain at 80 % of MTBF) | Over-maintains if MTBF is underestimated; under-maintains if overestimated | MES provides accurate MTBF per machine (actual field data, not manufacturer spec) |
| Condition-based | Uses MTBF trend + process parameters to trigger maintenance when degradation is detected | Requires sensor data and pattern recognition | MES process data module provides temperature, pressure, vibration trends that correlate with MTBF decline |
| Predictive | Uses historical MTBF data + ML models to predict when the next failure will occur | Requires 6–12 months of clean historical data | MES builds the historical data foundation that predictive models require — automatically, from day one of operation |
MTBF feeds directly into OEE Availability: Availability = MTBF / (MTBF + MTTR). If MTBF is 50 hours and MTTR is 5 hours, Availability is 90.9 %. To reach 95 % Availability with the same MTTR, MTBF must increase to 95 hours. The MES calculates both MTBF and MTTR from the same machine-state data — and shows maintenance managers exactly where to focus: "Machine 5 has the lowest MTBF in the plant (50 hours). The top failure cause is alarm #3012 (hydraulic pressure). Fixing the root cause of #3012 would increase MTBF to an estimated 85 hours and Availability from 90.9 % to 94.4 %."
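The target calculation follows from solving Availability = MTBF / (MTBF + MTTR) for MTBF, which gives MTBF = A × MTTR / (1 − A). A short sketch, with illustrative function names, that checks the figures quoted above:

```python
def availability(mtbf_h, mttr_h):
    """Availability as a fraction, from MTBF and MTTR in hours."""
    return mtbf_h / (mtbf_h + mttr_h)


def required_mtbf(target_availability, mttr_h):
    """MTBF (hours) needed to reach a target availability at a given MTTR."""
    if not 0 < target_availability < 1:
        raise ValueError("target availability must be between 0 and 1")
    return target_availability * mttr_h / (1 - target_availability)


print(f"{availability(50, 5):.1%}")        # current state
print(f"{required_mtbf(0.95, 5):.1f} h")   # MTBF needed for 95 % at MTTR = 5 h
print(f"{availability(85, 5):.1%}")        # after fixing the top failure cause
```

The formula also shows why the last few points of Availability are expensive: at an MTTR of 5 hours, 99 % Availability would already require an MTBF of 495 hours.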
What is a good MTBF value?
There is no universal benchmark. MTBF depends entirely on the machine type, age, operating conditions and maintenance regime. An MTBF of 200 hours can be good for a stamping press and poor for a packaging line, because the reliability that can reasonably be expected differs by equipment type. The meaningful comparison is your machine's MTBF this month vs. last month vs. last quarter. Is it improving, stable or declining? That trend is more valuable than any industry benchmark. At Meleghy Automotive, the SYMESTIC MES enabled exactly this comparison: MTBF per press, per plant, tracked over time.
Should I measure MTBF per machine or per production line?
Both, but for different purposes. Machine-level MTBF tells maintenance where to focus repair and replacement. Line-level MTBF tells production planning how often the line will stop. In a serial line without buffers, line MTBF is always lower than the MTBF of any individual machine, because the line stops whenever any machine fails. For a line with 5 machines each at 200 hours MTBF, the line MTBF is approximately 40 hours (1/MTBF_line ≈ 1/200 + 1/200 + 1/200 + 1/200 + 1/200 = 1/40). This is why the machines with the lowest MTBF dominate line performance.
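The line-level approximation adds up the failure rates (1/MTBF) of the individual machines. A minimal sketch, assuming a serial line in which any machine failure stops the whole line and failures occur independently at roughly constant rates; `line_mtbf` is an illustrative name:

```python
def line_mtbf(machine_mtbfs_h):
    """Approximate MTBF (hours) of a serial line from per-machine MTBF values."""
    if not machine_mtbfs_h or any(m <= 0 for m in machine_mtbfs_h):
        raise ValueError("need at least one machine, all with positive MTBF")
    # Failure rates add up in a serial line; the line MTBF is the inverse.
    return 1 / sum(1 / m for m in machine_mtbfs_h)


# Five identical machines at 200 h each, as in the answer above.
balanced = line_mtbf([200, 200, 200, 200, 200])

# Hypothetical variation: one weak machine at 50 h among four at 200 h.
with_weak_machine = line_mtbf([200, 200, 200, 200, 50])

print(f"balanced line: {balanced:.1f} h, with weak machine: {with_weak_machine:.1f} h")
```

In the second case the single 50-hour machine contributes as much to the line's failure rate as the other four machines combined.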
How does MTBF relate to TPM?
TPM (Total Productive Maintenance) uses MTBF as a core reliability metric. The Planned Maintenance pillar sets maintenance intervals based on MTBF data. The Training & Education pillar uses MTBF comparisons across operators and shifts to identify skill gaps. Without accurate MTBF data, TPM is theory. With MES-based MTBF tracking, TPM becomes a data-driven system.
Can MTBF be too high?
Yes — if it is artificially inflated. Two common causes: (1) operators do not log short stops, so the failure count is too low and MTBF appears higher than reality; (2) excessive preventive maintenance replaces components before they would ever fail, consuming maintenance budget without improving reliability. The MES eliminates cause (1) by counting every stop automatically. For cause (2), MTBF trending shows whether preventive intervals can be safely extended — saving maintenance cost without increasing failure risk.
Related: MTTR (Mean Time To Repair) · OEE Explained · TPM · Predictive Maintenance · SYMESTIC Alarms Module · SYMESTIC Production Metrics · MES: Definition & Functions