
MTBF: Formula, MTTR Comparison & MES-Based Calculation

By Martin Brandel · Last updated: April 2026

What is Mean Time Between Failures (MTBF)?

MTBF (Mean Time Between Failures) is the average operating time between two consecutive unplanned failures of a repairable system. If a press line runs for 400 hours in a month and fails 4 times, its MTBF is 100 hours: on average, you can expect 100 hours of operation before the next failure. MTBF is one of the most important metrics for equipment reliability in manufacturing. It directly drives the Availability factor of OEE, informs maintenance intervals, and separates reactive firefighting from data-driven maintenance management. MTBF does not tell you how long a repair takes (that is MTTR); it tells you how often you need to repair.

How do you calculate MTBF?

The formula is straightforward. The difficulty is never the math — it is getting accurate data for the inputs.

MTBF = Total Operating Time / Number of Failures

Where:

  • Total Operating Time = planned production time minus all downtime (planned and unplanned). Only the time the machine was actually running counts.
  • Number of Failures = count of unplanned stops that required intervention to restore normal operation. Planned maintenance stops, changeovers and operator breaks are excluded.

| Worked example | Value |
|---|---|
| Planned production time (1 month, 3 shifts) | 480 hours |
| Planned downtime (maintenance, changeovers) | 40 hours |
| Unplanned downtime (all failures combined) | 40 hours |
| Total operating time (480 − 40 planned − 40 unplanned) | 400 hours |
| Number of unplanned failures | 8 |
| MTBF (400 / 8) | 50 hours |
| MTTR (40 hours unplanned downtime / 8 failures) | 5 hours |
| Availability = MTBF / (MTBF + MTTR) = 50 / 55 | 90.9 % |

The critical insight: Availability is a function of MTBF and MTTR together. You can improve Availability by increasing MTBF (failing less often — reliability improvement) or decreasing MTTR (repairing faster — maintenance efficiency). The best plants attack both simultaneously.
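
The worked example can be reproduced in a few lines. This is a minimal sketch: the function name `reliability_kpis` and its parameter names are illustrative assumptions, not part of any MES API.

```python
def reliability_kpis(planned_time_h, planned_downtime_h, unplanned_downtime_h, failures):
    """Return (MTBF, MTTR, availability) for one machine and one period."""
    if failures <= 0:
        raise ValueError("MTBF and MTTR are undefined without at least one failure")
    # Only the time the machine was actually running counts as operating time.
    operating_time_h = planned_time_h - planned_downtime_h - unplanned_downtime_h
    mtbf_h = operating_time_h / failures
    mttr_h = unplanned_downtime_h / failures
    availability = mtbf_h / (mtbf_h + mttr_h)
    return mtbf_h, mttr_h, availability


# Figures from the worked example: 480 h planned, 40 h planned downtime,
# 40 h unplanned downtime, 8 failures.
mtbf, mttr, availability = reliability_kpis(480, 40, 40, 8)
print(f"MTBF {mtbf:.1f} h, MTTR {mttr:.1f} h, availability {availability:.1%}")
```

Because this availability formula depends only on the ratio of MTBF to MTTR, doubling MTBF to 100 hours and halving MTTR to 2.5 hours have the same effect: both give roughly 95.2 %.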

What is the difference between MTBF, MTTR and MTTF?

| Metric | Full name | Applies to | What it measures | Formula |
|---|---|---|---|---|
| MTBF | Mean Time Between Failures | Repairable systems (machines, lines) | Average running time between two consecutive failures | Total operating time / Number of failures |
| MTTR | Mean Time To Repair | Repairable systems | Average time from failure to restored operation | Total repair time / Number of failures |
| MTTF | Mean Time To Failure | Non-repairable components (bearings, seals, light bulbs) | Average time until the component fails and is replaced | Total operating time / Number of units that failed |

The distinction matters: MTBF is for machines you repair and put back into service. MTTF is for components you discard and replace. A press has an MTBF. The bearing inside the press has an MTTF. When the bearing fails, the press fails — so the bearing's MTTF directly influences the press's MTBF. Predictive maintenance uses individual component MTTF data to prevent the machine-level MTBF from degrading.
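
The MTTF formula from the table can be applied to a batch of identical components. The sketch below uses made-up bearing lifetimes, and `mttf_hours` is a hypothetical helper. It also assumes that units still running add to the total operating time but not to the failure count, a convention the text above does not spell out.

```python
def mttf_hours(failed_lifetimes_h, running_hours_h=()):
    """MTTF = total operating time / number of units that failed.

    failed_lifetimes_h: hours each failed unit ran before it failed.
    running_hours_h: hours accumulated by units that have not failed yet
    (assumption: they add operating time but no failure).
    """
    if not failed_lifetimes_h:
        raise ValueError("MTTF is undefined until at least one unit has failed")
    total_operating_time_h = sum(failed_lifetimes_h) + sum(running_hours_h)
    return total_operating_time_h / len(failed_lifetimes_h)


# Hypothetical data: four bearings failed, one is still running after 9,000 h.
bearing_mttf = mttf_hours([7200, 8100, 6900, 7800], running_hours_h=[9000])
print(f"Bearing MTTF: {bearing_mttf:.0f} h")
```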

What is the bathtub curve and why does it matter for MTBF?

The bathtub curve describes how failure rate changes over the life of a machine or component. It has three phases:

| Phase | Failure rate | What happens | MTBF implication |
|---|---|---|---|
| Infant mortality (early life) | High, decreasing | Manufacturing defects, installation errors, incorrect settings. A new machine or component fails more often in the first weeks. | MTBF is low initially. If you calculate MTBF only during commissioning, you get a misleadingly pessimistic number. |
| Useful life (constant) | Low, stable | Random failures at a low, constant rate. This is the normal operating phase. | MTBF is at its highest and most stable. This is the phase where MTBF is a meaningful reliability indicator. |
| Wear-out (end of life) | High, increasing | Fatigue, corrosion, wear. Components reach the end of their design life. | MTBF drops. If you see MTBF declining over months, the machine is entering the wear-out phase, and preventive replacement (not more repair) is the correct response. |

The practical lesson: monitoring MTBF trends over time — not just the current value — tells you which phase a machine is in. An MES that tracks MTBF per machine over months and years is the only reliable way to detect the transition from useful life to wear-out before it causes cascading failures.
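
Trend monitoring of this kind can be sketched as a per-period MTBF series plus a least-squares slope. The function name `mtbf_trend` and the weekly figures are hypothetical. A negative slope on its own does not prove wear-out; it is only a prompt to investigate.

```python
def mtbf_trend(periods):
    """periods: list of (operating_hours, failure_count), oldest first.

    Returns (mtbf_per_period, slope), where slope is the least-squares
    change in MTBF per period. Periods without failures are skipped
    because MTBF is undefined for them.
    """
    points = [(i, hours / failures)
              for i, (hours, failures) in enumerate(periods) if failures > 0]
    if len(points) < 2:
        raise ValueError("need at least two periods with failures")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return [y for _, y in points], sxy / sxx


# Hypothetical eight weeks for one machine: (operating hours, failures)
weeks = [(120, 1), (120, 1), (110, 1), (100, 1),
         (180, 2), (160, 2), (140, 2), (130, 2)]
values, slope = mtbf_trend(weeks)
print(values)
print(f"MTBF change: {slope:.2f} h per week")
```

With this data the weekly MTBF falls from 120 to 65 hours and the fitted slope is negative, which is the pattern the wear-out row of the table describes.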

Why is manual MTBF tracking unreliable — and how does an MES fix it?

MTBF requires two inputs: operating time and failure count. Both are trivially simple in theory and chronically inaccurate in practice — because of how they are collected:

  • Operating time: Most plants estimate operating time as "planned production time minus shift breaks." That ignores micro-stops, waiting times and the grey zone between "running" and "not running." The MES knows the machine state at every second because it reads the PLC signal, not the operator's recollection. At Neoperl, PLC-based alarm capture provided machine-state data at PLC resolution, eliminating the estimation error.
  • Failure count: This is where manual tracking collapses. Operators log major breakdowns. They do not log the 3-minute stop that they fixed themselves. They do not log the alarm that cleared itself. They do not log the micro-stop that happened 12 times per shift but "doesn't count." The MES logs every stop: every alarm, every state change, every duration. At Neoperl, correlating PLC alarms with downtime events revealed that 4 alarm codes caused 80 % of all stops. Those 4 codes were invisible in the manual MTBF calculation because operators classified them as "normal" and did not count them as failures.
  • Definition consistency: What counts as a "failure"? In one shift, operator A counts a hydraulic pressure alarm as a failure. In the next shift, operator B counts it as "normal machine behaviour." The MES applies one consistent rule: every unplanned stop above a defined threshold (e.g., > 2 minutes) counts as a failure. Same definition, every shift, every machine, every plant.
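
The three points above amount to one rule applied to a machine-state log: sum the running time, and count every unplanned stop longer than a threshold as a failure. The sketch below is a minimal illustration. The `StateInterval` record, the state labels and the sample shift are assumptions, not the data model of any particular MES.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StateInterval:
    state: str  # "RUN", "PLANNED_STOP" or "UNPLANNED_STOP" (hypothetical labels)
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def mtbf_from_states(intervals, min_failure=timedelta(minutes=2)):
    """MTBF in hours: running time / unplanned stops longer than min_failure."""
    running = sum((i.duration for i in intervals if i.state == "RUN"), timedelta())
    failures = sum(1 for i in intervals
                   if i.state == "UNPLANNED_STOP" and i.duration > min_failure)
    if failures == 0:
        return None  # no qualifying failure in this window
    return running.total_seconds() / 3600 / failures


t0 = datetime(2026, 4, 1, 6, 0)


def at(minutes):
    return t0 + timedelta(minutes=minutes)


# Hypothetical 8-hour shift
shift = [
    StateInterval("RUN", at(0), at(150)),
    StateInterval("UNPLANNED_STOP", at(150), at(153)),  # 3 min, counts as failure
    StateInterval("RUN", at(153), at(243)),
    StateInterval("PLANNED_STOP", at(243), at(273)),    # break, excluded
    StateInterval("RUN", at(273), at(363)),
    StateInterval("UNPLANNED_STOP", at(363), at(364)),  # 1 min, below threshold
    StateInterval("RUN", at(364), at(394)),
    StateInterval("UNPLANNED_STOP", at(394), at(420)),  # 26 min, counts as failure
    StateInterval("RUN", at(420), at(480)),
]
shift_mtbf = mtbf_from_states(shift)
print(f"Shift MTBF: {shift_mtbf:.1f} h")
```

In this sketch the 1-minute stop is neither a failure nor operating time. Whether sub-threshold stops should be treated that way is a definition each plant has to fix once and then apply everywhere.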

The SYMESTIC alarms module captures every PLC alarm with timestamp, duration and alarm code — the raw data from which MTBF is calculated automatically, per machine, per shift, per week, per month. The production metrics module turns that data into the MTBF trend chart that maintenance managers need: "Machine 5 MTBF dropped from 120 hours to 65 hours over the last 8 weeks — the hydraulic unit is entering wear-out phase."
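
A finding such as "4 alarm codes caused 80 % of all stops" comes from a Pareto count over the alarm code attached to each stop. The sketch below assumes each stop has already been matched to one alarm code. The codes and counts are invented for illustration, and `pareto_alarm_codes` is a hypothetical helper.

```python
from collections import Counter


def pareto_alarm_codes(stop_alarm_codes, share=0.8):
    """Return the most frequent alarm codes that together account for at
    least `share` of all stops, as (code, count) in descending frequency."""
    counts = Counter(stop_alarm_codes)
    total = sum(counts.values())
    selected, covered = [], 0
    for code, n in counts.most_common():
        if covered >= share * total:
            break
        selected.append((code, n))
        covered += n
    return selected


# Hypothetical month: one alarm code per recorded stop, 100 stops in total
stops = ([3012] * 31 + [2105] * 22 + [4410] * 17 + [1007] * 12
         + [5001] * 8 + [5002] * 5 + [5003] * 3 + [5004] * 2)
top = pareto_alarm_codes(stops)
for code, n in top:
    print(f"alarm {code}: {n} stops")
```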

How does MTBF connect to OEE and maintenance strategy?

| Maintenance strategy | How it uses MTBF | Limitation | MES role |
|---|---|---|---|
| Reactive (run to failure) | Does not use MTBF; waits until something breaks | Maximises unplanned downtime, highest repair cost | MES reveals the true cost of reactive maintenance by quantifying MTTR per failure type |
| Preventive (time-based) | Sets maintenance interval at a fraction of MTBF (e.g., maintain every 80 % of MTBF) | Over-maintains if MTBF is underestimated; under-maintains if overestimated | MES provides accurate MTBF per machine: not manufacturer spec, but actual field data |
| Condition-based | Uses MTBF trend + process parameters to trigger maintenance when degradation is detected | Requires sensor data and pattern recognition | MES process data module provides temperature, pressure, vibration trends that correlate with MTBF decline |
| Predictive | Uses historical MTBF data + ML models to predict when the next failure will occur | Requires 6–12 months of clean historical data | MES builds the historical data foundation that predictive models require, automatically, from day one of operation |

MTBF feeds directly into OEE Availability: Availability = MTBF / (MTBF + MTTR). If MTBF is 50 hours and MTTR is 5 hours, Availability is 90.9 %. To reach 95 % Availability with the same MTTR, MTBF must increase to 95 hours. The MES calculates both MTBF and MTTR from the same machine-state data — and shows maintenance managers exactly where to focus: "Machine 5 has the lowest MTBF in the plant (50 hours). The top failure cause is alarm #3012 (hydraulic pressure). Fixing the root cause of #3012 would increase MTBF to an estimated 85 hours and Availability from 90.9 % to 94.4 %."
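
The target figures in this paragraph follow from solving the Availability formula for MTBF: MTBF = A × MTTR / (1 − A). A short sketch to check them, with illustrative function names:

```python
def availability(mtbf_h, mttr_h):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def required_mtbf(target_availability, mttr_h):
    """Solve A = MTBF / (MTBF + MTTR) for MTBF at a fixed MTTR."""
    if not 0 < target_availability < 1:
        raise ValueError("target availability must be between 0 and 1 (exclusive)")
    return target_availability * mttr_h / (1 - target_availability)


needed = required_mtbf(0.95, 5)   # MTBF needed for 95 % at MTTR = 5 h
after_fix = availability(85, 5)   # availability if MTBF rises to 85 h
print(f"MTBF needed for 95 %: {needed:.0f} h")
print(f"Availability at MTBF 85 h: {after_fix:.1%}")
```

The steep rise from 50 to 95 hours for a gain of about four percentage points shows why shortening MTTR is often worth pursuing alongside reliability work.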

FAQ

What is a good MTBF value?
There is no universal benchmark. MTBF depends entirely on the machine type, age, operating conditions and maintenance regime. A stamping press with an MTBF of 200 hours may be good; a packaging line with an MTBF of 200 hours might be poor, because what counts as acceptable depends on the equipment and how it is used. The meaningful comparison is: your machine's MTBF this month vs. last month vs. last quarter. Is it improving, stable or declining? That trend is more valuable than any industry benchmark. At Meleghy Automotive, the SYMESTIC MES enabled exactly this comparison: MTBF per press, per plant, tracked over time.

Should I measure MTBF per machine or per production line?
Both, but for different purposes. Machine-level MTBF tells maintenance where to focus repair and replacement. Line-level MTBF tells production planning how often the line will stop. In a rigidly linked line, where any machine failure stops the whole line, line MTBF is always lower than the MTBF of any individual machine. For a line with 5 machines each at 200 hours MTBF, the line MTBF is approximately 40 hours (1/MTBF_line ≈ 1/200 + 1/200 + 1/200 + 1/200 + 1/200 = 5/200 = 1/40). This is why the machines with the lowest MTBF dominate line performance.
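
The line-level approximation adds up the failure rates (1 / MTBF) of the individual machines. A minimal sketch follows. It assumes independent failures at constant rates and no buffers between machines, and `series_line_mtbf` is a hypothetical name.

```python
def series_line_mtbf(machine_mtbfs_h):
    """Approximate MTBF of a line where any machine failure stops the line.

    Assumes independent failures at constant rates and no buffers between
    machines, so the failure rates (1 / MTBF) simply add up.
    """
    if not machine_mtbfs_h or any(m <= 0 for m in machine_mtbfs_h):
        raise ValueError("need at least one machine, all MTBF values positive")
    return 1 / sum(1 / m for m in machine_mtbfs_h)


uniform_line = series_line_mtbf([200] * 5)
weak_link_line = series_line_mtbf([200, 200, 200, 200, 50])
print(f"Five machines at 200 h: {uniform_line:.1f} h")
print(f"Four at 200 h, one at 50 h: {weak_link_line:.1f} h")
```

With one machine at 50 hours and four at 200 hours, the line comes out at about 25 hours, which illustrates how strongly the weakest machine pulls the line value down.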

How does MTBF relate to TPM?
TPM (Total Productive Maintenance) uses MTBF as its primary reliability metric. TPM Pillar 3 (Planned Maintenance) sets maintenance intervals based on MTBF data. TPM Pillar 4 (Training & Education) uses MTBF comparisons across operators/shifts to identify skill gaps. Without accurate MTBF data, TPM is theory. With MES-based MTBF tracking, TPM becomes a data-driven system.

Can MTBF be too high?
Yes — if it is artificially inflated. Two common causes: (1) operators do not log short stops, so the failure count is too low and MTBF appears higher than reality; (2) excessive preventive maintenance replaces components before they would ever fail, consuming maintenance budget without improving reliability. The MES eliminates cause (1) by counting every stop automatically. For cause (2), MTBF trending shows whether preventive intervals can be safely extended — saving maintenance cost without increasing failure risk.



About the author
Martin Brandel
MES Consultant at SYMESTIC. Dipl.-Ing. Nachrichtentechnik. Over 30 years in industrial automation, from Simatic S5 PLC programming through to cloud MES machine connectivity. Has connected thousands of brownfield machines to MES systems across automotive, food & beverage and building materials industries.