MES Software: Vendors, Features & Costs Compared 2026
MES software compared: vendors, functions per VDI 5600, costs (cloud vs. on-premise) and implementation. Honest market overview 2026.
On-Time Delivery (OTD) is the share of customer orders — or order line items — that arrive at the customer by the agreed delivery date, within whatever tolerance window the contract allows. It is the single most important external performance metric a manufacturing plant has, because it is the one the customer actually experiences. Everything else — OEE, schedule adherence, first-pass yield, labour productivity — is internal plumbing. OTD is the number the customer sees on the supplier scorecard, the number the category manager reviews quarterly, and the number that triggers either a contract renewal or a supplier-development escalation.
The formula is straightforward and the arithmetic is not where plants fail. Plants fail at the definitions underneath the arithmetic — which date counts as the "agreed" date, which timestamp counts as "delivered", what tolerance window applies, how partial deliveries are treated. In fifteen years of Tier 1 automotive work and a further decade of introducing MES into plants across seven countries, I have rarely seen two suppliers calculate OTD the same way, and I have never seen a supplier and its customer agree on the calculation without a sit-down to reconcile definitions. Reading the supplier's OTD as if it meant what the word meant is the first trap, and it catches people who should know better.
OTD (%) = orders delivered on time ÷ total orders in period × 100
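Three decisions inside this formula drive the entire practical difficulty of the metric: which date counts as the agreed date, which timestamp counts as delivered, and which tolerance window applies.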
A supplier reporting 98 % OTD on "confirmed date, goods-issue timestamp, ±2 day window" and a customer calculating 81 % on "original request date, goods-receipt timestamp, same-day" are looking at the same shipment data and reaching legitimate but incompatible conclusions. Both numbers are correct under their own rules. The work is in making the rules explicit and agreed before anyone attaches a target to the number.
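The effect of those definition choices can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the `Shipment` fields, the `otd` helper and the four sample shipments are all hypothetical, and the tolerance is simplified to a symmetric window in whole days.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Shipment:
    requested: date      # original customer-requested date (hypothetical field)
    confirmed: date      # supplier-confirmed date (hypothetical field)
    goods_issue: date    # left the supplier's dock
    goods_receipt: date  # booked in at the customer's dock


def otd(shipments, date_field, stamp_field, tolerance_days):
    """Percentage of shipments whose chosen timestamp lies within
    +/- tolerance_days of the chosen agreed date."""
    on_time = sum(
        abs((getattr(s, stamp_field) - getattr(s, date_field)).days) <= tolerance_days
        for s in shipments
    )
    return 100.0 * on_time / len(shipments)


# Invented sample data: the same four shipments feed both calculations.
shipments = [
    Shipment(date(2026, 3, 2), date(2026, 3, 2), date(2026, 3, 2), date(2026, 3, 2)),
    Shipment(date(2026, 3, 3), date(2026, 3, 5), date(2026, 3, 5), date(2026, 3, 6)),
    Shipment(date(2026, 3, 4), date(2026, 3, 4), date(2026, 3, 5), date(2026, 3, 7)),
    Shipment(date(2026, 3, 9), date(2026, 3, 10), date(2026, 3, 11), date(2026, 3, 12)),
]

# Supplier rules: confirmed date, goods-issue timestamp, +/- 2 days.
supplier_view = otd(shipments, "confirmed", "goods_issue", 2)
# Customer rules: original requested date, goods-receipt timestamp, same day.
customer_view = otd(shipments, "requested", "goods_receipt", 0)

print(f"supplier view: {supplier_view:.0f} %")
print(f"customer view: {customer_view:.0f} %")
```

With this invented data the supplier rules score every shipment as on time while the customer rules pass only one of the four, which is the same divergence described above, exaggerated by the tiny sample.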
OTD is one of four related delivery metrics, each answering a different question. Competitor glossaries typically cover OTD vs. OTIF and stop there; the full four-way disambiguation is where the real customer-side vocabulary lives.
| Metric | What qualifies as success | Where it is used |
|---|---|---|
| OTD (On-Time Delivery) | Order arrives within the agreed date window. Quantity and quality not considered. | Most common supplier KPI. Default starting metric. |
| OTIF (On-Time In-Full) | On time and complete quantity. A partial shipment on the right date fails. | Retail DC, consumer-goods supply chains, large corporate customers. |
| Perfect Order | On time, in full, correct documents, correct quality, correct packaging — any defect fails. | Pharma, aerospace, high-regulation automotive. The toughest benchmark. |
| Delivery Performance | Weighted composite, often with penalty-day scoring for early and late deliveries. | VDA-style automotive supplier scorecards; contractual KPI. |
The metric the customer actually tracks varies sharply by industry. A Tier 1 automotive supplier to Volkswagen or BMW is measured on Delivery Performance under the VDA methodology, with penalty days for both late and early deliveries — shipping three days early can be as damaging as shipping one day late, because it forces the customer to hold unplanned inventory. A consumer-goods supplier to a retail DC is measured on OTIF, because a truck arriving on time with 85 % of the ordered units forces the DC to schedule a second inbound slot and pay a second handling fee. A pharma contract manufacturer is measured on Perfect Order, because a correct-quantity on-time shipment with a missing Certificate of Analysis cannot be released and ties up the customer's line-clearance slot. Knowing which of the four metrics the customer actually uses is the precondition for any meaningful improvement conversation; optimising OTD while the customer tracks OTIF produces the classic pattern of rising supplier scores and falling customer satisfaction simultaneously.
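The first three rows of the table can be expressed as pass/fail rules of increasing strictness. The sketch below assumes a hypothetical `Delivery` record and invented sample data; Delivery Performance is left out because its penalty-day weighting is customer-specific and not defined in this article.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    on_time: bool        # arrived within the agreed date window
    qty_ordered: int
    qty_delivered: int
    docs_ok: bool        # e.g. Certificate of Analysis present
    quality_ok: bool
    packaging_ok: bool


def passes_otd(d):
    # Quantity and quality are not considered.
    return d.on_time


def passes_otif(d):
    # On time AND complete quantity.
    return d.on_time and d.qty_delivered >= d.qty_ordered


def passes_perfect_order(d):
    # OTIF plus documents, quality and packaging; any defect fails.
    return passes_otif(d) and d.docs_ok and d.quality_ok and d.packaging_ok


def score(deliveries, rule):
    """Percentage of deliveries that satisfy the given rule."""
    return 100.0 * sum(rule(d) for d in deliveries) / len(deliveries)


# Invented sample: one clean delivery, one short shipment,
# one with missing documents, one late.
deliveries = [
    Delivery(True, 100, 100, True, True, True),
    Delivery(True, 100, 85, True, True, True),
    Delivery(True, 100, 100, False, True, True),
    Delivery(False, 100, 100, True, True, True),
]

otd_score = score(deliveries, passes_otd)
otif_score = score(deliveries, passes_otif)
perfect_score = score(deliveries, passes_perfect_order)
print(otd_score, otif_score, perfect_score)
```

Because each rule only adds conditions to the previous one, the three scores can never rise from left to right for the same shipment data.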
Order dates in manufacturing are not a single fact. A typical automotive order lifecycle has at least three dates that could credibly be called "the agreed date": the original requested date, the supplier-confirmed date and the last-agreed revised date.
Automotive OEMs consistently measure OTD against the original requested date, because the whole point of the metric is to expose schedule drift including reschedules the supplier negotiated. Automotive Tier 1 suppliers internally measure against the last-agreed revised date, because that is what their production schedules are optimised against. The structural gap between the two views is typically 4 to 12 percentage points — supplier reports 96 %, OEM scorecard shows 87 %, and the quarterly review starts with an argument about measurement rather than a conversation about improvement. The way out is not to pick one definition and hope; it is to report both, with the gap between them visible as a separate metric called "reschedule burden" or equivalent. The gap itself is diagnostic: a supplier with a small reschedule burden has planning discipline, a supplier with a large reschedule burden is stabilising its OTD by continuously renegotiating the target.
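Reporting both views and the gap between them takes very little logic. The sketch below is illustrative only: the dictionary keys, the `on_time_share` helper and the five sample orders are assumptions, and for brevity it checks lateness only, ignoring any early-delivery tolerance.

```python
from datetime import date


def on_time_share(orders, date_key):
    """Percentage of orders received on or before the date stored under date_key."""
    hits = sum(o["received"] <= o[date_key] for o in orders)
    return 100.0 * hits / len(orders)


# Invented sample data: two of the five orders were rescheduled.
orders = [
    {"original": date(2026, 5, 4), "revised": date(2026, 5, 4), "received": date(2026, 5, 4)},
    {"original": date(2026, 5, 5), "revised": date(2026, 5, 8), "received": date(2026, 5, 8)},
    {"original": date(2026, 5, 6), "revised": date(2026, 5, 6), "received": date(2026, 5, 6)},
    {"original": date(2026, 5, 7), "revised": date(2026, 5, 12), "received": date(2026, 5, 13)},
    {"original": date(2026, 5, 11), "revised": date(2026, 5, 11), "received": date(2026, 5, 10)},
]

otd_revised = on_time_share(orders, "revised")    # supplier-internal view
otd_original = on_time_share(orders, "original")  # OEM scorecard view
reschedule_burden = otd_revised - otd_original    # gap in percentage points

print(f"OTD vs. last-agreed revised date: {otd_revised:.0f} %")
print(f"OTD vs. original requested date:  {otd_original:.0f} %")
print(f"Reschedule burden: {reschedule_burden:.0f} points")
```

Keeping the burden as its own number means the two OTD figures can sit on the same report without either side having to concede which one is "the" OTD.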
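The second definition fight in almost every supplier-review meeting: what moment counts as "delivered"? The three candidates are goods issue at the supplier's warehouse, shipment, and goods receipt at the customer's dock; goods issue typically runs 1–3 days ahead of goods receipt.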
Customers define "delivered" as goods receipt, always, and the contract language usually supports this — Incoterms aside, the customer's purchase order typically requires physical availability at their receiving dock, not at the supplier's shipping dock. Suppliers calculating OTD against goods issue consistently overstate their performance by the transit time gap, and this is the single most common source of the "supplier thinks it's at 97 %, customer scorecard shows 82 %" pattern. The fix is mechanical: the MES and ERP have to track both timestamps, the OTD calculation has to use goods receipt as the primary timestamp, and the goods-issue number (if kept) has to be clearly labelled as "internal shipping performance" rather than as OTD.
Automotive OEM supply is the industry where OTD failure has the most immediately quantifiable cost, and the benchmark levels reflect that. OEM scorecards expect 98 to 99 % on Delivery Performance (the VDA-style composite, not plain OTD). Below 97 % triggers a supplier-development escalation; below 95 % triggers a formal improvement plan and typically a freeze on new business nominations; below 92 % typically triggers a Q-status review and, for non-safety-critical components, a second-source qualification. The consequences are structural, not punitive — an OEM that cannot rely on a Tier 1 cannot run its own assembly line, and the commercial pressure behind the scorecard reflects the underlying operational risk.
The line-stoppage math is the reason the thresholds are so tight. A modern OEM assembly line produces one vehicle every 60 to 90 seconds, at an internal contribution margin of roughly €3,000 to €8,000 per vehicle depending on segment and region. A one-minute line stop therefore costs the OEM roughly €3,000 to €8,000 in missed contribution, plus labour holding costs, plus downstream resequencing. A thirty-minute stop costs roughly €100,000 to €250,000. A supplier who misses a JIT delivery by two hours, causing a line stop of the same length, has written off their entire year's margin on that component in a single event, independent of whatever contract penalties apply under the MPA (master purchase agreement). This is the commercial reality that makes automotive OTD so non-negotiable, and it is why 98 % sounds generous to people outside the industry and inadequate to people inside it.
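The per-minute arithmetic is simple enough to check directly. The sketch below uses only the €3,000 to €8,000 per-minute range quoted above; the function name is hypothetical, and the labour holding and resequencing costs the text mentions are left out because no figures are given for them.

```python
def line_stop_cost(minutes, cost_per_minute_eur):
    """Missed contribution for a line stop, ignoring labour and resequencing costs."""
    return minutes * cost_per_minute_eur


LOW_EUR_PER_MIN = 3_000   # lower bound quoted in the text
HIGH_EUR_PER_MIN = 8_000  # upper bound quoted in the text

for minutes in (1, 30, 120):
    low = line_stop_cost(minutes, LOW_EUR_PER_MIN)
    high = line_stop_cost(minutes, HIGH_EUR_PER_MIN)
    print(f"{minutes:>3} min stop: EUR {low:,} to EUR {high:,}")
```

On contribution alone, a thirty-minute stop comes to €90,000 to €240,000, which is consistent with the rounded range quoted above once the additional cost items are included.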
From Johnson Controls, headliner supply to a German premium OEM assembly line, somewhere around 2007: we were running JIS (Just-In-Sequence), which meant each headliner was produced to match a specific vehicle in the OEM's build sequence, delivered in the exact order the vehicles would reach the interior-trim station, and loaded onto a truck that arrived at the OEM's dock every 47 minutes during operating hours. The contract OTD requirement was 99.8 % on goods receipt at the OEM dock, measured against the original requested date and a time window of ±15 minutes. In absolute terms we were at 99.6 %, which sounded acceptable until you did the annual arithmetic: the 0.2-point shortfall against the contract, on roughly 180,000 headliners per year, was 360 sequence errors more than the contract allowed, each one a potential line stop at the OEM, each line stop capable of costing the OEM (and therefore us, via the MPA's escalation clauses) €8,000 per minute minimum.
In one twelve-month period three sequence errors reached the OEM's line without being caught by the in-sequence verification, and the total invoiced cost back to Johnson Controls was €640,000 plus two weeks of supplier-development team presence on our shopfloor.
The instructive part of this, and the part I now bring to every Tier 1 sales conversation, is what actually fixed it. It was not the MES replacement (although that helped), it was not the sequence-verification upgrade (although that was necessary), it was not operator training (although that was done). What fixed it was reconciling the three OTD numbers that had been diverging quietly for months. Our internal OTD against the last-agreed revised date was 99.6 %. Our OTD against the original requested date, the number the OEM saw, was 98.4 %. The customer goods-receipt timestamp, which we were not reliably capturing in our own data, showed a further 0.3 % degradation we could not see because our ship-dock timestamp was what fed our internal dashboard.
Three numbers, three definitions, all technically correct, all showing different realities, and the one the OEM used was the one we were not looking at. Once we started reporting the customer-side number as our primary OTD — against original requested date, against goods-receipt timestamp, ±15 minutes, no tolerance-widening during the measurement period — the shift-level conversation changed overnight. The shift lead stopped defending "our number" and started investigating the gap between our number and the OEM's. Within six months the goods-receipt OTD moved from 99.3 % to 99.7 %, the sequence-error rate dropped by two-thirds, the OEM's supplier-development team withdrew, and the next model-year nomination we were competing for was awarded. I would not claim the OTD improvement alone won the nomination, but I am certain that continuing to report the friendlier internal number would have lost it. This is the Tier 1 lesson that I now build every commercial conversation around: whichever OTD definition makes you look best is the one the customer is not using, and the gap between your number and theirs is where the improvement work actually lives.
OTD is the terminal metric of a three-stage causal chain that runs upstream through the plant. Understanding the chain is the difference between an improvement programme that moves OTD and one that generates reports without moving anything.
| Layer | What it measures | Why it drives OTD |
|---|---|---|
| OEE | Machine-level productivity: availability × performance × quality. | Drives throughput capacity. Low OEE means the plan needs buffer or overtime to make dates. |
| Schedule adherence | Plant-level plan execution: right orders, right time, right sequence. | Translates capacity into specific orders completing on time. The direct upstream metric. |
| OTD | Customer-level delivery: right order, right date at goods receipt. | The terminal metric. Driven by schedule adherence plus logistics and finished-goods-buffer discipline. |
The common mistake is trying to improve OTD directly, without touching the upstream layers. It never works, because OTD is an output — you cannot add polish to an output, you can only fix the process that produces it. The correct sequence is almost always OEE first (get the capacity), schedule adherence next (get the right orders at the right time in the plant), OTD last (make sure the finished-goods and logistics handling preserves the gain). Plants that skip the upstream work and try to improve OTD by adding finished-goods buffer — the lazy fix — succeed temporarily at the cost of working capital that grows faster than the OTD improvement. Within two to four quarters the buffer compresses margin enough that management demands it come back down, at which point OTD falls back to where it was. The only durable path is through the upstream metrics, and that is the path an MES-driven scorecard exposes and supports. See also schedule adherence for the direct upstream metric and OEE for the capacity layer.
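The capacity layer at the top of that chain is a plain product of three ratios, as the table states. The sketch below assumes each factor is already available as a ratio between 0 and 1; the function name and the sample values are illustrative, not measured data.

```python
def oee(availability, performance, quality):
    """OEE as availability x performance x quality, each a ratio in [0, 1]."""
    factors = {
        "availability": availability,
        "performance": performance,
        "quality": quality,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Invented sample values for one machine and one shift.
example = oee(availability=0.90, performance=0.95, quality=0.98)
print(f"OEE: {example:.1%}")
```

Because the factors multiply, three individually respectable ratios still leave noticeably less capacity than any one of them suggests, which is why the plan ends up needing buffer or overtime when OEE is low.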
The anti-pattern every Tier 1 Sales leader has watched a competitor suffer: OTD can be temporarily inflated by systematically shipping orders two to three days earlier than required, before the tolerance window closes. The metric goes up because the on-time-or-early count rises, the customer initially does not complain because "early is fine", and the supplier scorecard looks healthier. Six months later three things have happened: the customer's goods-in is overwhelmed with early shipments they have to inventory, the supplier has built up a habit of ignoring scheduled dates in favour of ship-when-ready, and the VDA-style penalty system that scores early deliveries begins to hit the supplier's Delivery Performance even as their internal OTD still looks good. This is the reason automotive scorecards penalise early deliveries with the same weight as late ones: not because early deliveries are equally bad for the customer (they are not), but because tolerating them creates exactly this gaming pattern, and the pattern destroys JIT discipline over time. The defensible counter-measure is measuring OTD with a narrow early-tolerance window (one day at most, and often zero on the early side in JIT/JIS) and reporting the early-delivery count as a separate metric that management actually escalates. Plants that do this catch the pull-forward behaviour before it establishes itself; plants that don't discover it during a supplier audit.
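The counter-measure amounts to an asymmetric tolerance window plus a separate early count. The sketch below is a simplified illustration: the `classify` helper, the whole-day deviations and the two window settings are assumptions chosen to show the effect, not values taken from any customer contract.

```python
from collections import Counter


def classify(deviation_days, early_tol=0, late_tol=0):
    """Classify a delivery by its deviation from the agreed date.

    deviation_days is receipt date minus agreed date, so negative means early.
    """
    if deviation_days < -early_tol:
        return "early"
    if deviation_days > late_tol:
        return "late"
    return "on_time"


# Invented deviations for ten deliveries, with several pulled forward 2-3 days.
deviations = [-3, -2, 0, 0, 1, -3, 0, 2, -2, 0]

# Lenient window: anything up to three days early still counts as on time.
lenient = Counter(classify(d, early_tol=3, late_tol=1) for d in deviations)
# Narrow window: zero early tolerance, early deliveries counted separately.
strict = Counter(classify(d, early_tol=0, late_tol=1) for d in deviations)

lenient_otd = 100.0 * lenient["on_time"] / len(deviations)
strict_otd = 100.0 * strict["on_time"] / len(deviations)
print(f"lenient OTD: {lenient_otd:.0f} %, early count: {lenient['early']}")
print(f"strict OTD:  {strict_otd:.0f} %, early count: {strict['early']}")
```

Under the lenient window the pull-forward shipments are invisible; under the narrow window they show up both as a lower OTD and as an explicit early count that can be escalated on its own.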
OTD does not live in a vacuum; it feeds into formal supplier scorecards whose audit hooks every Tier 1 should know by name. Three of the most common in the industries SYMESTIC supports:
The practical implication for MES-driven OTD reporting: the system has to support both the plant-side data capture and the customer-side reconciliation. A report that only shows the plant's view is fine for internal KPIs; for audit and contract defence it has to also reflect how the customer sees the same shipment data. The two views are never identical, and the gap is exactly what the auditor or the category manager wants to see managed.
SYMESTIC captures OTD as a three-timestamp metric (goods issue, ship, and, where ERP integration or customer EDI feedback is available, customer goods receipt) with configurable selection of which timestamp drives the primary calculation and which serve as reconciliation reference. The "which date" question is handled through the bidirectional ERP integration (SAP R/3 via ABAP IDoc, Microsoft Dynamics/Navision, Infor/InforCOM, proAlpha): original requested date, supplier-confirmed date and last-agreed revised date are all captured as separate fields, and OTD can be reported against any of the three or, more usefully, against the one the customer uses with the gap to the supplier-internal view visible as a diagnostic number. For automotive JIT/JIS operations the standard framework is extended with sequence-level verification driven from OEM EDI feeds (DELJIT, VDA-4916, CMI, JIS call-offs), in-line alarming before shipment to prevent the sequence errors described in the Johnson Controls example above, and penalty-day scoring aligned to the VDA Delivery Performance methodology. OTD is displayed on the shopfloor dashboards alongside OEE, schedule adherence, RTY and scrap/rework, because the whole point of the architectural decision is to make the causal chain visible in a single view: the upstream metrics are where the work lives, and OTD is the terminal metric that tells the plant whether the work reached the customer.
What is a good OTD value in manufacturing?
It depends on the industry. Automotive OEM direct supply: 98–99 % on Delivery Performance (VDA composite), with 97 % as the escalation threshold and 95 % as the formal-improvement-plan threshold. Retail DC OTIF: 90–95 %, with contractual penalty clauses below 85 %. Aerospace and pharma Perfect Order: 97–99 % depending on the specific contract. General industrial supply: 93–96 % is solid, 90 % is the market floor. More important than the absolute number are the consistency across quarters and the gap between the supplier-reported number and the customer-scorecard number; a stable 94 % with a one-percentage-point gap to the customer view is healthier than a volatile 97 % with an eight-point gap.
How does OTD differ from OTIF?
OTIF (On-Time In-Full) adds the completeness condition: a shipment that arrives on the agreed date but contains 85 % of the ordered quantity counts as a full failure under OTIF, and as a full success under plain OTD. The structural consequence: OTIF scores are almost always 3–8 percentage points below OTD scores for the same shipment data. Retail customers (Walmart, Tesco, Metro and other European DC-based chains) run on OTIF because partial shipments force rescheduling of inbound capacity. Automotive OEMs usually run on an OTD-based Delivery Performance score rather than OTIF, with a sequence-level extension for JIT/JIS, because quantity completeness is enforced through the sequence constraint itself.
What is a tolerance window and how should it be set?
The tolerance window defines how far ahead of or behind the agreed date (or, in sequenced supply, the agreed time) a shipment can arrive and still count as on-time. Automotive JIT/JIS: typically up to 15 minutes on the late side, 0 on the early side. Automotive non-sequenced supply: ±1 day. Retail DC: ±2 days typical, sometimes same-week. Pharma and aerospace: 0 days, exact date only. The window should be whatever the customer contract specifies, and it should be identical to the window the customer uses in their own scorecard; a mismatch between the supplier's window and the customer's is the source of the most common OTD-review disputes. Plants that quietly widen their internal tolerance window to improve their reported OTD create exactly the number-gap pattern that IATF audits look for.
Which date should OTD be measured against — original, confirmed, or revised?
Against the date the customer is measuring you on, always. Automotive OEM scorecards use the original requested date; measuring internally against the last-agreed revised date produces a number that looks better than the customer's view and loses credibility when the gap is exposed in a supplier review. Best practice is to capture all three dates (original, confirmed, last-revised), report OTD against the original, and display the reschedule burden (gap between original and last-revised) as a separate diagnostic metric. Plants that do this align their internal view with the customer's view permanently, and eliminate the most common source of supplier-review friction.
Which timestamp should count as "delivered"?
Customer goods receipt, measured through the customer's ERP or their EDI feedback (typically ASN confirmation or equivalent). Goods issue from the supplier's warehouse is the earliest timestamp available and typically 1–3 days ahead of the goods receipt; using it inflates the OTD score by the transit time. Suppliers who report OTD against goods issue consistently show numbers 4–8 percentage points higher than the customer-side view, and the gap is typically the first thing a supplier-development team investigates during an OEM audit. MES platforms with bidirectional ERP integration can capture goods-receipt timestamps automatically where the customer feeds them back; where they cannot, the supplier's closest proxy is the carrier's delivery confirmation.
What does line-stoppage at the customer actually cost?
Automotive OEM assembly line: €3,000 to €8,000 per minute of stoppage depending on segment and region, plus labour holding costs, plus downstream resequencing, plus quality-containment costs if the stoppage caused in-line defects. For a thirty-minute stop the total fully-loaded cost to the OEM is typically €100,000 to €250,000, and the Master Purchase Agreement with the Tier 1 supplier usually allows for recovery of that cost back to the supplier if the stoppage was caused by supplier delivery failure. This is the commercial reality that drives the automotive OTD benchmark of 98–99 %; the cost of a single significant miss can exceed a year's margin on the affected component.
Can I improve OTD without improving OEE or schedule adherence?
Temporarily yes, durably no. The short-term fix is adding finished-goods inventory buffer so that schedule misses don't propagate to delivery misses. This works for one to two quarters, at the cost of working capital that typically grows faster than the OTD improvement. Within a year management demands the inventory come back down, and OTD falls to the level the upstream metrics can sustain. The only durable path to sustained OTD improvement is through OEE (capacity) and schedule adherence (execution) — OTD itself is an output metric, not a lever. This is the reason MES-driven scorecards display all three side by side: the causal chain is where the improvement work lives.
Related: OEE: definition, calculation & practice · MES: definition, functions & benefits · OEE software · MES software compared · Schedule adherence · Rolled Throughput Yield (RTY) · Scrap rate vs. rework rate · Work plan · Recipe management · Change control · Role-based access control · Production planning module · Production control module · Production metrics module · Automotive · Metal processing · Food & beverage · Plastics processing · For COOs & plant managers · For production managers · For operational excellence. External references: IATF 16949 official site (automotive quality standard, Clause 9.1.2 on customer satisfaction and delivery performance) · ASCM/APICS Dictionary (canonical reference for OTD, OTIF and Perfect Order definitions).