Planned vs. Unplanned Downtime: The Data Model

By Mark Kobbert · Last updated: April 2026

What planned vs. unplanned downtime really means — and why most plants get the data model wrong before they get the classification wrong

The textbook definition is simple. Planned downtime is time when the equipment is deliberately not supposed to run: scheduled maintenance, changeovers, cleaning, validation, breaks where the line intentionally stands, plant holidays. Unplanned downtime is time when the equipment should be running but cannot: technical failures, material shortages, quality holds, missing approvals, operator errors. The textbook then says that planned downtime is excluded from the OEE base, unplanned downtime reduces availability, and a clean taxonomy separates the two. None of that is wrong. None of it is sufficient.

The hard problem is not the definitional boundary between the two categories. It is the data-model layer underneath, where a physical machine state has to become a classified, auditable, queryable event in a database. I have spent the last twelve years architecting that layer across 15,000+ machines in 18 countries, and the pattern is consistent: plants that debate the taxonomy without fixing the data model produce numbers that look reasonable on a dashboard and disintegrate under audit. This article is the engineering view of what makes a downtime-classification system actually work (PackML state models, reason-code ontology, auto-classification rule engines, the State vs. Event distinction), and of why "planned vs. unplanned" is the top layer of a three-layer problem, not the whole of it.

The standards that define the vocabulary — PackML, Weihenstephan, OPC UA, Nakajima

Four references sit underneath every serious downtime-classification conversation. They are not interchangeable, and plants that blend their terminology without realising it produce taxonomies that cannot be cross-benchmarked:

Standard / reference	What it defines	Where it sits in the stack
PackML (ISA-TR88.00.02)	OMAC-developed state model with 17 defined machine states (Stopped, Starting, Idle, Suspending, Suspended, Unsuspending, Execute, Holding, Held, Unholding, Completing, Complete, Resetting, Clearing, Stopping, Aborting, Aborted), complemented by a separate set of unit modes.	Foundational: defines what state the machine is in before any classification applies.
Weihenstephan Standards (WS Pack)	German TU München standard for beverage and packaging line integration — defines data points, KPI calculation methods, downtime categorisation.	Dominant in DACH food & beverage; often mandatory contractually for KHS, Krones line integrations.
OPC UA Companion Specification for Machinery	Modern transport-and-semantic standard for exposing machine state, alarms, downtime reasons over OPC UA.	The integration layer — how the PackML state gets from the PLC to the MES.
Nakajima — Six Big Losses	The canonical OEE loss taxonomy — Breakdowns, Setup/Adjustment, Idling/Minor Stops, Reduced Speed, Defects, Startup Losses.	The conceptual bridge between a classified downtime event and the OEE availability/performance/quality decomposition.

The practical consequence: before any argument about "is this planned or unplanned" can be had productively, the plant has to know which state model it uses (PackML is the default in new deployments; legacy plants often use a vendor-proprietary model), which integration standard the data travels through (OPC UA CS Machinery is the 2026 default; WS Pack is the beverage-industry alternative), and which loss taxonomy the OEE calculation maps to (Nakajima's Six Big Losses is universal, but the boundaries between Idling/Minor Stops and Breakdowns are definitional choices). Most dashboards I have audited blend three standards without declaring any of them — which is why the same "unplanned downtime" metric can vary by 15–25 percentage points across two plants running the same process.

The State-Event Gap — the modelling error underneath almost every classification problem

The pattern I call The State-Event Gap is the single most common data-model error in downtime classification systems. A PackML-compliant PLC publishes a continuous stream of state values — at any given moment the machine is in exactly one state, and that state changes instantly when a transition occurs. But the OEE calculation, the availability metric, the reason-code assignment, the work-order link — all of these operate on events, not states. An event has a start time, an end time, a duration, a reason, and a classification. Turning a state stream into an event stream is not a trivial operation, and plants that hand-wave over this step end up with data models that cannot answer basic questions.

Three common failure modes:

State-at-a-point-in-time dashboards without event persistence. The dashboard shows "machine in Suspended state for 3 minutes" in real-time, but the event is never written to a durable store. When the quality manager asks "how often was line 3 in Suspended last month and for what reasons", there is no answer — only a re-aggregation of raw state samples that loses the reason context.
Event creation without state-boundary discipline. The plant captures events at reason-code entry time, but the event boundaries do not line up with the underlying state transitions. A 10-minute Breakdown event actually contains 2 minutes of Stopping, 6 minutes of Stopped-with-reason, and 2 minutes of Resetting — but the event record shows 10 minutes of "Breakdown", which inflates the category and distorts MTBF calculations downstream.
Reason-code entry long after the state has ended. Operator enters the reason at end of shift for all stops that happened that day. The event durations are correct because they came from the state stream, but the reason assignments are retrospective reconstructions — memory, not measurement.

The architectural answer is what I call The Stoppage Ontology: a three-layer data model where raw PLC state samples are captured continuously, state-transition events are derived from the state stream with explicit boundary rules, and classified stoppage events carry a reason code, a category, a planned/unplanned flag, and a link to the upstream state-transition event. Every classified event is defensibly derived from an observed state sequence; every state transition either generates an event or is consumed by one. Nothing is reconstructed from memory.
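
The three layers can be sketched in a few lines of Python. This is a minimal illustration, not the SYMESTIC implementation: the class names `StateInterval` and `StoppageEvent`, the function names, and the boundary rule "every contiguous run of non-Execute states forms one stoppage" are assumptions made for the example, and the state sequence is simplified compared with a full PackML cycle.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Layer 1: raw (timestamp, state) samples as published by the PLC.
Sample = Tuple[datetime, str]


@dataclass(frozen=True)
class StateInterval:
    """Layer 2: one contiguous run of a single machine state."""
    state: str
    start: datetime
    end: datetime


@dataclass
class StoppageEvent:
    """Layer 3: a stoppage that keeps its lineage of state intervals."""
    start: datetime
    end: datetime
    lineage: List[StateInterval]
    reason_code: Optional[str] = None   # filled in by classification later
    planned: Optional[bool] = None


def derive_intervals(samples: Iterable[Sample], stream_end: datetime) -> List[StateInterval]:
    """Open a new interval on every state change; repeated samples are absorbed."""
    intervals: List[StateInterval] = []
    current_state, current_start = None, None
    for ts, state in samples:
        if state != current_state:
            if current_state is not None:
                intervals.append(StateInterval(current_state, current_start, ts))
            current_state, current_start = state, ts
    if current_state is not None:
        intervals.append(StateInterval(current_state, current_start, stream_end))
    return intervals


def derive_stoppages(intervals: List[StateInterval],
                     producing=frozenset({"Execute"})) -> List[StoppageEvent]:
    """Assumed boundary rule: each contiguous run of non-producing intervals is one stoppage."""
    events: List[StoppageEvent] = []
    run: List[StateInterval] = []
    for iv in intervals:
        if iv.state in producing:
            if run:
                events.append(StoppageEvent(run[0].start, run[-1].end, run))
                run = []
        else:
            run.append(iv)
    if run:
        events.append(StoppageEvent(run[0].start, run[-1].end, run))
    return events


t0 = datetime(2026, 4, 1, 6, 0)
minute = timedelta(minutes=1)
samples = [
    (t0, "Execute"),
    (t0 + 5 * minute, "Stopping"),
    (t0 + 6 * minute, "Stopping"),    # repeated sample, no transition
    (t0 + 7 * minute, "Stopped"),
    (t0 + 13 * minute, "Resetting"),
    (t0 + 15 * minute, "Execute"),
]
events = derive_stoppages(derive_intervals(samples, stream_end=t0 + 20 * minute))
for ev in events:
    print(ev.end - ev.start, [(iv.state, iv.end - iv.start) for iv in ev.lineage])
```

The sample data reproduces the 10-minute stop described above: one event whose lineage still shows 2 minutes of Stopping, 6 minutes of Stopped and 2 minutes of Resetting, so downstream calculations can decide which part counts toward which metric.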

The Reason Code Sprawl — why plants with 300+ codes produce worse data than plants with 20

The pattern I call The Reason Code Sprawl is the other end of the same problem. A plant that has been operating a downtime-classification system for five or more years typically has a reason-code catalogue that has grown through accretion rather than design — every new line, every new product, every new engineering manager adds codes, rarely deletes them, never consolidates. I have audited systems with 340, 480, once over 700 active reason codes. The data these systems produce is systematically worse than what a plant with 15–25 well-chosen codes produces, for four reasons:

Operator cognitive load. Scrolling through 340 codes to find the right one at a time-pressured moment produces three outcomes: the first plausible match is selected (under-specific), a generic "Other" code is selected (under-informative), or the entry is skipped entirely (no data at all). None of these is better than a 20-code catalogue forcing a thoughtful selection.
Statistical fragmentation. A failure mode that occurs 50 times per month is useful data — frequency, duration distribution, trend. Split across eight near-duplicate codes, each at 6–7 occurrences, it becomes statistical noise below the reporting threshold of any dashboard.
Cross-plant incomparability. Plant A has "Sensor fault — inductive", plant B has "Sensor failure — optical". Both roll up to the same physical failure class, but no cross-plant report can aggregate them without manual mapping, and the mapping becomes an untracked spreadsheet that rots.
Category drift. With 300+ codes, the boundary between planned and unplanned becomes fuzzy — "Minor cleaning", "Quick wipe", "Cleaning during changeover", "Line hygiene check" end up classified inconsistently across codes, shifts, and operators. This is the Unplanned-Planned Boundary Drift: the structural tendency of large taxonomies to blur the category boundary they were designed to enforce.

The operational sweet spot, validated across the SYMESTIC customer base, is 15–25 active reason codes per production area, organised in 5–8 top-level categories (Technical, Material, Quality, Organisation, Changeover, Cleaning, Planned, Operator). Any plant with more than 40 active codes should treat code reduction as a discrete project; the data-quality improvement from consolidation almost always exceeds any feared loss of granularity.
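
A catalogue audit of this kind needs little more than a frequency count. The sketch below is one way to do it under stated assumptions: the function name `audit_catalogue`, the reason codes, and the 85 % coverage target are all hypothetical, and the input is simply the list of reason codes booked over the last twelve months.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def audit_catalogue(event_codes: Iterable[str],
                    catalogue: Iterable[str],
                    min_uses: int = 10,
                    coverage_target: float = 0.85) -> Tuple[List[str], List[str]]:
    """Return (rare_codes, core_codes).

    rare_codes: catalogue entries used fewer than `min_uses` times.
    core_codes: the smallest set of most-used codes whose events reach
                `coverage_target` of all classified events.
    """
    counts = Counter(event_codes)
    rare = sorted(code for code in catalogue if counts[code] < min_uses)

    total = sum(counts.values())
    core: List[str] = []
    covered = 0
    for code, n in counts.most_common():
        if total and covered / total >= coverage_target:
            break
        core.append(code)
        covered += n
    return rare, core


# Hypothetical twelve-month extract: two dominant codes, two near-duplicates,
# and one catalogue entry that was never used.
booked = ["JAM"] * 60 + ["NO_MATERIAL"] * 30 + ["SENSOR_A"] * 6 + ["SENSOR_B"] * 4
catalogue = ["JAM", "NO_MATERIAL", "SENSOR_A", "SENSOR_B", "LEGACY_X"]

rare_codes, core_codes = audit_catalogue(booked, catalogue)
print("candidates for consolidation:", rare_codes)
print("core codes:", core_codes)
```

The rare list is the input to the consolidation project; the core list shows how few codes actually carry the data.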

The Auto-Classification Ladder — why operator entry should be the fallback, not the default

The positive architectural pattern is what I call The Auto-Classification Ladder — a sequence of classification mechanisms, ordered by reliability, where operator entry is the final fallback rather than the primary method:

Rung	Mechanism	Typical coverage
1	Schedule-driven classification. Shift calendar, planned maintenance windows, planned changeover slots pre-classify stoppages that fall inside those windows.	30–45 % of total downtime minutes.
2	Alarm-driven classification. PLC alarm codes mapped to downtime reason codes via a maintained mapping table.	25–40 % of remaining downtime.
3	Upstream/downstream-state inference. Line-internal stoppage classified by whether upstream or downstream machines were blocked/starved at the same moment (material vs. internal cause).	10–15 % of remaining downtime.
4	Duration-driven classification. Stoppages < 5 minutes auto-classified as Minor Stop (Nakajima category 3); longer stops escalated for explicit reason entry.	5–10 % of remaining downtime.
5	Operator entry. Everything else — fallback mechanism for cases the upper rungs cannot resolve.	10–20 % of remaining downtime.

Plants that invert this ladder — where operator entry is the primary mechanism and automation is a "nice to have" — produce classification coverage in the 40–60 % range, with the remaining 40–60 % either unclassified or bulk-assigned to a catch-all code. Plants that implement the ladder top-down end up with 85–95 % coverage, operator entry reserved for the genuinely ambiguous cases, and reason-code data that is statistically dense enough to drive real improvement programmes. The investment is not in more sensors; it is in the mapping tables and the rule engine that connects existing data sources to the event stream.
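
The ladder maps naturally onto an ordered rule chain in which the first rung that can decide wins. The following is a minimal sketch, assuming a hypothetical `Stoppage` record, made-up reason codes, and an alarm mapping table held as a plain dictionary; a production rule engine would be configurable rather than hard-coded.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class Stoppage:
    start: datetime
    end: datetime
    alarm_code: Optional[str] = None
    upstream_starved: bool = False
    downstream_blocked: bool = False


# (window start, window end, reason code) taken from the shift/maintenance calendar
PlannedWindow = Tuple[datetime, datetime, str]


def classify(s: Stoppage,
             planned_windows: List[PlannedWindow],
             alarm_map: Dict[str, str],
             minor_stop_limit: timedelta = timedelta(minutes=5)) -> Tuple[Optional[str], str]:
    """Return (reason_code, rung). reason_code is None when the operator must decide."""
    # Rung 1: stoppage lies fully inside a scheduled window.
    for w_start, w_end, reason in planned_windows:
        if w_start <= s.start and s.end <= w_end:
            return reason, "schedule"
    # Rung 2: PLC alarm code found in the maintained mapping table.
    if s.alarm_code in alarm_map:
        return alarm_map[s.alarm_code], "alarm"
    # Rung 3: neighbouring machines indicate a material or flow cause.
    if s.upstream_starved:
        return "MATERIAL_STARVED", "line_state"
    if s.downstream_blocked:
        return "DOWNSTREAM_BLOCKED", "line_state"
    # Rung 4: short stops fall under the Minor-Stop threshold.
    if s.end - s.start < minor_stop_limit:
        return "MINOR_STOP", "duration"
    # Rung 5: nothing matched, escalate to operator entry.
    return None, "operator"


t0 = datetime(2026, 4, 1, 6, 0)
windows = [(t0 + timedelta(hours=2), t0 + timedelta(hours=3), "PLANNED_MAINTENANCE")]
alarms = {"A117": "TECH_SENSOR_FAULT"}   # hypothetical alarm-to-reason mapping

examples = [
    Stoppage(t0 + timedelta(minutes=130), t0 + timedelta(minutes=160)),
    Stoppage(t0, t0 + timedelta(minutes=12), alarm_code="A117"),
    Stoppage(t0, t0 + timedelta(minutes=8), upstream_starved=True),
    Stoppage(t0, t0 + timedelta(minutes=3)),
    Stoppage(t0, t0 + timedelta(minutes=20)),
]
results = [classify(s, windows, alarms) for s in examples]
for r in results:
    print(r)
```

Recording the rung alongside the reason code is a deliberate choice in this sketch: it lets the plant measure how much of its classification comes from each mechanism.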

The Downtime Taxonomy Trap and the 5-minute Minor-Stop boundary

One architectural choice deserves its own discussion because it distorts OEE more than any other: the duration threshold between a Minor Stop (Nakajima category 3, typically bucketed under Performance loss, not Availability loss) and a classified Downtime event (Availability loss). The industry convention is 5 minutes — below this threshold, stoppages are counted as Idling/Minor Stops and reduce OEE performance; above, they are Breakdowns or other Availability losses. The threshold is arbitrary but near-universal, and the consequence of choosing a different one is that your OEE numbers become incomparable to any public benchmark.

The pattern I call The Downtime Taxonomy Trap is the tendency of plants to unconsciously shift this threshold to improve their OEE number. Setting the threshold at 10 minutes instead of 5 can move 3–6 percentage points of measured loss from Availability into Performance, which makes Availability look better without any physical change. It also makes Performance look worse, but Performance is less commonly reported externally, so the net reputational effect is positive. Plants that have made this shift — usually without documenting it — produce OEE numbers that are literally incomparable to their own history. The discipline: pick 5 minutes, document it, do not change it without a formal review.
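
The effect of moving the threshold is easy to demonstrate with a toy calculation. The stop durations and the 480-minute planned production time below are invented for illustration, and the helper names `split_losses` and `availability` are assumptions; the only rule taken from the text is that stops shorter than the threshold count as Performance loss and everything else as Availability loss.

```python
from datetime import timedelta
from typing import Iterable, Tuple


def split_losses(stop_durations: Iterable[timedelta],
                 threshold: timedelta) -> Tuple[timedelta, timedelta]:
    """Return (availability_loss, performance_loss) for a given Minor-Stop threshold."""
    stops = list(stop_durations)
    availability_loss = sum((d for d in stops if d >= threshold), timedelta())
    performance_loss = sum((d for d in stops if d < threshold), timedelta())
    return availability_loss, performance_loss


def availability(planned_time: timedelta, availability_loss: timedelta) -> float:
    """Availability as a fraction of planned production time."""
    return (planned_time - availability_loss) / planned_time


# Hypothetical shift: 480 planned minutes, five stops of 2, 4, 7, 9 and 30 minutes.
planned = timedelta(minutes=480)
stops = [timedelta(minutes=m) for m in (2, 4, 7, 9, 30)]

for limit in (5, 10):
    a_loss, p_loss = split_losses(stops, timedelta(minutes=limit))
    print(f"threshold {limit} min: availability loss {a_loss}, "
          f"performance loss {p_loss}, availability {availability(planned, a_loss):.1%}")
```

Nothing about the machine changes between the two runs; the 7-minute and 9-minute stops simply move from one bucket to the other, and reported Availability rises with them.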

The planned/unplanned boundary — a concrete rule set

Stoppage type	Classification	Counts toward OEE Availability?
Scheduled preventive maintenance (calendar-bound)	Planned	No — excluded from the base
Planned changeover	Planned (but tracked; over-target changeovers are improvement opportunities)	No — but changeover-time reduction is a separate OEE-Performance target
Unscheduled breakdown	Unplanned — Availability loss	Yes
Material starvation (no material available)	Unplanned — but organisational, not technical	Yes — tracked separately from Technical
Quality hold (awaiting release)	Unplanned — Organisation	Yes
Operator break (schedule-defined)	Planned if line deliberately stopped; excluded if line autonomously runs	No (planned)
Stoppage < 5 min (Minor Stop)	Performance loss, not Availability loss	No (reduces Performance)
Corrective maintenance deferred to next planned window	Planned — if the deferral is documented before the window opens	No — but the originating failure event is Unplanned
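
The last row of the table is the one most often implemented loosely, so it is worth writing down as an explicit check. This is a sketch under assumptions: the record types `MaintenanceWindow` and `CorrectiveWork` and their field names are hypothetical, and the rule encoded is only the one stated in the table (work inside a planned window, with the deferral documented before that window opens).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MaintenanceWindow:
    start: datetime
    end: datetime


@dataclass
class CorrectiveWork:
    start: datetime
    end: datetime
    deferral_documented_at: Optional[datetime]  # None if never documented


def classify_corrective_work(work: CorrectiveWork,
                             window: Optional[MaintenanceWindow]) -> str:
    """Planned only if the work sits inside a window AND the deferral predates it."""
    if window is None or work.deferral_documented_at is None:
        return "Unplanned"
    inside_window = window.start <= work.start and work.end <= window.end
    documented_before = work.deferral_documented_at < window.start
    return "Planned" if inside_window and documented_before else "Unplanned"


window = MaintenanceWindow(datetime(2026, 4, 4, 6, 0), datetime(2026, 4, 4, 10, 0))

documented = CorrectiveWork(datetime(2026, 4, 4, 7, 0), datetime(2026, 4, 4, 8, 0),
                            deferral_documented_at=datetime(2026, 4, 2, 14, 30))
backdated = CorrectiveWork(datetime(2026, 4, 4, 7, 0), datetime(2026, 4, 4, 8, 0),
                           deferral_documented_at=datetime(2026, 4, 4, 9, 0))
outside = CorrectiveWork(datetime(2026, 4, 3, 15, 0), datetime(2026, 4, 3, 16, 0),
                         deferral_documented_at=datetime(2026, 4, 2, 14, 30))

print(classify_corrective_work(documented, window))
print(classify_corrective_work(backdated, window))
print(classify_corrective_work(outside, window))
```

The originating failure keeps its own Unplanned event regardless of the outcome here; this check only decides how the repair time itself is booked.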

From a mid-market plastics processor in NRW (North Rhine-Westphalia), 2023: The customer was a family-owned injection-moulding operation with four production halls, roughly 190 employees, producing technical parts and components for industrial customers. They had been operating a vendor-proprietary shop-floor system for eleven years and had approached SYMESTIC for a greenfield-style replacement because the old system's dashboards had become, in the plant manager's words, "confident and wrong."

When my team pulled the reason-code catalogue from the legacy system for the migration-scoping workshop, we counted 340 active reason codes across the four halls, organised under twelve top-level categories. An inventory showed that 180 of these codes had been used fewer than ten times in the previous twelve months. Twenty-two codes accounted for 87 % of all classified downtime events. The remaining 318 codes accounted for 13 % of events and roughly 40 % of the cognitive load on operators at the shop-floor terminal. The reported OEE Availability on line 3 was 91.4 % for the previous quarter.

We ran a reconciliation exercise over three weeks, migrating the raw state stream into SYMESTIC, applying an Auto-Classification Ladder with five rungs, and comparing the results to the legacy system's reported numbers. The reconciliation showed three distinct gaps.

First, the Minor-Stop threshold had drifted. The legacy system had been configured years earlier with a 12-minute Minor-Stop threshold, more than double the industry-standard 5 minutes. Re-baselining to 5 minutes shifted 5.8 percentage points from Minor Stops into Unplanned Downtime.

Second, the planned/unplanned boundary had drifted. Corrective maintenance performed outside pre-scheduled windows had been systematically booked as "Planned — deferred CM" regardless of when the originating failure occurred, because the legacy system had no concept of event lineage. Applying the rule "the deferral must be documented before the window opens" reclassified another 3.1 points from Planned into Unplanned.

Third, the reason-code sprawl had produced an Unknown-Catchall drift. The legacy system had a "Miscellaneous Technical" code that had absorbed 11 % of all events. On inspection of the state-transition lineage, about half of these were classifiable material-starvation events that operators had defaulted into the technical bucket because the material-handling codes were buried on page 4 of the terminal UI. Consolidating the reason codes into a 22-code top-level catalogue with proper UI grouping moved another 1.6 points.

Net reconciliation: reported Availability 91.4 %, honest Availability 76.3 %. A 15.1-point gap that had compounded over eight years of uncritical reporting.

Presenting this to the plant manager and the plant controller was the conversation of the year. The plant manager, to his credit, took roughly ten minutes to accept the number and then asked the question that mattered: "If the real number was 76, what did we miss for eight years that we could have fixed?" The answer over the subsequent 18 months: a lot. The 22-code catalogue surfaced three recurring failure modes that had been fragmented across 14 legacy codes and were therefore below the reporting threshold of the old dashboard. Two of them were mechanical issues with specific tooling that were fixed in under three months once they became visible as frequency-dense categories. One was a material-delivery timing pattern that was resolved by shifting a delivery window on the supply side.

By month 18, honest Availability had risen from 76.3 % to 84.6 %. That is an 8.3-point real improvement, larger than the entire OEE-improvement budget had targeted, and larger than could have been achieved under the legacy system at any cost, because the legacy system's data had made the problems invisible. The plant manager now uses the 76.3 number in his staff meetings as "the day we started seeing what was actually happening." That phrase is, in my experience, the correct summary of what a fixed downtime-classification data model does for a plant.

The seven disciplines of a trustworthy downtime-classification system

#	Discipline	Operational test
1	PackML-compliant state stream.	Every machine publishes a defined state at least once per second; transitions are timestamped.
2	State-to-event derivation rules written.	No retrospective event reconstruction; every event traces to a specific state-transition sequence.
3	Reason-code catalogue ≤ 25 active codes per area.	Annual pruning; codes with < 10 uses in 12 months consolidated or retired.
4	Auto-Classification Ladder in place.	Operator entry covers ≤ 20 % of classified minutes; automation covers the rest.
5	Minor-Stop threshold at 5 minutes, documented.	Configuration locked; changes require formal review.
6	Planned/unplanned boundary rules explicit.	Deferred corrective maintenance only counts as Planned if documented before the window opens.
7	Classification coverage monitored as a KPI.	Uncategorised or "Other" codes < 5 % of total downtime minutes.
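
Several of these operational tests are plain ratios and can be monitored automatically. The sketch below checks disciplines 3, 4 and 7 from aggregated downtime minutes; the function name `discipline_checks`, the catch-all code names, and the sample figures are assumptions for illustration only.

```python
from typing import Dict, Iterable


def discipline_checks(minutes_by_rung: Dict[str, float],
                      minutes_by_code: Dict[str, float],
                      catch_all: Iterable[str] = ("OTHER", "UNCLASSIFIED")) -> Dict[str, bool]:
    """Evaluate three of the operational tests against aggregated downtime minutes."""
    catch_all = set(catch_all)
    classified = sum(minutes_by_rung.values())
    total = sum(minutes_by_code.values())

    operator_share = minutes_by_rung.get("operator", 0) / classified if classified else 0.0
    catch_all_share = sum(minutes_by_code.get(c, 0) for c in catch_all) / total if total else 0.0

    return {
        "catalogue_within_25_codes": len(minutes_by_code) <= 25,      # discipline 3
        "operator_entry_at_most_20_pct": operator_share <= 0.20,      # discipline 4
        "catch_all_below_5_pct": catch_all_share < 0.05,              # discipline 7
    }


# Hypothetical monthly aggregates for one production area (minutes).
by_rung = {"schedule": 400, "alarm": 300, "line_state": 100, "duration": 50, "operator": 110}
by_code = {"PLANNED_MAINTENANCE": 400, "TECH_SENSOR_FAULT": 300, "MATERIAL_STARVED": 100,
           "MINOR_STOP": 50, "TOOL_JAM": 110, "OTHER": 40}

report = discipline_checks(by_rung, by_code)
for name, ok in report.items():
    print(f"{name}: {'pass' if ok else 'FAIL'}")
```

Running such a check on a schedule turns the table from a one-off audit into a standing KPI.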

FAQ

What is planned vs. unplanned downtime?
Planned downtime is time the equipment is deliberately not supposed to run — scheduled maintenance, changeovers, cleaning, breaks. Unplanned downtime is time the equipment should be running but cannot — breakdowns, material starvation, quality holds. Under Nakajima's OEE framework, planned downtime is excluded from the base and unplanned downtime is an Availability loss.

Does planned downtime affect the OEE value?
In the standard OEE calculation, no — planned stoppages are excluded from the base, so they do not reduce OEE directly. But the volume and trend of planned downtime still matter: over-planned maintenance windows or drifting changeover durations consume capacity, and changeover-time reduction programmes target OEE Performance even though they do not affect Availability.

What is The State-Event Gap?
The modelling error of treating a continuous machine-state stream and a discrete classified-event stream as if they were the same thing. State-at-a-point-in-time data cannot answer "how often and for what reason" questions; event data with explicit state-derivation rules can. Plants that do not architect the bridge between the two layers end up with dashboards that cannot be queried historically.

What is The Reason Code Sprawl?
The accretion of reason codes over years of uncritical addition, producing catalogues of 300+ codes that are statistically fragmented, cognitively overloading for operators, and cross-plant incomparable. The operational sweet spot is 15–25 active codes per production area; plants above 40 should treat consolidation as a discrete project.

How detailed should the reason-code catalogue be?
Five to eight top-level categories (Technical, Material, Quality, Organisation, Changeover, Cleaning, Planned, Operator), with 15–25 active codes underneath. For predictive maintenance use-cases where deeper granularity matters, maintain the additional detail within categories rather than at the top level, so cross-plant aggregation remains possible.

What is the 5-minute Minor-Stop threshold?
The industry-standard duration boundary below which a stoppage is classified as a Minor Stop (Nakajima category 3, Performance loss) rather than as a Downtime event (Availability loss). The threshold is arbitrary but near-universal; plants that set it higher to improve apparent Availability produce numbers that are incomparable to any public benchmark and to their own history before the change.

What is The Auto-Classification Ladder?
A sequence of classification mechanisms ordered by reliability — schedule-driven first, then alarm-driven, then state-inference, then duration-driven, with operator entry as fallback — that typically achieves 85–95 % classification coverage with operator entry reserved for the genuinely ambiguous 10–20 %. Plants that invert this ladder (operator-first) achieve 40–60 % coverage with a large unclassified residual.

What standards define the data model?
Four. PackML (ISA-TR88.00.02) defines the machine state model. Weihenstephan Standards WS Pack defines data points and KPI methods for beverage/packaging lines, dominant in DACH. OPC UA Companion Specification for Machinery is the 2026 default transport-and-semantic standard. Nakajima's Six Big Losses is the canonical OEE loss taxonomy. Serious classification conversations declare which standards are in use.

How does the downtime classification relate to the CMMS work-order system?
A well-architected downtime classification event auto-generates a CMMS work order for any Unplanned Technical event above a configured duration threshold, and receives back the work-order closure metadata (root cause, spare parts consumed) once the CMMS order is completed. This bidirectional link — the same handshake pattern that separates honest maintenance KPIs from paper KPIs — makes the downtime data and the maintenance data mutually reinforcing rather than parallel accounts of the same event.

About the author

Mark Kobbert

CTO of SYMESTIC GmbH. Responsible for the cloud-MES architecture since 2014. B.Sc. Business Informatics. Career path: started at SYMESTIC as a software developer directly after university, took over CTO responsibility in 2020. Led the ground-up rebuild of the SYMESTIC platform from on-premise to cloud-native on Microsoft Azure: microservice architecture, API-first, OPC UA and digital I/O gateway integration, real-time data processing across 15,000+ machines in 18 countries. Expertise: cloud-native MES architecture, Microsoft Azure, microservice architecture, OPC UA, MQTT, PackML state modelling, IoT gateway development, edge computing, ISA-95 integration architecture, industrial connectivity, brownfield machine integration, REST APIs, C#/.NET, SQL, Docker/Kubernetes, real-time data processing, IT/OT convergence, downtime-classification data models. Primary architectural conviction: the hard problem in downtime classification is not the taxonomy but the data model underneath, and plants that fix the taxonomy without fixing the state-to-event derivation layer produce numbers that disintegrate under audit.
