
Corrective Maintenance: Why Plants Do More Than They Admit

By Martin Brandel · Last updated: April 2026

Corrective maintenance has a presentation problem and a measurement problem, and the two reinforce each other in most mid-market plants. The presentation problem is that the methodology literature treats corrective maintenance as the strategy that mature organisations are migrating away from — toward preventive, then predictive, then prescriptive — implying that the share of corrective work is something to drive down toward zero. The measurement problem is that almost no plant I have walked through actually measures the share honestly. When you ask the maintenance manager what their corrective-versus-preventive split is, you usually get a confident "about fifty-fifty" or "we're at sixty-forty preventive." When you sit with the work-order log for a week, the real number is almost always far higher on the corrective side. Often a great deal higher.

I have been commissioning machine connections in plants for over thirty years, and the maintenance organisation is one of the things you cannot avoid seeing while you are doing that work. You are in the electrical room at 11 PM with a service technician trying to pull a tag out of a 1995 PLC, and the conversations that happen in that context are different from the conversations that happen in the strategy presentation in the morning. This article is about what those night-shift conversations consistently say, and why the resulting picture is more useful than the official one.

What corrective maintenance actually is

Corrective maintenance is the work that happens after a fault, failure, or out-of-spec condition has occurred, with the goal of restoring the asset to its required functional state. It splits into several sub-categories that the literature names slightly differently but consistently distinguishes: emergency or breakdown maintenance (immediate response to a stoppage with safety or production-blocking consequences), deferred corrective (the fault is real but the asset can keep running until the next planned window), and run-to-failure (a deliberate strategy in which an asset is operated until it fails and then repaired or replaced, because the economics of preventive intervention do not justify the cost). The first two are reactive postures. The third is a strategic choice that gets miscategorised as one of the first two often enough to muddy almost every internal report.

Around corrective maintenance sits the standard infrastructure: CMMS for work-order routing, MTTR and MTBF as the headline reliability metrics, RCM and FMEA as the analytical frameworks for deciding which assets get which maintenance strategy, and CAPA as the closure discipline that is supposed to convert each significant failure into a learning that prevents recurrence. The infrastructure exists. Whether it is being used to produce honest numbers and honest decisions is a separate question, and the answer is usually less reassuring than the org chart suggests.
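For reference, the two headline metrics reduce to simple ratios. The sketch below uses the common textbook definitions (total repair time over number of repairs, operating time over number of failures); the function names and the sample figures are illustrative assumptions, not values from any plant or from a specific CMMS.

```python
def mttr_hours(repair_hours):
    """Mean time to repair: total repair time divided by number of repairs."""
    if not repair_hours:
        raise ValueError("need at least one repair")
    return sum(repair_hours) / len(repair_hours)


def mtbf_hours(operating_hours, failure_count):
    """Mean time between failures: operating time divided by number of failures."""
    if failure_count <= 0:
        raise ValueError("need at least one failure")
    return operating_hours / failure_count


# Made-up figures: three repairs over 600 operating hours.
repairs = [1.5, 4.0, 0.5]
print(mttr_hours(repairs))               # 2.0
print(mtbf_hours(600.0, len(repairs)))   # 200.0
```

Both ratios are only as honest as their inputs: a repair that never became a work order is missing from both the numerator and the denominator.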

The measurement honesty problem

The single most common pattern I see in plants is the gap between the reported maintenance mix and the actual one. There are three reasons it consistently exists.

First, work that should be classified as corrective gets logged as preventive because it happened during a planned-maintenance window. A pump that failed two weeks before its next scheduled service, was kept limping along until the service date, and then got replaced during the window — that is corrective maintenance with deferred timing. It almost always shows up in the CMMS as preventive, because that is the type of the work order it was attached to. The result is that preventive numbers are systematically inflated and corrective numbers systematically deflated.

Second, small unscheduled interventions by operators or front-line technicians — clearing a jam, adjusting a sensor that started drifting, replacing a worn tool earlier than planned — are corrective work in every meaningful sense, but they often never become work orders at all. The operator just fixes the thing and continues. The maintenance log shows none of it. The OEE log shows the downtime, sometimes, depending on how the line is instrumented. The two systems do not reconcile, and the corrective-work fraction in the maintenance system understates reality by whatever the front-line autonomous work amounts to — which in some plants is most of the corrective work that happens in a given week.

Third, there is an organisational incentive to under-report corrective work because the official KPI direction is to reduce it. Departments do not enthusiastically report numbers that make them look like they are heading the wrong way. This is not malicious; it is the same selection bias that affects every metric whose direction-of-travel is itself a reportable target. The fix is not to stop tracking the metric. The fix is to measure the underlying reality through a channel that is not the same channel as the reporting incentive.

Notebook from the maintenance shift
The most reliable way I have found to estimate the actual corrective fraction in a plant is to ignore the CMMS report and look instead at the unscheduled-stop log from the line-side data acquisition for two weeks, then walk that against the work-order log. The difference between the two is roughly the corrective work that is happening below the CMMS's awareness. In most plants that difference is large enough that the official maintenance mix is not a useful number for any decision that depends on knowing it.
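The reconciliation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Stop` and `WorkOrder` records, the four-hour matching tolerance, and the sample data are all hypothetical, and a real export from line-side data acquisition or a CMMS will have different fields and need a matching rule tuned to the plant.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Stop:
    asset: str
    start: datetime
    minutes: float  # duration of the unscheduled stop


@dataclass(frozen=True)
class WorkOrder:
    asset: str
    opened: datetime


def unlogged_stops(stops, work_orders, tolerance=timedelta(hours=4)):
    """Stops with no work order on the same asset opened within
    `tolerance` after the stop began (assumed matching rule)."""
    unmatched = []
    for stop in stops:
        matched = any(
            wo.asset == stop.asset
            and timedelta(0) <= wo.opened - stop.start <= tolerance
            for wo in work_orders
        )
        if not matched:
            unmatched.append(stop)
    return unmatched


def hidden_fraction(stops, work_orders, tolerance=timedelta(hours=4)):
    """Share of unscheduled stop minutes that never reached the CMMS."""
    total = sum(s.minutes for s in stops)
    if total == 0:
        return 0.0
    hidden = sum(s.minutes for s in unlogged_stops(stops, work_orders, tolerance))
    return hidden / total


# Made-up sample data for illustration only.
stops = [
    Stop("filler-1", datetime(2026, 3, 2, 6, 10), 12.0),
    Stop("filler-1", datetime(2026, 3, 2, 14, 40), 45.0),
    Stop("capper-2", datetime(2026, 3, 3, 9, 5), 8.0),
    Stop("capper-2", datetime(2026, 3, 4, 22, 30), 15.0),
]
work_orders = [WorkOrder("filler-1", datetime(2026, 3, 2, 15, 0))]

print(len(unlogged_stops(stops, work_orders)))   # 3
print(hidden_fraction(stops, work_orders))       # 0.4375
```

Weighting by stop minutes rather than by stop count is a design choice: it keeps a handful of long unlogged stops from being drowned out by many short logged ones.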

When corrective is actually the right strategy

The implicit assumption in the maintenance-maturity-curve literature is that more preventive is always better than more corrective, and predictive is better than both. This is not true at the level of an individual asset. It is true on average across a portfolio, but the average obscures a strategically important truth: for a meaningful subset of assets in any plant, run-to-failure is the genuinely cheapest correct strategy, and forcing them into a preventive regime makes the maintenance budget worse rather than better.

The asset classes where corrective maintenance — specifically, run-to-failure — is the right answer share a small number of characteristics. The asset is not safety-critical, so failure does not endanger people. The asset is not production-critical, either because there is redundancy, because its function is non-blocking, or because the failure can be tolerated until the next scheduled window. The cost of a failure event (the part, the labour, the limited downtime) is materially lower than the cost of running a preventive program over the same time horizon (regular inspection labour, replacement of components that still had useful life, planning overhead). And the failure mode is something the organisation is set up to handle quickly when it does happen — spare parts on hand, a technician who knows the unit, a documented procedure.
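Those four criteria can be written down as a simple screening rule. The sketch below is one possible encoding, not a standard method: the `Asset` fields, the `margin` factor used to stand in for "materially lower", and all cost figures are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    safety_critical: bool
    production_critical: bool
    response_ready: bool           # spares on hand, known unit, documented procedure
    failures_per_year: float       # expected failures if left to run to failure
    failure_event_cost: float      # part + labour + limited downtime, per failure
    preventive_annual_cost: float  # inspections + early replacements + planning overhead


def run_to_failure_annual_cost(asset):
    """Expected yearly cost of simply responding to failures."""
    return asset.failures_per_year * asset.failure_event_cost


def is_run_to_failure_candidate(asset, margin=0.8):
    """All four criteria must hold. `margin` (assumed 0.8) requires the
    failure cost to be clearly, not marginally, below the preventive cost."""
    if asset.safety_critical or asset.production_critical:
        return False
    if not asset.response_ready:
        return False
    return run_to_failure_annual_cost(asset) < margin * asset.preventive_annual_cost


# Hypothetical assets with made-up costs.
aux_motor = Asset("aux conveyor motor", False, False, True, 0.5, 400.0, 600.0)
press_pump = Asset("press hydraulic pump", False, True, True, 0.5, 400.0, 600.0)
prox_sensor = Asset("proximity sensor", False, False, True, 2.0, 150.0, 320.0)

for asset in (aux_motor, press_pump, prox_sensor):
    print(asset.name, is_run_to_failure_candidate(asset))
```

With these numbers the auxiliary motor qualifies, the press pump is excluded for being production-critical, and the sensor fails the cost test because its expected failure cost is too close to the cost of the preventive program to count as materially lower.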

Examples I see regularly: small auxiliary motors that drive non-critical conveyance, low-cost sensors whose failure is detected immediately and whose replacement takes minutes, lighting and signalling components, certain classes of pneumatic fittings, some commodity bearings on non-critical rotating equipment. Forcing any of these into a preventive maintenance program produces a calendar of inspections and early replacements that costs more than letting the failure happen and responding to it. The maintenance organisation that has not made this distinction explicitly tends to be running an over-engineered preventive program on assets that did not need one and an under-engineered corrective response on the assets that actually do.

What separates expensive corrective from cheap corrective

For the assets where corrective maintenance is the right strategy — and for the considerable share of corrective work that happens regardless of strategy because plants are not in steady state and reality intervenes — the cost of a corrective event varies enormously between plants in ways that are not about the failure itself. The technical fix is usually similar everywhere. What varies is the surrounding organisational machinery. Plants that run cheap corrective maintenance have, in roughly this order: spare parts physically available at the point of need rather than in a central store three buildings away, work-order routing that reaches the right technician within minutes rather than at the next shift handover, machine data that lets the technician arrive with a hypothesis about the failure rather than starting diagnosis from zero, and a closure discipline that captures the failure mode in enough structured detail that the next occurrence can be diagnosed faster.

None of those are about the maintenance strategy in the abstract. They are about the operational support layer that determines whether a corrective event takes ninety minutes or four hours. In plants where the support layer is weak, the official response is almost always to push harder on preventive maintenance — which is the wrong fix, because the preventive program will not catch the random failures that drive the highest-cost corrective events anyway, and it diverts budget and attention from the support-layer investments that would actually reduce the cost of the corrective work that is going to happen no matter what the strategy says.

Where machine data changes the picture

The single most useful contribution that the data layer makes to corrective maintenance is not predictive analytics on failure modes — that is the headline use case in the literature, and it is genuinely valuable for a subset of high-value assets — but the much more mundane improvement of giving the corrective response itself the context it needs to be fast. When the technician's first action on arrival is "what was the machine doing in the ten minutes before the stop," and that question is answered by pulling up the actual cycle and parameter trace rather than by interviewing the operator, diagnosis time drops in a way that is visible in the MTTR within a few weeks. This is true regardless of whether the underlying maintenance strategy is reactive, preventive, or predictive. Better data makes corrective work cheaper independently of strategy.
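Answering "what was the machine doing in the ten minutes before the stop" amounts to slicing a time-ordered parameter trace. A minimal sketch, assuming the trace is available as a list of `(timestamp, value)` tuples sorted by timestamp; the function name, the data layout, and the sample values are hypothetical and not tied to any particular platform's API.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def pre_stop_window(samples, stop_time, lookback=timedelta(minutes=10)):
    """Return samples with stop_time - lookback <= timestamp <= stop_time.

    `samples` must be (timestamp, value) tuples sorted by timestamp.
    """
    timestamps = [t for t, _ in samples]
    lo = bisect_left(timestamps, stop_time - lookback)
    hi = bisect_right(timestamps, stop_time)
    return samples[lo:hi]


# Made-up trace: one reading every two minutes from 10:00 to 10:30.
base = datetime(2026, 3, 2, 10, 0)
trace = [(base + timedelta(minutes=2 * i), 60.0 + i) for i in range(16)]

window = pre_stop_window(trace, stop_time=datetime(2026, 3, 2, 10, 21))
for timestamp, value in window:
    print(timestamp.time(), value)
```

For a stop at 10:21 this returns the five readings from 10:12 through 10:20, which is the slice a technician would want on screen on arrival.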

The second contribution, less universally appreciated, is honest measurement. When the unscheduled-stop log from the line-side data acquisition is cross-referenced against the CMMS work-order log, the gap between official corrective fraction and actual corrective fraction becomes visible — and once it is visible, the organisation can have an honest conversation about whether to reduce the gap (by formalising the work that is happening informally) or to accept it (by acknowledging that some corrective work will always live in the operator's hands and not in the maintenance system). Either answer is workable. Pretending the gap does not exist is the only answer that is not.

In the SYMESTIC platform, the modules that touch corrective maintenance most directly are Alarms (the actual stoppage signal that triggers the response, with enough context to route to the right technician on arrival), Process Data (the parameter trace from the minutes preceding the failure, which compresses the diagnosis step), and the Maintenance module (work-order routing, closure discipline, structured failure-mode capture). The combination addresses the cheap-corrective problem from the support-layer side. It does not, on its own, address the strategy-fit question — whether a given asset should be on corrective, preventive, or predictive in the first place — because that is a decision that has to be made by the people who know the asset and the cost structure around it. What the platform does provide is the honest measurement that makes the strategy-fit conversation possible: the actual corrective fraction, the actual MTTR by asset class, the actual failure modes recurring across the plant. Once those numbers are real instead of constructed, the rest of the maintenance discussion becomes much shorter.

About the author
Martin Brandel
MES Consultant at SYMESTIC. 30+ years in industrial automation: Ing. Büro Albert, Hermos AG, ODEVIS/SYMESTIC. Specialist in machine connectivity, brownfield integration, OPC UA and IoT-Gateway projects. Dipl.-Ing. Communications Engineering.