
Process Stability: SPC, Cpk & Real-Time Control 2026

By Christian Fieg · Last updated: April 2026

What is process stability?

Process stability — also called statistical stability, being in statistical control, or in Six Sigma terminology simply a controlled process — is the property of a manufacturing process whose output varies only within its natural, inherent variation, with no assignable causes (special causes) acting on it. A stable process is predictable. An unstable process is not — regardless of how tight its specification limits happen to be. In the ISO 22514 / AIAG SPC framework, stability is the prerequisite for every downstream capability claim (Cp, Cpk, Pp, Ppk); without stability, those numbers are arithmetic, not statistics.

I spent three years as a Six Sigma Black Belt at Johnson Controls running DMAIC projects on headliner production lines, then a decade leading global MES and traceability across 900+ machines and 750+ operators in China, Mexico, Tunisia, Macedonia, France and Russia. The single most common mistake I have seen — in plants with Cpk dashboards on the wall — is this: they calculate capability on data from a process that was never stable to begin with. A Cpk of 1.67 from an unstable process is not "good quality." It is a number that describes something that will not exist tomorrow. Process stability is the precondition; capability is the reward.

Stability vs. capability — the distinction that most plants blur

This is the confusion that derails more SPC programmes than any other. Stability and capability are two different questions, measured with two different methods, and a process can fail one while passing the other. Knowing which you have is the difference between a real quality system and a decorative one.

Property | Question it answers | Measured with
Stability | Is the process predictable? Is only common-cause variation acting on it? | Control charts (X-bar/R, I-MR, p-chart), run rules (Western Electric, Nelson)
Capability | Does the predictable process fit inside the customer's specification? | Cp, Cpk (short-term), Pp, Ppk (long-term)
Performance | How much of the actual output meets spec over the long run? | Ppk, first-pass yield, rolled throughput yield
Stability + capability together | Is the process predictable and good? This is the only state worth reporting. | Both — in that order, never reversed

The rule that took me years to internalise: stability first, capability second — never the other way round. A process must be demonstrated stable before any Cpk calculation is meaningful. Plants that publish capability indices without proving stability first are reporting fiction with a decimal point.

Common cause vs. special cause — the Shewhart distinction that still matters

Variation type | What it is | How to address it
Common cause | Natural, inherent variation present in every output — the process is doing what it was designed to do | Change the process itself (improve tooling, tighten fixtures, upgrade materials). Do not react to individual points.
Special cause | External, assignable disturbance — tool wear, operator change, material batch change, temperature excursion | Investigate and eliminate the specific cause. Do not change the process itself.
Tampering (Deming's over-adjustment) | Reacting to common-cause variation as if it were special cause — adjusting the process every shift | Stop. This adds variation. It is the single most common cause of induced instability.

The operational implication is simple and under-practised: most "corrective actions" on the shop floor are tampering. An operator sees one value trending slightly up, nudges a setpoint, and the next sample reads slightly low — so the setpoint gets nudged back. What started as a stable process with 2-sigma natural variation becomes a process with 4-sigma induced variation, driven entirely by well-meaning intervention. A real SPC programme trains operators not to react to anything inside the control limits.
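The variance inflation from tampering is easy to demonstrate numerically. The sketch below simulates a stable process left alone versus the same process with the setpoint "corrected" after every sample, in the spirit of Deming's funnel experiment; the noise model, target and sigma are illustrative assumptions, not data from any real line.

```python
import random

random.seed(42)
target, sigma, n = 10.0, 0.1, 10_000

# Hands-off: output is target plus common-cause noise only.
hands_off = [target + random.gauss(0, sigma) for _ in range(n)]

# Tampering: after each sample, nudge the setpoint to cancel the last error.
setpoint, tampered = target, []
for _ in range(n):
    value = setpoint + random.gauss(0, sigma)
    tampered.append(value)
    setpoint -= value - target  # "correcting" pure common-cause noise

def spread(xs):
    """Sample standard deviation."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

print(f"hands-off sigma: {spread(hands_off):.3f}")  # ~0.10
print(f"tampered  sigma: {spread(tampered):.3f}")   # ~0.14, about sqrt(2) worse
```

Each adjustment makes the next output depend on two independent noise draws instead of one, so the output variance roughly doubles: the well-meaning operator manufactures the instability.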

What breaks process stability — the five recurring special causes

Special cause | Typical signature on the control chart
Tool wear | Slow drift of the mean in one direction (Nelson rule 3 — six increasing or decreasing points)
Material batch change | Step-change in mean at a specific timestamp (Nelson rule 1 — point beyond 3σ)
Operator or shift change | Pattern that repeats with shift rhythm (rule 4 — 14 alternating points)
Temperature / environmental drift | Daily cyclical pattern matching warm-up or ambient changes
Tampering / over-adjustment | Increased variation with no corresponding physical change — the chart gets wider although nothing in the process physically changed
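Two of these signatures can be detected mechanically. A minimal sketch, assuming the baseline mean and sigma come from a prior chart study; the function names and sample data are mine, not from any SPC library.

```python
def rule1_beyond_3sigma(xs, mean, sigma):
    """Nelson rule 1: indices of points beyond mean +/- 3 sigma
    (the typical batch step-change signature)."""
    return [i for i, x in enumerate(xs) if abs(x - mean) > 3 * sigma]

def rule3_trend(xs, run=6):
    """Nelson rule 3: start indices of `run` points in a row steadily
    increasing or decreasing (the classic tool-wear drift)."""
    hits = []
    for i in range(len(xs) - run + 1):
        w = xs[i:i + run]
        if all(a < b for a, b in zip(w, w[1:])) or all(a > b for a, b in zip(w, w[1:])):
            hits.append(i)
    return hits

drifting = [10.0, 10.02, 10.05, 10.09, 10.14, 10.20, 10.27]  # invented tool-wear drift
print(rule3_trend(drifting))                                 # -> [0, 1]
print(rule1_beyond_3sigma(drifting, mean=10.0, sigma=0.05))  # -> [5, 6]
```

The drift fires rule 3 long before any point crosses the 3σ limit, which is exactly why run rules matter: they catch the tool wearing out while the chart still "looks fine" to a limits-only reading.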

Why manual SPC fails — and what closed-loop changes

Hard-earned lesson from three years running DMAIC projects at Johnson Controls: we had a headliner line with beautiful control charts on clipboards at every station. Cpk 1.45 on paper, quality reports clean, customer audits passing. Then we ran an MSA (measurement system analysis) followed by a real stability study using automated data capture. Three findings killed the "stable" claim in a week. First, operators were sampling every hour on a standard schedule — but the SPC assumption requires rational subgrouping, and the sampling rhythm was missing the actual shift-change variation. Second, when a point came in near a control limit, operators were "re-measuring" — and only recording the second reading. Third, the logged adjustments were being made on common-cause variation, adding induced drift. Real Cpk, measured with automated inline capture, was 0.89. The paper chart was not wrong in arithmetic; it was measuring a version of the process that didn't exist. This is the default outcome of manual SPC at scale, and it is why automated data capture tied to an MES is not a nice-to-have for quality — it is the only way to know whether your process is actually stable.

The digital stack for real process stability

Layer | What it delivers
Automated data capture | Inline measurement via OPC UA, MQTT or digital I/O gateway — no operator in the measurement loop, no selective re-measurement, no rounding
Real-time control charts | Live X-bar/R, I-MR, p-chart computed on every sample; Western Electric and Nelson rules evaluated automatically
Auto-escalation on rule breaks | Rule-1 violation → maintenance ticket; rule-3 drift → tool-wear alert; the operator never has to interpret the chart
Stability-gated capability | Cpk / Ppk only computed and reported when the underlying data passes stability tests — no fiction with a decimal point
Correlation with machine state | SPC signals correlated with downtime events, alarms, setup changes, material lots — the special cause becomes visible in minutes, not in next month's report
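The auto-escalation layer is, at its core, a routing table from rule break to the role that can act. A deliberately simplified sketch: the `ESCALATION` table, its entries and the `escalate` function are hypothetical illustrations, not SYMESTIC's actual API, which raises these events through its alarms subsystem.

```python
# Hypothetical routing table: rule break -> (role, reason).
ESCALATION = {
    "rule1_beyond_3sigma": ("maintenance", "step-change or breakdown, investigate now"),
    "rule3_trend":         ("maintenance", "tool-wear drift, schedule tool change"),
    "rule4_alternating":   ("production",  "shift-rhythm pattern, review handover"),
}

def escalate(rule, station):
    """Route a rule break to the responsible role; unknown breaks go to quality."""
    role, reason = ESCALATION.get(rule, ("quality", "unclassified rule break"))
    # A real deployment would open a ticket / fire an alarm here.
    return f"[{station}] -> {role}: {reason}"

print(escalate("rule3_trend", "press-07"))
```

The design point is that the operator is taken out of the interpretation loop entirely: the chart's verdict goes straight to whoever can physically remove the special cause.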

What this looks like in the SYMESTIC deployment pattern

Inline measurements flow via OPC UA (Rademaker / König packaging at Kamps), MQTT (Carcoustics 500+ machines across seven countries) or digital I/O (Klocke pharma, Weingarten, full site in three weeks without LAN retrofit). Control charts update in the operator's workflow at the shop floor terminal; rule-break events auto-escalate through the alarms subsystem to the role that can actually act. At Neoperl (fully-automated assembly for water-flow products), this pattern delivered 15 % scrap reduction through correlation of PLC alarms with quality defects — the kind of cross-layer analysis that manual SPC cannot produce. Stability moves from "we had a meeting about it last Friday" to "we saw the drift at 09:17 on Tuesday and changed the tool at 09:24."

FAQ

What is process stability?
Process stability is the property of a manufacturing process whose output varies only within its natural, inherent variation, with no assignable (special) causes acting on it. A stable process is statistically predictable: the next output will fall inside the calculated control limits with known probability. It is the foundational concept of Shewhart / SPC and the prerequisite for every capability claim (Cp, Cpk, Pp, Ppk) in ISO 22514 and the AIAG SPC manual.

What is the difference between process stability and process capability?
Stability answers "is the process predictable?" — measured with control charts and run rules. Capability answers "does the predictable process fit inside the customer's specification?" — measured with Cp, Cpk (short-term), Pp, Ppk (long-term). The rule is absolute: stability first, capability second. Calculating Cpk on an unstable process produces a number that describes something that will not exist tomorrow. A process can be stable but not capable (predictable but outside spec), capable but not stable (fits today, will drift tomorrow), both, or neither. Only "both" is worth reporting.

What is the difference between common cause and special cause variation?
Common-cause variation is natural, inherent variation present in every output — the process is doing what it was designed to do. You address it by changing the process itself (tooling, fixtures, materials), never by reacting to individual data points. Special-cause variation is an external, assignable disturbance — tool wear, material batch change, operator change, temperature drift. You address it by investigating and eliminating the specific cause. Reacting to common-cause variation as if it were special cause is called tampering, and it is the single most common way stable processes are made unstable.

How do you prove a process is stable?
Control charts plus run rules. X-bar/R for variables data in subgroups, I-MR for individual measurements, p-chart or u-chart for attribute data. Apply Western Electric or Nelson rules: no points beyond 3σ, no 2-of-3 beyond 2σ, no 6 consecutive increasing or decreasing, no 14 alternating, no 8 on one side of the centreline. If no rules are violated across a sufficient sample (typically 25+ subgroups for short-term studies), the process can be declared stable. If rules fire, the special causes are identified and eliminated before any capability study is run.
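The control limits themselves come straight from the subgroup data. A sketch of the limit calculation for an X-bar/R study with 25 subgroups of 5, using the standard chart constants for subgroup size n = 5 (A2 = 0.577, D3 = 0, D4 = 2.114); the subgroup data is simulated, not from a real line.

```python
import random

random.seed(7)
# 25 simulated subgroups of 5 measurements around a 10.0 target.
subgroups = [[10.0 + random.gauss(0, 0.05) for _ in range(5)] for _ in range(25)]

xbars = [sum(g) / len(g) for g in subgroups]        # subgroup means
ranges = [max(g) - min(g) for g in subgroups]       # subgroup ranges
xbarbar = sum(xbars) / len(xbars)                   # grand mean (centreline)
rbar = sum(ranges) / len(ranges)                    # average range

A2, D3, D4 = 0.577, 0.0, 2.114                      # table constants for n = 5
ucl_x, lcl_x = xbarbar + A2 * rbar, xbarbar - A2 * rbar
ucl_r, lcl_r = D4 * rbar, D3 * rbar

print(f"X-bar chart: LCL {lcl_x:.3f}  CL {xbarbar:.3f}  UCL {ucl_x:.3f}")
print(f"R chart:     LCL {lcl_r:.3f}  CL {rbar:.3f}  UCL {ucl_r:.3f}")
```

Only after every subgroup mean sits inside these limits, and no run rule fires across the 25 subgroups, may the same data feed a capability study.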

What breaks process stability in practice?
Five recurring special causes, each with a distinctive control-chart signature. Tool wear — slow drift of the mean in one direction. Material batch change — step-change in mean at a specific timestamp. Operator or shift change — pattern repeating with shift rhythm. Temperature or environmental drift — daily cyclical pattern. Tampering — increased variation with no corresponding physical change, because someone is adjusting the process on common-cause noise. The fifth is by far the most common in plants that "do SPC" without training operators on when not to react.

Why does manual SPC fail at scale?
Three reasons I watched collapse real Cpk claims at Johnson Controls. Sampling rhythm misses actual process variation — hourly sampling on a schedule doesn't catch shift-change effects. Selective re-measurement — operators re-measure points near control limits and record the second reading, quietly biasing the dataset. Tampering — adjustments logged against common-cause variation add induced drift. Automated inline capture tied to an MES eliminates all three: no human in the measurement loop, no selective recording, rules evaluated automatically. Manual SPC at scale is usually not wrong in arithmetic — it is measuring a version of the process that doesn't exist.

How does an MES support process stability?
Five integrated layers. Automated data capture via OPC UA, MQTT or digital I/O gateway — no operator in the measurement loop. Real-time control charts computed on every sample with Western Electric / Nelson rules evaluated automatically. Auto-escalation on rule breaks to the role that can act (maintenance for tool wear, quality for spec drift, setup for material change). Stability-gated capability — Cpk / Ppk only reported when the underlying data passes stability tests. Correlation with machine state — SPC signals linked to downtime events, alarms, setup changes and material lots, so the special cause is visible in minutes. At Neoperl this combination delivered 15 % scrap reduction through cross-layer correlation alone.

How does SYMESTIC implement process stability?
Inline measurements flow continuously via OPC UA, MQTT or digital I/O (brownfield without LAN retrofit). Control charts update live in the operator's workflow at the shop floor terminal; rule-break events auto-escalate through alarms. Capability indices are computed only on data that passes automated stability tests. SPC signals are correlated with machine downtime, setup changes and material batches in the same data model. 15,000+ machines across 18 countries on this architecture, validated in automotive (Meleghy, Carcoustics), food (Kamps), FMCG (Brita), pharma non-validated (Klocke), metal processing and building products (Neoperl). See SYMESTIC Process Data.


Related: MES · MES Software · OEE · OEE Software · Production Planning Software · Digital Production Control · Shop Floor Terminal · Digital Manufacturing · Manufacturing Analytics · SYMESTIC Process Data · Alarms · Production Metrics

About the author
Christian Fieg
Head of Sales at SYMESTIC. 25+ years in manufacturing — maintenance engineer and Six Sigma Black Belt at Johnson Controls, global MES and traceability lead for 900+ machines and 750+ users across China, Mexico, Tunisia, Macedonia, France and Russia, Manager Center of Excellence for the global MES programme at Visteon, Sales Manager MES DACH at iTAC, Senior Sales Manager at Dürr. At SYMESTIC since 2021. Author of "OEE: One Number, Many Lies" (2025). · LinkedIn