Process Variation: The Heart of Six Sigma

By Christian Fieg · Last updated: April 2026

What is process variation?

Process variation — sometimes written process variations (plural), also called process variability or, in its strict statistical form, the dispersion of a process output around its target value — is the single most studied phenomenon in quality engineering. It is the unavoidable scatter in the values of any measurable output of a manufacturing process: dimensions, cycle times, weights, temperatures, densities, break strengths, any parameter you can put a number on. Two parts made on the same machine, from the same material, in the same minute, by the same operator will not be identical. The difference between them is variation, and how you understand, decompose and act on that variation is what separates a process that is in control from one that is not — and what separates Six Sigma from manufacturing theatre.

I have spent 25 years with a statistical ruler on the shop floor. Maintenance engineer at Johnson Controls from 1998, then three years as a Six Sigma Black Belt on the headliner line in Rastatt, where I ran DMAIC projects against real variation problems rather than textbook ones. Later, global MES and traceability lead across China, Mexico, Tunisia, Macedonia, France and Russia, where variation analysis was the diagnostic layer underneath every quality programme we ran. Now Head of Sales at SYMESTIC covering 15,000+ connected machines, and author of the 2025 book "OEE: One Number, Many Lies" — a book whose central thesis is that manufacturing metrics, including variation statistics, are the most systematically distorted data sets on the shop floor. The cleanest control chart I ever saw in my career was also the most dishonest. That is not a paradox; it is the field.

The Shewhart–Deming decomposition — the only framework that matters

Serious process-variation analysis starts with Walter Shewhart's 1920s insight, later codified by W. Edwards Deming: variation has two structurally different kinds, and confusing them is the root cause of most failed quality programmes. Common-cause variation is the natural, inherent scatter of a stable process — the sum of many small, random influences that are part of how the process operates. Special-cause variation (also called assignable-cause variation) is variation from a specific, identifiable event — a tool wearing, a material batch changing, an operator substituted, a temperature drifting. The two require opposite responses, and the most expensive mistake in quality engineering is reacting to one as if it were the other.

Type	Source	Correct response	Wrong response
Common cause	Inherent in the process: materials, machine, method, environment	Change the process itself; meeting tighter specifications requires system-level redesign	"Tampering": adjusting parameters after every out-of-target part, which increases variation
Special cause	A specific, identifiable event — tool wear, material change, shift change, parameter drift	Find the assignable cause, eliminate it, restore the process to control	Accepting it as normal — which embeds the special cause in the baseline process
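The "tampering" entry in the table can be illustrated with a small simulation. This is a minimal sketch, not a model of any real machine: it assumes a stable process with normally distributed common-cause noise, and an operator who moves the setpoint by the full deviation of the last part. The function name `run_process` and all numbers are hypothetical.

```python
import random
import statistics

def run_process(n_parts, sigma, tamper, seed=0):
    """Simulate a stable process on target 0.0 with common-cause noise only.

    If tamper is True, the setpoint is moved after every part by the full
    deviation of that part from target (an assumed compensation rule).
    """
    rng = random.Random(seed)
    setpoint = 0.0
    outputs = []
    for _ in range(n_parts):
        value = setpoint + rng.gauss(0.0, sigma)
        outputs.append(value)
        if tamper:
            setpoint -= value  # "correct" by the last observed deviation
    return outputs

hands_off = run_process(20_000, sigma=1.0, tamper=False)
tampered = run_process(20_000, sigma=1.0, tamper=True)

print(f"std dev, left alone: {statistics.pstdev(hands_off):.2f}")
print(f"std dev, tampered:   {statistics.pstdev(tampered):.2f}")
```

Under these assumptions each tampered part carries its own noise plus the negated noise of the previous part, so the variance roughly doubles and the standard deviation grows by a factor of about 1.4.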

Deming estimated that 94 % of process problems are common-cause and only 6 % are special-cause — numbers he repeated for four decades. My own field experience matches this order of magnitude; in the plants I have personally audited, common-cause variation accounts for roughly 85–95 % of the total, and special-cause variation the remainder. The operational consequence is that most plants spend most of their quality-improvement effort on the smaller problem, chasing special causes through RCA workflows while the larger common-cause problem — which requires capital investment, process redesign, or tighter material sourcing — goes unaddressed. Reversing that allocation is the single highest-leverage change in most quality programmes.

The math — Cp, Cpk, and what they actually measure

Process variation is quantified primarily through two indices: process capability (Cp) and process capability index (Cpk). Both are ratios comparing the specification tolerance to the process spread, and both depend on the assumption that the process is stable — that is, producing only common-cause variation. Running these calculations on an unstable process produces numbers that look authoritative and mean nothing.

Index	Formula	What it measures	Automotive benchmark
Cp	(USL − LSL) / (6σ)	Process potential — how well the spread fits the tolerance, ignoring centring	≥ 1.33 acceptable · ≥ 1.67 target
Cpk	min[(USL − μ), (μ − LSL)] / (3σ)	Actual capability — accounts for both spread and how far off-centre the process runs	≥ 1.33 acceptable · ≥ 1.67 target · ≥ 2.00 Six Sigma-grade

The difference between Cp and Cpk is where the diagnostic value sits. A process with Cp = 2.0 and Cpk = 0.8 has excellent variation control but is running badly off-centre — the countermeasure is to shift the process mean, not to reduce variation. A process with Cp = 0.9 and Cpk = 0.9 is perfectly centred but has too much spread — the countermeasure is variation reduction, potentially requiring capital investment. The two indices look similar on a dashboard; they point at completely different interventions. Plants that only track a single "capability" number without separating these two components are flying blind.
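The two cases above can be reproduced directly from the formulas in the table. The sketch below is illustrative only: the tolerance of 10.0 ± 0.6 and the means and standard deviations are made-up values chosen to give the Cp and Cpk figures quoted in the text, and the helper name `capability` is hypothetical. It assumes a stable process and a sigma estimated from that stable data.

```python
def capability(mean, sigma, lsl, usl):
    """Return (Cp, Cpk) for a stable process with the given mean and sigma."""
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    return cp, cpk

LSL, USL = 9.4, 10.6  # hypothetical tolerance: 10.0 +/- 0.6

# Tight spread, but the mean sits close to the upper limit
cp_a, cpk_a = capability(mean=10.36, sigma=0.1, lsl=LSL, usl=USL)

# Perfectly centred, but the spread is too wide for the tolerance
cp_b, cpk_b = capability(mean=10.0, sigma=1.2 / 5.4, lsl=LSL, usl=USL)

print(f"off-centre process: Cp = {cp_a:.2f}, Cpk = {cpk_a:.2f}")
print(f"wide process:       Cp = {cp_b:.2f}, Cpk = {cpk_b:.2f}")
```

A large gap between Cp and Cpk points at centring; two equal but low values point at spread.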

The Six Sigma 1.5-sigma shift — and why it matters. Motorola's original Six Sigma methodology in the 1980s formalised an assumption that the process mean drifts by up to 1.5 standard deviations over time, even in well-controlled processes. This is why "Six Sigma" performance (which should mathematically correspond to 2 parts per billion defects) is commonly quoted as 3.4 parts per million — the 1.5-sigma shift degrades the long-term defect rate from the short-term mathematical ideal. Whether you buy the 1.5-sigma assumption or not (statisticians have argued about it for thirty years), the underlying point is operationally correct: the variation you measure in a short-term capability study is almost always lower than the variation your process actually produces over a quarter. In my Johnson Controls days we accepted this as a rule of thumb: halve the Cpk you measure in a two-week study to estimate what you will actually ship over six months. Plants that don't account for this consistently over-promise on quality to customers and then under-deliver.
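The two defect rates quoted above follow from normal-distribution tail areas. The sketch below assumes a normally distributed output with specification limits at ±6 standard deviations from target; the function names are illustrative.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def defect_fraction(sigma_level, mean_shift=0.0):
    """Fraction outside symmetric limits at +/- sigma_level sigmas,
    with the process mean shifted by mean_shift sigmas."""
    return upper_tail(sigma_level - mean_shift) + upper_tail(sigma_level + mean_shift)

short_term = defect_fraction(6.0)
long_term = defect_fraction(6.0, mean_shift=1.5)

print(f"no shift:        {short_term * 1e9:.1f} parts per billion")
print(f"1.5-sigma shift: {long_term * 1e6:.1f} parts per million")
```

With no shift, both tails contribute about one part per billion each. With the shift, nearly all defects come from the tail the mean has moved towards, which now sits only 4.5 standard deviations away.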

Why most SPC programmes miss the real variation

Statistical Process Control — the discipline of using control charts to distinguish common-cause from special-cause variation in real time — is the correct methodology for managing process variation. It has been correct since 1924. The problem is not the methodology; the problem is how SPC gets implemented in practice. In my field experience across automotive, pharma, FMCG and building materials, four specific failure modes produce the same outcome in most plants: an SPC programme that looks compliant, produces clean charts for the audit binder, and misses the real variation that is costing the plant money.

Failure mode	Mechanism	What it does to the data
Sampling-rate undercount	Measuring 5 parts per hour when the process produces 400 — short-duration drift is invisible between samples	Special causes look like common causes because they occur and correct before the next sample
Wrong parameter charted	Charting what's easy to measure rather than what correlates with the defect modes	The chart is statistically valid but analytically useless — it sees variation that does not cause defects
Fixed control limits, drifting process	Control limits set once and never recalculated as the process evolves (tooling age, material change)	Limits become either too tight (nuisance alarms) or too wide (misses real special causes)
Chart-for-audit, not-for-action	Charts exist to satisfy the quality system but nobody acts on the signals	Out-of-control signals are logged, not investigated — the chart becomes decorative

The fourth failure is the most demoralising and the most common. I have walked past control charts in plants where the last seven points formed a run on one side of the mean (seven consecutive points on the same side is a classic special-cause signal) and nobody had touched them for three shifts. The operator knew the rule; the operator also knew that raising the signal would start a 40-minute paperwork chain and probably not change the process. The process had been producing parts that met specification anyway. So the chart gets marked, the signal gets ignored, and three weeks later when the process finally goes out-of-spec and a customer rejects a shipment, the audit trail shows that the signal was visible all along and nobody acted. This is not an integrity problem; it is a design problem: an SPC system that creates friction between detection and action will be defeated by the same operator rationality that defeats every other badly-designed measurement system.
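The run rule mentioned above is simple enough to check automatically, which removes one source of friction between detection and action. This is a minimal sketch with hypothetical function names and data. The run length is a parameter because rule sets differ: seven is the threshold used in this article, while other published rule sets use eight or nine.

```python
def longest_run_one_side(values, centre):
    """Length of the longest streak of consecutive points strictly on one side of centre."""
    longest = current = 0
    last_side = 0
    for v in values:
        side = (v > centre) - (v < centre)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            current += 1
        elif side != 0:
            current = 1
        else:
            current = 0  # a point exactly on the centre line breaks the run
        last_side = side
        longest = max(longest, current)
    return longest

def run_signal(values, centre, run_length=7):
    """True if the chart shows a run of at least run_length points on one side."""
    return longest_run_one_side(values, centre) >= run_length

# Hypothetical readings: every point could sit inside the control limits,
# yet the last seven all lie above the centre line of 10.0
readings = [10.1, 9.9, 10.2, 9.8, 10.1, 10.2, 10.1, 10.3, 10.1, 10.2, 10.4]
print(run_signal(readings, centre=10.0))
```

A check like this can raise the signal without the operator having to count points on a paper chart; what happens after the signal is still a workflow question.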

Real-time SPC vs periodic sampling — the data-foundation shift

The single largest change in process-variation management over the past fifteen years is the shift from periodic sampling (measuring N parts every X minutes, by hand or by gauge) to real-time SPC (charting every cycle automatically from PLC telemetry or in-line sensors). The shift is not incremental — it changes what the control chart can actually see.
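When every cycle is charted rather than a subgroup, one common choice is an individuals chart with limits derived from the average moving range. The article does not say which chart type is used, so treat this as an assumption. The sketch below uses the standard constant 2.66 (3 divided by d2 = 1.128 for ranges of two points); the function name and the data are hypothetical.

```python
import statistics

def individuals_limits(values):
    """Lower limit, centre line and upper limit for an individuals chart,
    estimated from the average moving range of consecutive points."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    centre = statistics.fmean(values)
    half_width = 2.66 * mr_bar  # 3 / d2, with d2 = 1.128 for ranges of two
    return centre - half_width, centre, centre + half_width

# Hypothetical baseline taken while the process was known to be stable
baseline = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1]
lcl, centre, ucl = individuals_limits(baseline)

# Each new cycle is compared against the limits as it arrives
new_cycles = [10.1, 9.9, 10.9, 10.0]
flagged = [v for v in new_cycles if v < lcl or v > ucl]
print(f"limits: {lcl:.2f} to {ucl:.2f}, flagged: {flagged}")
```

A real baseline would use far more than seven points. The limits should be recalculated when the process is deliberately changed, not after every excursion.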

Dimension	Periodic sampling	Real-time SPC
Detection latency	Minutes to hours — a drift is visible at the next sample	Per cycle — drift is visible as it develops
Parts at risk per signal	All parts produced between samples	Typically a handful of cycles
Special-cause detection	Catches long-duration causes; misses short drifts that correct before next sample	Catches both long and short — population visibility rather than point visibility
PLC alarm correlation	Manual, retrospective, rarely done	Native — each cycle's value is joined to the alarm state at that cycle
Operator cost	High — sampling is work that competes with production	Near-zero — the sensor is the sampler

The most important row is the second-to-last. Real-time SPC integrated with PLC alarm data allows every variation event to be cross-referenced with the machine state and alarm signature at the moment it occurred. This transforms root-cause analysis from a retrospective detective exercise (looking at a chart a week later and trying to remember what was happening on the line at 14:22 on Tuesday) into a deterministic lookup (this variation excursion correlates with alarm 0x7F3 at cycle 4,218 of the shift, and that alarm also preceded the eleven other excursions in this week's pattern). The Neoperl deployment in the SYMESTIC installed base is the canonical example: once PLC alarms were correlated with process-variation signals, the top five sources of defects became identifiable within a week, and the resulting countermeasures produced 15 % less scrap and 15 % higher productivity. The equipment didn't change. The visibility of variation changed.
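The "deterministic lookup" described above is, at its core, a join between cycle values and the alarm log. The sketch below is not the SYMESTIC implementation: the data layout (cycle number and value pairs, alarm intervals as start cycle, end cycle and code), the function names, the control limits and all data are hypothetical, including the second alarm code.

```python
from collections import Counter

def alarms_at(cycle, alarm_log):
    """Alarm codes active at a given cycle number."""
    return [code for start, end, code in alarm_log if start <= cycle <= end]

def rank_alarm_codes(cycle_values, alarm_log, lcl, ucl):
    """Count which alarm codes were active during out-of-limit cycles."""
    counts = Counter()
    for cycle, value in cycle_values:
        if value < lcl or value > ucl:
            counts.update(alarms_at(cycle, alarm_log) or ["<no alarm active>"])
    return counts.most_common()

# Hypothetical cycle-level measurements: (cycle number, measured value)
cycle_values = [(4216, 10.0), (4217, 10.1), (4218, 10.9), (4219, 10.0),
                (4300, 10.8), (4301, 9.9), (4410, 9.0)]

# Hypothetical alarm log: (first cycle, last cycle, alarm code)
alarm_log = [(4217, 4218, "0x7F3"), (4300, 4300, "0x7F3"), (4301, 4305, "0x112")]

ranking = rank_alarm_codes(cycle_values, alarm_log, lcl=9.5, ucl=10.5)
print(ranking)
```

Excursions with no active alarm are counted separately on purpose: they are the ones that still need the detective work.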

The DMAIC workflow applied to variation reduction

Variation reduction is one of the cleanest fits for Six Sigma's DMAIC methodology — Define, Measure, Analyse, Improve, Control — because each phase maps directly onto a specific variation-analysis task. The workflow that actually reduces variation has a consistent shape across industries.

Phase	Variation-specific activity	Deliverable
Define	Scope the variation problem to one output parameter on one product family on one process step	SIPOC, charter, voice-of-customer target
Measure	Measurement System Analysis (MSA / Gage R&R), baseline Cp/Cpk, stability check	Verified variation baseline — not the reported number, the measured number
Analyse	Decompose variation: common vs special, within-part vs between-part, shift vs drift	Identified contributors ranked by variance contribution (ANOVA)
Improve	Design of Experiments (DoE) to identify optimal parameter settings for minimum variance	Process recipe with documented parameter tolerances
Control	Real-time SPC on the now-optimised process, with alarm correlation for sustainment	Control plan, response flowcharts, Cpk sustainment tracking

The discipline that most plants skip is Measurement System Analysis in the Measure phase. Before you quantify process variation, you must quantify the variation of your measurement system — the gauge, the operator, the method. A common rule of thumb: the measurement system should contribute less than 10 % of the total observed variation. If it contributes 30 %, you are measuring mostly your gauge, not your process, and every subsequent analysis is built on noise. Skipping MSA is the single most common reason DMAIC projects fail to reproduce their gains six months after closure — the "improvement" was partly measurement-system artifact, which reverts when the gauge is replaced or the operator rotates.
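As a rough illustration of the 10 % rule, the sketch below estimates how much of the observed scatter comes from the gauge when one operator measures each part several times. It is a simplification, not a full Gage R&R study: it covers repeatability only and ignores operator-to-operator differences. It also expresses the share as a ratio of standard deviations, which is an assumption about how the 10 % is meant. The function name and data are hypothetical.

```python
import math
import statistics

def gauge_share(measurements):
    """Percentage of total observed standard deviation attributable to the gauge.

    measurements: one inner list of repeat readings per part, all of equal
    length (at least two readings each, and the parts must actually differ).
    """
    repeats = len(measurements[0])
    # Gauge variance: pooled variance of repeat readings on the same part
    var_gauge = statistics.fmean([statistics.variance(part) for part in measurements])
    # The variance of part averages still contains var_gauge / repeats; remove it
    part_means = [statistics.fmean(part) for part in measurements]
    var_part = max(0.0, statistics.variance(part_means) - var_gauge / repeats)
    var_total = var_gauge + var_part
    return 100 * math.sqrt(var_gauge / var_total)

# Hypothetical study: five parts, two readings each
study = [[10.01, 10.03], [10.20, 10.18], [9.85, 9.86], [10.10, 10.12], [9.95, 9.93]]
share = gauge_share(study)
print(f"gauge share of observed variation: {share:.1f} %")
```

If the printed share were closer to 30 %, any Cp or Cpk computed from the same gauge would describe the measurement system at least as much as the process.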

What this looks like across the SYMESTIC installed base

Across the 15,000+ machines connected to the SYMESTIC platform, the process-variation pattern is consistent. Automated cycle-level capture replaces periodic sampling as the default data foundation — OPC UA for modern controls, digital I/O gateways for brownfield equipment without native interfaces, in-line sensor capture where dimensional or weight data is measured directly in the cycle. Control charts are generated on the real-time stream rather than on end-of-shift paperwork. PLC alarms are automatically correlated with variation excursions, so every out-of-control signal arrives with its probable cause already attached. Cp/Cpk are computed fields rather than quarterly exports into Minitab.

The outcomes from the named customer references show a consistent pattern. Neoperl (building-materials assembly, Müllheim) correlated PLC alarms with process-variation events and defects, landing 15 % less scrap and 15 % higher productivity within months. Meleghy (automotive forming and joining, six plants across Germany, Spain, Czech Republic and Hungary) stabilised its press-shop variation through automated cycle capture and SAP R/3 integration, producing 7 % higher output and 10 % fewer stoppages within six months. Carcoustics (automotive moulding and stamping, 500+ machines in Poland and Germany) used the same pattern (MQTT-based IoT integration to Azure plus cycle-level SPC) and saw 8 % availability improvement and 3 % higher output within six months. In every case, the statistical methodology itself (Shewhart, Deming, Motorola Six Sigma) is forty to one hundred years old. What changed was the data foundation underneath it, from periodic sampling to real-time cycle capture, and that change alone unlocked improvements that decades of quality-circle effort had not delivered.

FAQ

What is process variation?
Process variation is the inevitable scatter of any measurable output of a manufacturing process — dimensions, cycle times, weights, densities — around its target value. No two parts made on the same machine from the same material at the same time are identical; the statistical difference between them is variation. The discipline of process-variation management is understanding that scatter, decomposing it into its types, and reducing the part of it that matters for quality. The term is sometimes written "process variations" (plural, as individual events) or "process variability" (more engineering-flavoured); the canonical Six Sigma / SPC term is "process variation" singular, referring to the statistical phenomenon.

What is the difference between common-cause and special-cause variation?
Common-cause variation is the inherent, random scatter of a stable process — the sum of many small influences that are part of how the process operates. It cannot be eliminated without changing the process itself. Special-cause (assignable-cause) variation comes from a specific identifiable event — a tool wearing, a material batch changing, a parameter drifting. It can and should be eliminated by finding and removing the specific cause. Deming estimated 94 % of process problems are common-cause and 6 % special-cause; my field experience confirms the ballpark. The expensive mistake is treating common-cause variation as if it were special-cause (adjusting parameters after every out-of-target part, which actually increases variation — Deming called this "tampering") or treating special-cause as common-cause (accepting it as normal and letting it become part of the baseline).

What is the difference between Cp and Cpk?
Cp measures process potential — the ratio of specification tolerance to process spread, ignoring how well the process is centred on the target. Cpk measures actual capability — the same ratio, but accounting for off-centre running. A process with Cp = 2.0 and Cpk = 0.8 has excellent spread control but runs badly off-centre; the countermeasure is to shift the mean, not to reduce variation. A process with Cp = 0.9 and Cpk = 0.9 is perfectly centred but has too much spread; the countermeasure is variation reduction, often requiring capital investment. The two indices point at different interventions and should never be collapsed into a single "capability" number.

What are the accepted Cpk benchmarks in automotive and similar industries?
In automotive, aerospace, and other precision-engineering industries, Cpk ≥ 1.33 is typically the acceptance threshold for series production (corresponding to approximately 63 parts per million defects at short term). Cpk ≥ 1.67 is the target for critical characteristics (approximately 0.6 ppm). Cpk ≥ 2.00 corresponds to Six Sigma-grade performance (3.4 ppm after accounting for the 1.5-sigma long-term shift). These benchmarks are compliance thresholds, not quality goals; a plant producing at Cpk 1.33 is meeting the minimum, not leading the field. The best-in-class plants I have worked with operate critical characteristics at Cpk 2.5 or higher and use their variation headroom as a competitive moat.
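The short-term ppm figures above can be checked with a few lines. The sketch assumes a centred, normally distributed, stable process, so both tails contribute equally. It uses the exact fractions 4/3 and 5/3 rather than the rounded 1.33 and 1.67, because the tail area is sensitive to the second decimal. The function name is illustrative.

```python
import math

def ppm_from_cpk(cpk):
    """Two-sided parts per million outside specification for a centred process."""
    z = 3 * cpk  # distance from the mean to each limit, in standard deviations
    one_tail = 0.5 * math.erfc(z / math.sqrt(2))
    return 2 * one_tail * 1e6

for label, cpk in [("1.33", 4 / 3), ("1.67", 5 / 3), ("2.00", 2.0)]:
    print(f"Cpk {label}: {ppm_from_cpk(cpk):.4f} ppm")
```

The value printed for Cpk 2.00 is the unshifted short-term figure of about 0.002 ppm, not the 3.4 ppm that includes the 1.5-sigma shift.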

What is the Six Sigma 1.5-sigma shift?
The 1.5-sigma shift is Motorola's 1980s-era assumption that the mean of a manufacturing process will drift by up to 1.5 standard deviations over the long term, even in well-controlled operations. This is why "Six Sigma" performance — which mathematically should correspond to 2 parts per billion defects at short term — is commonly quoted as 3.4 parts per million. The 1.5-sigma shift degrades the long-term defect rate from the short-term mathematical ideal. Statisticians have argued about the exact magnitude for thirty years, but the operational point is correct: the Cpk you measure in a short-term capability study is almost always better than what your process actually produces over a quarter. Halving the short-term Cpk is a reasonable rule of thumb for estimating long-term performance.

Why do most SPC programmes fail in practice?
Not because the statistical methodology is wrong — Shewhart's method from 1924 is still correct — but because implementation fails in four predictable ways. Sampling rate is too low to catch short-duration drifts. The wrong parameters are charted (easy to measure but poorly correlated with defects). Control limits are set once and never recalculated as the process evolves. And most damaging, charts are produced for audit compliance rather than for action — operators know the rule that a run of seven points signals a special cause, but also know that raising the signal starts a 40-minute paperwork chain that rarely changes the process, so the signal gets marked and ignored. The fix is not training; it is system design that makes action cheaper than inaction.

What is Measurement System Analysis (MSA) and why does it matter?
Measurement System Analysis — Gage R&R in its most common form — is the discipline of quantifying the variation contributed by the measurement system itself, separately from the process variation being measured. The rule of thumb is that the measurement system should contribute less than 10 % of the total observed variation; at 30 % you are mostly measuring your gauge, not your process. MSA is the single most-skipped step in DMAIC projects and the single most common reason variation-reduction gains fail to sustain — the "improvement" was partly measurement-system artifact and reverts when the gauge is replaced or the operator rotates. Running capability analysis without first verifying the measurement system is statistically invalid and operationally dangerous.

What is Design of Experiments (DoE) in variation reduction?
Design of Experiments is the systematic approach to identifying which process parameters most influence output variation, by running structured multi-factor trials rather than one-factor-at-a-time testing. A proper DoE can identify the critical 2–3 parameters from a candidate list of 10–15 in a fraction of the runs required by sequential testing. The output is a process recipe — specific parameter values and tolerance windows — that minimises variance on the output characteristic. DoE lives in the Improve phase of DMAIC and is the most statistically sophisticated tool in the Six Sigma toolkit; it is also the tool most often skipped in favour of engineering intuition, which is why Improve-phase gains so often revert.
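To make the idea concrete, here is a minimal sketch of a two-level, three-factor full factorial with main effects computed from the results. Everything in it is hypothetical: the factor names, the response (taken here to be the output standard deviation at each setting) and the made-up response function standing in for real trial data. A screening study on 10 to 15 candidates would normally use a fractional design rather than the full factorial shown.

```python
from itertools import product

def main_effects(factors, runs):
    """Main effect of each factor: mean response at the high level minus
    mean response at the low level.

    runs maps a tuple of coded levels (-1 or +1 per factor) to the response.
    """
    effects = {}
    for i, name in enumerate(factors):
        high = [y for levels, y in runs.items() if levels[i] == +1]
        low = [y for levels, y in runs.items() if levels[i] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects

factors = ["temperature", "pressure", "dwell_time"]      # hypothetical factors
design = list(product([-1, +1], repeat=len(factors)))    # 2^3 = 8 runs

def fake_response(levels):
    """Stand-in for measured output scatter; real data would come from the trials."""
    t, p, d = levels
    return 1.0 + 0.30 * t + 0.05 * p + 0.0 * d

runs = {levels: fake_response(levels) for levels in design}
effects = main_effects(factors, runs)

for name, effect in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} effect on output scatter: {effect:+.2f}")
```

In this made-up data set temperature dominates, pressure matters a little, and dwell time does not matter at all.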

How does real-time SPC differ from traditional periodic sampling?
Traditional SPC samples N parts every X minutes — 5 parts per hour is typical — and charts those samples. Real-time SPC captures every cycle automatically from PLC telemetry or in-line sensors and charts the population rather than the sample. The detection latency drops from minutes-to-hours to per-cycle; parts-at-risk per signal drops from "everything produced between samples" to "a handful of cycles;" short-duration special causes that correct between samples become visible for the first time. Most importantly, real-time SPC can be natively correlated with PLC alarm data, so every variation excursion arrives with its probable cause already attached. This transforms root-cause analysis from retrospective detective work into deterministic lookup and is the single largest operational change in variation management over the past fifteen years.

How does SYMESTIC handle process variation?
Automated cycle-level capture as the default data foundation — OPC UA for modern controls, digital I/O gateways for brownfield equipment (1–2 hours per machine, no PLC modification), in-line sensor integration where dimensional data is measured directly. Control charts generated on the real-time stream rather than end-of-shift paperwork. PLC alarms natively correlated with variation excursions, so every out-of-control signal arrives with its probable cause attached. Cp and Cpk as computed fields rather than quarterly Minitab exports. Typical outcome on a new connection: previously-invisible short-duration drifts become visible in the first two weeks, the top 3–5 variation contributors are identified in weeks 3–4, and the improvement workflow built on that foundation sustains because the measurement system under it is both automated and verified. See SYMESTIC Production Metrics.

About the author

Christian Fieg

Head of Sales at SYMESTIC. 25+ years in manufacturing: maintenance engineer and Six Sigma Black Belt at Johnson Controls, global MES and traceability lead for 900+ machines and 750+ users across China, Mexico, Tunisia, Macedonia, France and Russia, Manager Center of Excellence for the global MES programme at Visteon, Sales Manager MES DACH at iTAC, Senior Sales Manager at Dürr. At SYMESTIC since 2021. Author of "OEE: One Number, Many Lies" (2025).
