SigmaResolve

Common Cause vs Special Cause Variation: Reacting Correctly to Control Chart Signals

Two control charts come across your desk on the same morning. The first shows a single point above the upper control limit on a CMM dimension chart. The second shows seven consecutive points drifting upward—none beyond the limits—on a roundness chart. Both are signals of special cause variation. Neither is a coincidence. And the right response to each is different from the wrong response, which is the response most production floors default to.

This guide walks through how to tell common cause from special cause variation in practice, what the eight Nelson rules actually tell you, and what to do at the moment a signal fires. It is written for quality engineers who already know how to build a control chart and are looking for a sharper decision framework when the chart talks back.

Common Cause vs Special Cause Variation: The Distinction

Common cause variation is what the process does when nothing in particular is happening. Special cause variation is what the process does when something specific has changed.

That distinction is the entire intellectual foundation of SPC. Walter Shewhart’s original 1924 memo at Bell Labs framed it this way: a stable process produces output that varies within a predictable range, and any departure from that range is evidence that the process itself—not just the parts—has changed. The control chart exists to make that evidence visible.

The practical consequence: you investigate special cause and you leave common cause alone. Reacting to common cause variation as if it were special cause is called tampering, and tampering provably increases process variation rather than reducing it. W. Edwards Deming’s funnel experiment is the canonical demonstration.
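Deming's funnel result is easy to reproduce in a few lines. The sketch below (plain Python, invented parameters) simulates funnel rule 1 (never adjust) against rule 2 (move the funnel by the negative of the last error); tampering under rule 2 roughly multiplies the standard deviation by √2 rather than reducing it.

```python
import random
import statistics

random.seed(42)
SIGMA = 1.0   # common-cause standard deviation of a single drop
N = 10_000

# Rule 1: leave the funnel alone — each drop lands at target + noise.
rule1 = [random.gauss(0, SIGMA) for _ in range(N)]

# Rule 2 (tampering): after each drop, shift the funnel by the negative
# of the last error. The next drop lands at adjustment + noise.
rule2 = []
adjustment = 0.0
for _ in range(N):
    landed = adjustment + random.gauss(0, SIGMA)
    rule2.append(landed)
    adjustment -= landed  # "correct" for the last miss

print(f"Rule 1 (leave alone) stdev: {statistics.stdev(rule1):.3f}")
print(f"Rule 2 (tamper)      stdev: {statistics.stdev(rule2):.3f}")
```

Rule 2's spread comes out near √2 ≈ 1.414 times the common-cause sigma: every adjustment injects the previous drop's noise back into the next drop.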

What “Signal” Actually Means on a Control Chart

A signal is any pattern that would be vanishingly improbable if the process were stable. The most familiar signal is a single point outside ±3σ control limits—under a normal distribution, that happens about 0.27% of the time by chance. But there are seven other patterns that are also improbable enough to count as evidence the process has shifted, drifted, become more or less variable, or stratified.
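The 0.27% figure follows directly from the normal CDF; a quick standard-library check:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF expressed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Probability that a point from a stable normal process falls outside ±3σ:
p_outside_3sigma = 2.0 * (1.0 - normal_cdf(3.0))
print(f"{p_outside_3sigma:.4%}")  # ≈ 0.27%
```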

The Nelson rules (Lloyd Nelson, Journal of Quality Technology, 1984) codify eight such tests. Each test corresponds to a different failure mode the process might be exhibiting. Knowing which test fired tells you a lot about what kind of special cause you’re looking for.

Tip: Western Electric rules (1956) are an older, narrower set of four. Most modern SPC software defaults to Nelson, and most automotive customer requirements (PPAP, control plans) accept either. Pick one, document it in your control plan, and apply it consistently. For the full rule definitions see Western Electric and Nelson rules with worked examples.

Mapping Signals to Causes

The eight Nelson tests, what each one suggests, and the cause categories worth investigating first:

| Nelson Test | Pattern | What It Suggests | Investigate First |
|---|---|---|---|
| 1 | 1 point beyond ±3σ | Sudden, large process shift or one-off event | Tooling break, fixture slip, material defect, measurement error |
| 2 | 9 points same side of CL | Sustained mean shift | Tool wear, raw material lot change, operator change, ambient drift |
| 3 | 6 points trending up or down | Gradual drift | Tool wear, temperature drift, transducer drift, buildup on fixture |
| 4 | 14 points alternating up–down | Mixture or systematic effect | Two operators alternating, two-fixture cycling, even/odd cavity mold |
| 5 | 2 of 3 beyond ±2σ, same side | Early warning of mean shift | Same as test 2; act sooner on a smaller signal |
| 6 | 4 of 5 beyond ±1σ, same side | Modest sustained shift | Tool wear, calibration drift |

Tests 7 and 8 are the most commonly misread in practice. Test 7 (everything hugging the center line) looks like a great-running process; it usually means your subgrouping is wrong or your control limits were calculated on a wider distribution than the current one. Test 8 (everything in the tails, nothing in the middle) usually means you’re combining two streams that should be charted separately.

Decision Tree at the Moment of a Signal

When a signal fires, work this sequence. Don’t skip steps—the diagnostic value is in the order.

  1. Verify the measurement first. Before assuming the process changed, confirm the data point is real. Re-measure the part. Check the gage. Look for a transcription error in the data file. A surprising number of out-of-control signals are measurement system noise, especially when Gage R&R has not been recently validated. See why Gage R&R must be validated before SPC for the prerequisite work.
  2. Identify which test fired. Don’t treat all signals identically. Test 1 (3-sigma exceedance) implies a different cause set than test 3 (slow drift). The table above is the starting cause-list per test.
  3. Look at concurrent process records. Tool change log, raw material lot receipts, operator change, machine restart, environmental log (temperature, humidity), maintenance event. The signal time-stamp narrows the search window.
  4. Decide containment. If the chart controls a critical-to-quality characteristic and the signal suggests a sustained shift, segregate suspect product produced since the last in-control point. If the signal is on a less critical feature, increase sampling and continue producing while investigating.
  5. Assign root cause. Use the cause-category from the test as a starting hypothesis. 5-Why or fishbone analysis from there.
  6. Document and close. Record the signal, the cause, the corrective action, and whether control limits should be recalculated after the fix. CAPA system entry is non-optional in regulated industries (IATF 16949, AS9100, FDA 21 CFR Part 820).

Worked Example: Reading Two Signals

A milling cell runs an X-bar chart on a 12.000 ± 0.005 in. bore diameter, subgroup size 5, sampled every 30 minutes. Centerline 12.0001 in., UCL 12.0029, LCL 11.9973.
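Those limits are consistent with the standard X-bar construction, UCL/LCL = grand mean ± A2 × R-bar. The R-bar value below is hypothetical, back-solved so the computed limits reproduce the quoted ones:

```python
# X-bar chart limits from the grand mean and average subgroup range.
# A2 = 0.577 is the standard control chart constant for subgroups of n = 5.
# R_BAR is a hypothetical average range chosen to match the limits above.
A2_N5 = 0.577
GRAND_MEAN = 12.0001   # in.
R_BAR = 0.00485        # in., assumed

ucl = GRAND_MEAN + A2_N5 * R_BAR
lcl = GRAND_MEAN - A2_N5 * R_BAR
print(f"UCL = {ucl:.4f}, LCL = {lcl:.4f}")  # → UCL = 12.0029, LCL = 11.9973
```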

Scenario A: Subgroup 47 mean is 12.0034 in.—above UCL. Test 1 fires. The subgroup 30 minutes earlier was 12.0012 (well in control). Sudden jump.

  • Re-measure: confirmed 12.0034.
  • Tool change log: insert was indexed at subgroup 46 (right before the signal).
  • Hypothesis: new insert position differs from previous. Containment: hold parts produced since the index. Action: re-zero offset; re-verify with new subgroup.
  • Cause category from table (test 1): tooling change. Confirmed.

Scenario B: Subgroups 31–37 all sit slightly above the centerline (12.0009–12.0021), none beyond UCL. Test 2 (9 points) hasn’t fired yet, but test 6 (4 of 5 above +1σ) just did at subgroup 37.

  • No discrete event in the change log. Smooth drift.
  • Cause category from table (test 6): tool wear or calibration drift.
  • Action: check insert hours since last index. If past expected wear curve, plan an early index. Don’t stop production yet—the process is still in spec, but it’s telling you the next subgroup may not be.

The same control chart, two different signals, two different responses. That’s the value of reading which rule fired, not just that a rule fired.

The Cost of Tampering

The most expensive special-cause investigation is the one that wasn’t a special cause. When operators or engineers adjust a process based on a single point that was within control limits—or worse, based on a point near a specification limit but well within control limits—they’re reacting to common cause variation. Each adjustment adds variance.

The classic indicator: a process that gets “corrected” on every shift sees its standard deviation grow over a quarter and eventually loses Cpk. The cure is to publish the control chart on the floor with a clear rule: do not adjust the machine unless a Nelson signal fires. Pair that with operator training so the rule is understood, not just enforced.

Common Mistake: Confusing “close to the spec limit” with “out of control”. A point at 12.004 in. on a 12.000 ± 0.005 spec is well inside the spec window, and if it’s also inside the ±3σ control limits, the process has not signaled anything. Adjusting the machine because the part “was getting close” is tampering.

When to Recalculate Control Limits

After a confirmed special cause is identified and corrected, the question becomes whether to recalculate control limits. The rule of thumb: recalculate when the underlying process has genuinely changed (new tooling, new material specification, new machine) and the change is intended to be permanent. Don’t recalculate to make the chart “look better” after a temporary excursion. The original limits, derived from a stable baseline, remain the reference for whether the current process is operating in the same way.

For small sustained shifts that are genuinely improvements (variation reduction after a kaizen event, for instance), CUSUM or EWMA charts often detect the change before Shewhart limits would, and they’re a better instrument when the shift is small enough that recalculating Shewhart limits feels premature.
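To illustrate why EWMA reacts faster to small shifts, here is a minimal EWMA statistic with exact time-varying limits. The λ = 0.2, L ≈ 2.96 pairing is a commonly tabulated choice; the data series is invented to show a sustained ~1σ shift:

```python
import math
from typing import List, Tuple

def ewma_chart(xs: List[float], target: float, sigma: float,
               lam: float = 0.2, L: float = 2.962) -> List[Tuple[float, bool]]:
    """EWMA statistic with exact time-varying control limits.

    lam (λ) is the smoothing weight; L sets the limit width in sigmas.
    Returns (ewma value, out-of-control flag) for each observation.
    """
    z = target
    out = []
    for i, x in enumerate(xs, start=1):
        z = lam * x + (1 - lam) * z
        # Exact variance of the EWMA statistic at step i:
        var = sigma**2 * (lam / (2 - lam)) * (1 - (1 - lam)**(2 * i))
        limit = L * math.sqrt(var)
        out.append((z, abs(z - target) > limit))
    return out

# A small sustained shift (~1σ) that a Shewhart chart would tolerate:
shifted = [10.3, 10.4, 10.2, 10.5, 10.4, 10.3, 10.5, 10.4]
flags = ewma_chart(shifted, target=10.0, sigma=0.3)
print(any(flag for _, flag in flags))  # the accumulated shift trips a flag
```

The exponential weighting accumulates evidence across points, which is exactly what Shewhart limits (which judge each point in isolation) cannot do for sub-sigma shifts.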

Where the Tool Helps

The SPC control chart calculator generates X-bar, R, I-MR, p, np, c, and u charts from a CSV upload, applies Western Electric and Nelson rule sets in parallel, and flags which rule each violation triggered. That last detail is the one most Excel-based SPC setups skip—they show points outside limits but not which rule fired, which is the information that drives the decision tree above.

For deeper background on the rule sets themselves, see the AIAG SPC Reference Manual, the de facto industry reference for control chart construction and interpretation in automotive manufacturing. For Lloyd Nelson’s original 1984 paper, see the Journal of Quality Technology archive maintained by ASQ.