FailModeLens

Software FMEA Under ISO 26262: Applying Failure Analysis to Safety-Critical Systems

Hardware engineers encounter FMEA on day one. Software engineers often encounter it for the first time when their team starts an ISO 26262 project and someone hands them a spreadsheet with columns for failure modes, effects, and causes—tools designed for cracking shafts and corroding connectors. Translating those columns to firmware requires a different mental model than the one most FMEA training covers. This guide covers the practical steps: what failure modes look like in safety-critical software, how severity maps to ASIL levels, and where FMEA ends and FMEDA begins.

What ISO 26262 Requires (and What It Leaves Open)

ISO 26262-9:2018 covers safety analyses, including inductive methods such as FMEA, for both hardware and software. The standard requires you to systematically identify failure modes and their safety effects, but it doesn’t mandate a specific FMEA format. Teams can spend weeks arguing about whether to use the AIAG-VDA 7-step template or a software-adapted variant. The standard allows either, as long as you can demonstrate coverage of the hazardous events from the hazard analysis and risk assessment (HARA).

The HARA upstream is where ASIL assignments originate. Those assignments flow into the software FMEA and anchor your severity column. The software FMEA doesn’t determine ASIL. It inherits the ASIL from the HARA and uses it to prioritize risk reduction actions.

Why Hardware Failure Modes Don’t Translate to Software

Hardware failure modes describe physical degradation: dimensional drift, corrosion, fracture, dielectric breakdown. Software failure modes describe behavioral deviations from specification. The code either does what it was designed to do or it doesn’t—there’s no wear-in period, no corrosion from humidity, no fatigue crack that grows over time. This distinction matters for how you write failure modes and how you assign occurrence ratings.

Five failure mode categories cover the majority of safety-relevant software failures:

  • Value error: The function executes but produces an incorrect output. Example: a throttle position algorithm that underreports position by 8% because of a scaling constant that doesn’t account for sensor nonlinearity at cold temperatures. The function runs—it just returns the wrong number.
  • Timing failure: The function completes outside its required time window. Example: a collision-avoidance brake request that arrives 40 ms late because a lower-priority task starved the scheduler. For real-time safety functions, latency can matter as much as correctness.
  • Commission error: An action occurs when it should not. Example: a stability control intervention triggered by a transient sensor spike during initialization. The software did something—it just did it at the wrong time.
  • Omission error: A required action does not occur. Example: a fault-handling branch that exits prematurely without setting the required safe state flag, so the downstream function sees a stale value and takes no action.
  • Interface corruption: Data transferred between software components is modified, lost, reordered, or late. CRC failures, buffer overruns, and dropped messages on internal communication buses fall here.
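As a rough illustration of how the five categories separate, here is a minimal Python sketch that classifies a single observed execution of a function. The `Observation` record, its field names, and the order of the checks are assumptions made for this example; they are not part of ISO 26262 or the AIAG-VDA handbook.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    VALUE_ERROR = "value error"
    TIMING_FAILURE = "timing failure"
    COMMISSION_ERROR = "commission error"
    OMISSION_ERROR = "omission error"
    INTERFACE_CORRUPTION = "interface corruption"


@dataclass
class Observation:
    """One observed execution of a function (hypothetical record layout)."""
    action_required: bool
    action_taken: bool
    output: Optional[float] = None
    expected: Optional[float] = None
    tolerance: float = 0.0
    latency_ms: Optional[float] = None
    deadline_ms: Optional[float] = None
    transfer_intact: bool = True  # False if a CRC or sequence check failed


def classify(obs: Observation) -> Optional[FailureMode]:
    """Return the first matching category, or None if no deviation is seen."""
    if not obs.transfer_intact:
        return FailureMode.INTERFACE_CORRUPTION
    if obs.action_taken and not obs.action_required:
        return FailureMode.COMMISSION_ERROR
    if obs.action_required and not obs.action_taken:
        return FailureMode.OMISSION_ERROR
    if (obs.latency_ms is not None and obs.deadline_ms is not None
            and obs.latency_ms > obs.deadline_ms):
        return FailureMode.TIMING_FAILURE
    if (obs.output is not None and obs.expected is not None
            and abs(obs.output - obs.expected) > obs.tolerance):
        return FailureMode.VALUE_ERROR
    return None


# A brake request that is correct but misses its deadline (values are made up).
late_brake = Observation(True, True, latency_ms=90.0, deadline_ms=50.0)
print(classify(late_brake))
```

A real deviation can match more than one category at once (a late and wrong value, for instance). The sketch reports only the first match, so treat the ordering as a design choice rather than a rule.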

For each failure mode, the question is not “how often does this happen in the field?” It’s “can the triggering condition ever occur in the operational design domain?” A value error caused by a specific input combination has an occurrence rating based on whether that input combination is reachable—not on historical defect rates, which typically don’t exist for new software.
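The reachability question can be sketched as an interval check. The example below assumes the operational design domain and the triggering condition can each be written as per-signal ranges; the signal names and limits are hypothetical. Per-signal overlap is only a necessary condition, because correlated signals may still rule out a combination that looks reachable here.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # inclusive (low, high)


def is_reachable(trigger: Dict[str, Range], odd: Dict[str, Range]) -> bool:
    """True if every trigger range overlaps the ODD range for that signal.

    A signal that the ODD does not constrain is treated as unconstrained.
    """
    for signal, (t_lo, t_hi) in trigger.items():
        if signal not in odd:
            continue
        o_lo, o_hi = odd[signal]
        if t_hi < o_lo or t_lo > o_hi:
            return False  # this trigger range lies entirely outside the ODD
    return True


# Hypothetical ODD and triggering conditions for a cold-temperature scaling error.
odd = {"ambient_temp_c": (-40.0, 85.0), "vehicle_speed_kph": (0.0, 250.0)}
cold_trigger = {"ambient_temp_c": (-40.0, -20.0)}
out_of_domain = {"ambient_temp_c": (-60.0, -45.0)}

print(is_reachable(cold_trigger, odd))   # overlaps the ODD
print(is_reachable(out_of_domain, odd))  # lies outside the ODD
```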

Mapping Severity to ASIL Levels

In hardware FMEA, severity 9–10 flags a safety-critical effect regardless of how rarely the failure occurs. The same logic applies to software. The HARA determines which hazardous events exist and what ASIL each requires. Those ASIL assignments map directionally to severity ratings in the software FMEA:

ASIL-to-Severity Mapping (directional)

FMEA Severity   ASIL Range     Example Effect
9–10            ASIL C–D       Loss of braking or steering function without warning
7–8             ASIL B–C       Degraded steering response with driver warning available
5–6             ASIL A–B       Reduced comfort function; safety not impaired
1–4             QM (no ASIL)   Nuisance annunciation, reduced feature performance

This mapping is a starting reference, not a formula. The actual ASIL comes from the HARA’s three-axis assessment of severity, exposure, and controllability. A software failure that contributes to a severity 9 hazard requires the same mandatory action response as a severity 9 hardware failure: risk reduction actions are required regardless of occurrence or detection ratings. For more on how the AIAG-VDA FMEA handbook action priority handles severity 9–10 items, see the post on action priority vs. RPN in AIAG-VDA 2019.
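One way to use the directional mapping is as a consistency check on worksheet rows rather than as a way to derive severity. The sketch below encodes the bands from the mapping in this section; the function names are hypothetical, and a mismatch is a prompt to revisit the HARA linkage, not proof of an error.

```python
from typing import Dict, List, Tuple

# Severity band -> ASIL levels, taken from the directional mapping in this section.
SEVERITY_BANDS: Dict[Tuple[int, int], Tuple[str, ...]] = {
    (9, 10): ("C", "D"),
    (7, 8): ("B", "C"),
    (5, 6): ("A", "B"),
    (1, 4): ("QM",),
}


def candidate_severity_bands(asil: str) -> List[Tuple[int, int]]:
    """Return every severity band the mapping associates with an ASIL."""
    bands = [band for band, asils in SEVERITY_BANDS.items() if asil in asils]
    if not bands:
        raise ValueError(f"unknown ASIL: {asil!r}")
    return sorted(bands)


def severity_consistent(severity: int, asil: str) -> bool:
    """True if the severity rating falls in a band mapped to the inherited ASIL."""
    return any(lo <= severity <= hi for lo, hi in candidate_severity_bands(asil))


print(candidate_severity_bands("C"))  # ASIL C appears in two overlapping bands
print(severity_consistent(3, "D"))    # a severity 3 row on an ASIL D hazard
```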

Detection Controls for Software

Detection in hardware FMEA asks whether a current control catches the failure before it reaches the customer. In software, “current controls” take forms that don’t appear on the hardware detection table:

  • Runtime monitors: Watchdog timers, range-limit checks, plausibility monitors, cross-channel comparison (typical of ASIL C–D redundant architectures). These operate in the deployed system and catch failures during operation.
  • Static analysis and coverage tools: MISRA compliance checkers, structural coverage analysis, abstract interpretation tools. These find classes of failure before the software ships; they’re pre-deployment controls, which affects how you treat them in the detection column.
  • Fault injection testing: Deliberate introduction of incorrect values or timing conditions to verify that monitors detect and respond as specified. Common for validating ASIL B and above requirements.
  • Hardware monitors: Separate hardware watchdogs, external monitoring ICs, or independent processor cores that observe the primary function. These provide stronger independence than software-only checks on the same execution core.
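To make the runtime monitor category concrete, here is a minimal sketch of a range-limit check, a cross-channel comparison, and a deadline check combined into one evaluation step. Production monitors would run on the target in the project’s firmware language; Python is used here only to show the logic. The `MonitorLimits` type, the fault labels, and all numeric limits are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MonitorLimits:
    low: float                # lowest plausible signal value
    high: float               # highest plausible signal value
    max_channel_delta: float  # allowed disagreement between redundant channels
    deadline_ms: float        # required completion time for the function


def evaluate(ch_a: float, ch_b: float, elapsed_ms: float,
             limits: MonitorLimits) -> List[str]:
    """Return the labels of every monitor that tripped (empty list means healthy)."""
    faults: List[str] = []
    if not limits.low <= ch_a <= limits.high:
        faults.append("range_a")
    if not limits.low <= ch_b <= limits.high:
        faults.append("range_b")
    if abs(ch_a - ch_b) > limits.max_channel_delta:
        faults.append("cross_channel")
    if elapsed_ms > limits.deadline_ms:
        faults.append("deadline")
    return faults


limits = MonitorLimits(low=0.0, high=100.0, max_channel_delta=2.0, deadline_ms=10.0)
print(evaluate(50.0, 58.0, 4.0, limits))  # channels disagree by 8.0
```

Note that the range check alone would pass the 50.0 and 58.0 readings; only the cross-channel comparison catches the disagreement, which is why each monitor is usually credited against specific failure modes rather than against the function as a whole.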

For ASIL D, ISO 26262 expects you to demonstrate sufficient independence and freedom from interference between the function and its monitor, which in practice usually means hardware-enforced separation. A software exception handler running on the same core as the function it monitors provides lower independence than an external hardware watchdog, and that difference should be reflected in your detection rating. A detection mechanism that shares a single point of failure with the function it monitors doesn’t provide the independence credit that an architectural review might assume.

FMEA vs FMEDA: Where Each Analysis Starts

ISO 26262 programs use both FMEA and FMEDA. They’re distinct analyses, and the distinction matters when you’re planning your safety case documentation.

FMEA (qualitative) identifies systematic failure modes: logic errors, missing requirements coverage, interface mismatches, algorithm edge cases. It answers “what can go wrong and how would we know?” FMEDA (Failure Mode, Effects, and Diagnostic Analysis) quantifies random hardware failures: component failure rates, diagnostic coverage, safe failure fraction. It answers “at what rate will the hardware fail, and how much of that rate does our monitoring catch?”
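The FMEDA side is arithmetic on failure rates. The sketch below uses the common IEC 61508-style definitions of diagnostic coverage and safe failure fraction; ISO 26262-5 defines its own hardware architectural metrics (single-point fault metric and latent fault metric) differently, so treat this only as an illustration of the kind of calculation involved. The failure rates are invented numbers in FIT.

```python
def diagnostic_coverage(lambda_dd: float, lambda_du: float) -> float:
    """Fraction of the dangerous failure rate that diagnostics detect.

    lambda_dd: dangerous detected failure rate
    lambda_du: dangerous undetected failure rate
    """
    dangerous = lambda_dd + lambda_du
    if dangerous <= 0:
        raise ValueError("total dangerous failure rate must be positive")
    return lambda_dd / dangerous


def safe_failure_fraction(lambda_s: float, lambda_dd: float,
                          lambda_du: float) -> float:
    """Share of the total failure rate that is either safe or detected."""
    total = lambda_s + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total


# Hypothetical component: 40 FIT safe, 54 FIT dangerous detected, 6 FIT dangerous undetected.
print(round(diagnostic_coverage(54.0, 6.0), 3))
print(round(safe_failure_fraction(40.0, 54.0, 6.0), 3))
```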

Your software FMEA and hardware FMEDA should cross-reference each other. Software monitors contribute to the hardware diagnostic coverage metric in FMEDA. But they are separate documents with separate scopes. Tools like APIS IQ-FMEA Pro include modules that support both analyses; Medini Analyze and Capital Functional Safety are purpose-built FMEDA tools. If your team is doing FMEA in the AIAG-VDA format, that document is your systematic failure analysis. Your hardware component reliability engineer owns the FMEDA.

Getting Started: Five Starting Points

For a team beginning their first software FMEA on a safety-critical module:

  1. Start from the software architectural specification, not the code. The FMEA structure follows functional requirements, not implementation details. Functions are described in verb-noun form: “compute brake torque request,” “transmit safety state to gateway.”
  2. Use the five failure mode categories as your brainstorm checklist. For each function, ask whether value error, timing failure, commission, omission, and interface corruption are possible. Most teams miss timing and interface failures when working bottom-up from code.
  3. Link every severity 9–10 item to a HARA hazard. Auditors verify this linkage during functional safety assessments. An unexplained severity 9 entry—one without a corresponding HARA reference—is a documentation gap.
  4. For ASIL B and above, verify that detection controls have architectural independence from the function they monitor. Document the independence argument, not just the existence of the control.
  5. Re-rate occurrence and detection after safety mechanisms are added. The “revised state” ratings reflect the post-mitigation risk. The initial ratings without safety mechanisms represent your baseline risk, which the safety assessor uses to verify that risk reduction was actually achieved.
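Several of these starting points can be checked mechanically before an assessor sees the worksheet. The following sketch assumes a simple row structure; `FmeaRow`, its field names, and the finding messages are hypothetical and would need to match whatever template your team actually uses.

```python
from dataclasses import dataclass
from typing import List, Optional

ASIL_B_AND_ABOVE = {"B", "C", "D"}


@dataclass
class FmeaRow:
    function: str       # verb-noun form, e.g. "compute brake torque request"
    failure_mode: str   # one of the five categories
    severity: int
    asil: str           # inherited from the HARA
    hara_ref: Optional[str] = None
    independence_argument: Optional[str] = None
    initial_occurrence: int = 10
    initial_detection: int = 10
    revised_occurrence: Optional[int] = None
    revised_detection: Optional[int] = None


def audit(row: FmeaRow) -> List[str]:
    """Return documentation gaps for one worksheet row (empty list means none found)."""
    findings: List[str] = []
    if row.severity >= 9 and not row.hara_ref:
        findings.append("severity 9-10 item has no HARA reference")
    if row.asil in ASIL_B_AND_ABOVE and not row.independence_argument:
        findings.append("ASIL B+ item has no documented independence argument")
    if row.revised_occurrence is None or row.revised_detection is None:
        findings.append("revised occurrence/detection ratings missing")
    return findings


row = FmeaRow("compute brake torque request", "value error", severity=9, asil="D")
for finding in audit(row):
    print(finding)
```

Keeping the initial and revised ratings as separate fields preserves the baseline, so the risk reduction achieved by each safety mechanism stays visible in the record.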

The same RPN and action priority calculator used for hardware FMEA handles software failure mode scoring. Enter severity, occurrence, and detection ratings, and the tool applies the AIAG-VDA action priority lookup table. For software, the interpretation of what drives each rating changes—but the arithmetic doesn’t.
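The arithmetic in question is small enough to show directly. This sketch computes the classic RPN as the product of the three ratings and separately applies the severity rule described earlier, where severity 9–10 items require action regardless of occurrence or detection. It does not reproduce the AIAG-VDA action priority lookup table; the function names and the 1–10 validation are assumptions for the example.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Classic risk priority number: severity x occurrence x detection."""
    for name, rating in (("severity", severity),
                         ("occurrence", occurrence),
                         ("detection", detection)):
        if not 1 <= rating <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {rating}")
    return severity * occurrence * detection


def action_required_by_severity(severity: int) -> bool:
    """Severity 9-10 items need risk reduction regardless of the other ratings."""
    return severity >= 9


# A severity 9 item with low occurrence and good detection still requires action,
# even though its RPN is lower than that of a severity 6 item.
print(rpn(9, 2, 3), action_required_by_severity(9))
print(rpn(6, 5, 4), action_required_by_severity(6))
```

The comparison shows why RPN alone is a poor prioritization signal for safety-relevant items: the product hides the severity that should drive the decision.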

For the underlying steps of a full FMEA analysis from structure to optimization, the process FMEA step-by-step guide covers the complete 7-step workflow in the AIAG-VDA format, which software teams can adapt using the failure mode categories above.