Electronic systems are enabling more autonomous operation in automobiles, factories, and home automation. As a result, a growing number of electronic and electrical systems operate in close proximity to one another. For example, cars have high-voltage injectors or massive power inverters for hybrid drives that must operate without interfering with wireless receivers. Factories can have large computer-controlled machines that coexist with work order systems and material delivery robots. Each of these systems needs to operate safely on its own but also be able to handle unintended interference or faults within the environment to avoid injury or loss of life.
It is a high complexity problem. Functional safety standards such as IEC 61508 for industrial and ISO 26262 for automotive provide a framework for analyzing hazards, developing the safety concept to deal with those hazards, designing the proper hardware and software, and validating system safety performance.
However, compliance places additional burdens all along the development chain, from performing and documenting safety analyses, to managing tradeoffs between redundancy and risk mitigation. Semiconductor companies are now offering devices designed specifically to help designers meet their systems safety goals. Most recently, analog and power devices have joined microcontrollers in offering functional safety features.
What is functional safety?
Functional safety is part of the overall safety of a system or piece of equipment. It requires that the system or equipment operate correctly in response to its inputs, including the safe management of likely operator errors, hardware and software failures as well as environmental changes.
IEC 61508 and ISO 26262 add a systematic and rigorous approach to safety during the development cycle. They include processes and tools for analyzing hazards (the occurrence of an event that puts people at risk of danger), predicting failure rates, analyzing faults, and devising mitigations to prevent faults from propagating to hazards. They define a safety lifecycle that ensures the integrity of the safety development flow from concept, through design and validation, to operation and end-of-life.
What makes a system safe?
A system is safe when faults are not allowed to propagate to errors that result in a hazard. All systems are subject to faults: despite the best efforts of careful designers and developers and manufacturing quality control, no system is perfect. However, not all faults are equal. Likelihood of occurrence, potential impact if not detected and dealt with, detectability, and other factors are used to determine the appropriate mitigations to achieve the right amount of fault tolerance. Fault analysis tools are used to classify and categorize faults to determine the appropriate safety mechanisms to be employed (Figure 1).
| Acronym | Tool | Purpose |
| --- | --- | --- |
| FMEA | Failure Mode and Effects Analysis | Identify failure modes and their causes and effects |
| DFMEA | Design Failure Mode and Effects Analysis | Applications of the FMEA to product design |
| FMEDA | Failure Modes, Effects, and Diagnostic Analysis | Adds failure rate, functional failure modes, probability of internal detection of faults, and mechanical component usage to the FMEA |
| FTA | Fault Tree Analysis | Determine the cause of a failure by looking at combinations of lower level events |
| DFA | Dependent Failure Analysis | Analyzes errors that are dependent on two failures combining to cause the error |
Figure 1: Fault analysis tools are used to classify and categorize faults to determine the appropriate safety mechanisms to be employed (Source: Digi-Key)
Fault tolerance can be implemented in hardware, software, or both. A common means is functional redundancy to detect or even correct faults. Temporal redundancy, where information is processed more than once and at different times, can be useful against transient faults. Hot or cold backup systems can be used for fail-active systems. Independent checker and diagnostic components can be used to ensure the main process is not subject to faulty inputs or generating improper results.
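As a rough illustration of temporal redundancy, the C sketch below runs the same computation twice and accepts the result only if both passes agree. All names here (`scale_sensor`, `scale_with_glitch`, `run_twice_and_compare`) and the 12-bit ADC with a 3300 mV reference are assumptions made for the example, not part of any device or standard discussed in this article.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t (*compute_fn)(uint16_t);

/* Hypothetical computation to protect: scale a raw 12-bit ADC reading
 * to millivolts, assuming a 3300 mV reference. */
static uint32_t scale_sensor(uint16_t raw)
{
    return ((uint32_t)raw * 3300u) / 4095u;
}

/* Fault-injection stub (illustration only): corrupts the result of its
 * first call, simulating a transient upset. */
static uint32_t scale_with_glitch(uint16_t raw)
{
    static bool glitched = false;
    uint32_t value = scale_sensor(raw);

    if (!glitched) {
        glitched = true;
        value ^= 0x10u;            /* flip one bit */
    }
    return value;
}

/* Temporal redundancy: run the computation twice, one after the other,
 * and compare. Returns true and writes *out only if both passes agree. */
static bool run_twice_and_compare(compute_fn fn, uint16_t input, uint32_t *out)
{
    uint32_t first = fn(input);
    uint32_t second = fn(input);

    if (first != second) {
        return false;              /* mismatch: treat as a transient fault */
    }
    *out = first;
    return true;
}
```

Calling `run_twice_and_compare(scale_with_glitch, 4095, &mv)` fails on the first attempt because the two passes disagree, and succeeds when retried. A production design would also have to decide how many retries are acceptable before entering a safe state.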
The rise of safety components
The functional safety flow can be implemented at any point in the hierarchy of a system. In fact, if a component or subsystem is designed and developed in such a flow it can ease compliance at the next level. Many embedded electronics systems comprise only a handful of key integrated circuits, so built-in functional safety features and appropriate documentation can speed the path to compliance.
MCUs, the heart of all embedded systems, were the first to address this need. MCUs such as the Aurix family from Infineon and the MPC5643L from NXP offer functional safety features such as clock-delayed lockstep cores, voltage and I/O monitoring, built-in self-test, and error-correcting-code (ECC) memories with protection. They are designed with processes certified by third-party assessment and come with safety documentation such as safety manuals and failure modes, effects, and diagnostic analyses (FMEDAs).
Recently, analog and power products have also been aimed at critical functions in high reliability systems. An example is the MC33HB2001 H-Bridge driver developed for automotive and industrial brushed DC motor applications such as electronic throttle, exhaust gas recirculation, turbo flaps and electric pumps. The throttle motor in particular is a case where automotive safety integrity level (ASIL) D (the highest level of automotive safety) is required to prevent unintended acceleration.
Figure 2 shows an application schematic for the MC33HB2001 in a throttle body application. For ASIL D applications, the motor speed and direction are controlled by the IN1 and IN2 pins, and the four-wire SPI interface is used to control the flexible features of the device, such as output slew rate, over-current limits, and H-bridge or half-bridge mode of operation, as well as to report diagnostics.
Figure 2: The NXP MC33HB2001, shown here in an ASIL D H-bridge application, is an H-bridge driver developed for automotive and industrial brushed DC motor applications such as electronic throttle, exhaust gas recirculation, turbo flaps, and electric pumps. (Source: NXP)
In H-bridge mode, IN1 controls the motor direction while IN2 provides the PWM signal to switch the main power FETs. These inputs are mirrored by SPI bits that can be enabled to drive the motor in case IN1 or IN2 becomes disconnected. This is one of the redundant functional safety features built into the device.
It’s important to note that all automotive systems must be able to handle the reverse battery fault condition in case the battery connection is reversed. In Figure 2 this is accomplished via the N-channel FET connected to Vsys (the switched battery node). If the battery polarity is wrong, the body diode of the FET blocks reverse current from flowing. During normal operation the forward voltage drop is only 0.2 V, which allows startup from the very low battery conditions found in today’s start-stop systems. The NPN transistor in the circuit ensures fast turnoff of the FET when Vsys is switched off.
The MC33HB2001 also has numerous built-in fault detection and diagnostic features such as short-circuit detection, vehicle power (Vpwr) overvoltage, Vpwr undervoltage, open load, charge pump undervoltage, overcurrent limit, SPI framing error, thermal warning, and over-temperature shutdown. If one of these faults is detected, a corresponding bit is set in the SPI status register. A matching mask register selects which faults can trigger the fault status-monitoring pin (FS_B). Whenever the FS_B pin is driven low, the outputs are also set to high impedance, allowing the device to fail safe, which is a critical safety feature for many use cases.
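The status/mask relationship can be sketched in a few lines of C. The bit positions and the mask polarity (a 1 lets the fault drive FS_B) are assumptions made purely for illustration; the actual register layout is defined in the device data sheet, and `fs_b_driven_low` and `outputs_high_impedance` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, NOT the real MC33HB2001 register map. */
#define FAULT_SHORT_CIRCUIT  (1u << 0)
#define FAULT_VPWR_OV        (1u << 1)
#define FAULT_VPWR_UV        (1u << 2)
#define FAULT_OPEN_LOAD      (1u << 3)
#define FAULT_THERMAL_WARN   (1u << 4)
#define FAULT_OVERTEMP       (1u << 5)

/* FS_B is active low. Assumption: a 1 in the mask register allows the
 * matching status bit to pull FS_B low. */
static bool fs_b_driven_low(uint16_t status, uint16_t mask)
{
    return (status & mask) != 0u;
}

/* Per the text, the outputs go high impedance whenever FS_B is low. */
static bool outputs_high_impedance(uint16_t status, uint16_t mask)
{
    return fs_b_driven_low(status, mask);
}
```

With a mask of `FAULT_SHORT_CIRCUIT | FAULT_OVERTEMP`, a thermal warning alone is still recorded in the status word but does not disable the outputs, while an over-temperature event does.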
These and other safety features of the device are elaborated in the safety manual available for download from NXP. The Safety Analysis Report, which summarizes the results of the FMEDA, FTA, DFMEA, and others, is available upon request.
Power and safety
All electronic systems require a stable and reliable source of power. Safety systems also generally require an independent safety monitor for the MCU. Both can be found in the MC33908 power system basis chip (SBC) from NXP (Figure 3). The power portion of the part consists of a DC/DC buck regulator capable of supplying core voltages of 1.2 V to 3.3 V, at up to 1.5 A. There are also two linear regulators to supply 3.3 V or 5 V for I/O and peripherals, and a 5 V linear supply for CAN communications. All of these regulated outputs are fed from a DC/DC pre-regulator that can be configured for buck or buck/boost operation to support battery voltages from 2.7 V to 40 V. It also has onboard transceivers for CAN and LIN networks.
The safety portion of the device includes voltage monitors, an error monitor, an advanced watchdog, and built-in self-test (BIST).
Figure 3: Interconnection of safety features for the NXP MC33908 system basis chip (SBC) and MPC5643L MCUs. (Source: NXP)
The MPC5643L MCU is a dual-core, clock-delayed lockstep device with full redundancy of the following critical components:
- CPU core
- DMA controller
- Interrupt controller
- Crossbar bus system
- Memory Protection Unit
- Flash memory and RAM controllers
- Peripheral bridges
- System timers
- Watchdog timer
- Register protection
A redundancy control and checker unit (RCCU) checks each output of the redundant modules to detect errors. The on-board flash and RAM memories have full ECC, while a programmable fault collection and control unit (FCCU) independently monitors the integrity of the device and provides flexible safe state control.
Despite the many safety mechanisms built into the part, the safety case requires additional fault monitoring via an external device, specifically an external power-supply monitor, watchdog timer, and error output monitor.
The fail-safe machine (FSM) in the MC33908 provides these functions. The FSM is as independent as possible from the rest of the device. This includes having its own voltage regulators, bandgap reference, and oscillator, which allows the part to independently monitor the voltages it is generating. If a rail is found to be above the overvoltage threshold or below the undervoltage threshold, the part can be programmed to assert the RSTb pin to trigger a reset of the MCU. It can also assert the FS0b pin to control a fail-safe circuit that puts safety-critical circuits in a safe state. For example, it could pull the ENBL pin low on the MC33HB2001, disabling the outputs.
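The rail-monitoring behavior described above amounts to a window comparison followed by a configurable reaction. The C sketch below models that logic only; the type and function names, and the threshold values used in the usage note, are hypothetical and do not come from the MC33908 documentation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { RAIL_OK, RAIL_UNDERVOLTAGE, RAIL_OVERVOLTAGE } rail_state;

typedef struct {
    uint16_t uv_mv;              /* undervoltage threshold, millivolts */
    uint16_t ov_mv;              /* overvoltage threshold, millivolts  */
} rail_limits;

typedef struct {
    bool reset_on_fault;         /* programmed: assert RSTb on a rail fault */
    bool failsafe_on_fault;      /* programmed: assert FS0b on a rail fault */
} fsm_config;

typedef struct {
    bool rstb_asserted;
    bool fs0b_asserted;
} fsm_outputs;

/* Classify one measured rail voltage against its monitoring window. */
static rail_state classify_rail(uint16_t measured_mv, rail_limits limits)
{
    if (measured_mv < limits.uv_mv) {
        return RAIL_UNDERVOLTAGE;
    }
    if (measured_mv > limits.ov_mv) {
        return RAIL_OVERVOLTAGE;
    }
    return RAIL_OK;
}

/* Map the rail state to the programmed reaction on RSTb and FS0b. */
static fsm_outputs fsm_react(rail_state state, fsm_config cfg)
{
    fsm_outputs out = { false, false };

    if (state != RAIL_OK) {
        out.rstb_asserted = cfg.reset_on_fault;
        out.fs0b_asserted = cfg.failsafe_on_fault;
    }
    return out;
}
```

For a nominal 3.3 V rail with an assumed window of 3000 mV to 3600 mV, a reading of 3700 mV is classified as `RAIL_OVERVOLTAGE`, and with both reactions enabled the model asserts RSTb and FS0b together.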
Pet the (watch) dog
A watchdog (WD) external to the MCU is required to detect some common causes of failure, such as failure of the MCU internal supplies. The watchdog provided by the MC33908 is more complex than traditional watchdogs in that it provides a higher level of operational check. First, the WD must be serviced within a programmable window. Second, the refresh is based on a question-and-answer principle, using a pseudo-random number generated by a linear feedback shift register (LFSR) in the MC33908. The LFSR is provided with a seed value by the MCU, which also runs a calculation with the seed to match the resulting LFSR value. If this result is then sent to the MC33908 during an open watchdog window, a correct WD refresh is achieved and the LFSR advances to generate a new pseudo-random word. An example WD refresh code routine can be found at the end of this article.
To ensure that cyclic OK/Not OK behavior converges to a fault detection, a watchdog error counter is used. Every wrong WD refresh increments the WD error counter by 2, and every successful WD refresh decrements it by 1. If the counter reaches a programmed maximum (2, 4 or 6), a reset is generated and RSTb is asserted (Figure 4).
Figure 4: In the NXP MC33908 watchdog (WD) reset-counter operation, every wrong WD refresh increments the WD error counter by 2, every successful WD refresh decrements it by 1. If the counter reaches a programmed max (2, 4 or 6) a reset is generated and RSTb is asserted. (Source: NXP)
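The asymmetric +2/-1 counting can be modeled in a few lines of C to show why alternating good and bad refreshes still ends in a reset. This is a behavioral sketch only: the names are hypothetical, and clamping the counter at zero and clearing it after a reset are assumptions rather than documented device behavior.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t count;   /* current WD error count */
    uint8_t limit;   /* programmed maximum: 2, 4 or 6 */
} wd_error_counter;

/* Apply one WD refresh result. Returns true if this refresh pushes the
 * counter to its limit, i.e. a reset would be generated. */
static bool wd_error_update(wd_error_counter *c, bool refresh_ok)
{
    if (refresh_ok) {
        if (c->count > 0u) {
            c->count--;                      /* good refresh: -1, floor at 0 (assumed) */
        }
        return false;
    }

    c->count = (uint8_t)(c->count + 2u);     /* wrong refresh: +2 */
    if (c->count >= c->limit) {
        c->count = 0u;                       /* assumed: counter cleared by the reset */
        return true;
    }
    return false;
}
```

With a limit of 6 and a strictly alternating bad/good pattern, the count runs 2, 1, 3, 2, 4, 3, 5, 4, 6, so the fifth wrong refresh triggers the reset even though every other refresh was correct.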
Every reset increments a reset counter by one. A programmable number (2, 3, 5 or 7) of consecutive successful WD refreshes decrements the counter by one. If the counter reaches 3, the FS0b pin is asserted, signaling that system stability is in question. If the counter reaches 6, the MC33908 turns off all regulators, and a new power-up or a low-to-high toggle of the IO_0 pin (key off/on) is required to recover. This is known as the deep fail state. For even more safety-critical situations, FS0b can be programmed to assert when the counter reaches 1, with the shutoff occurring at 3.
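A second, slower counter sits on top of the watchdog error counter. The C sketch below models it under stated assumptions: the decrement condition is read as a run of consecutive good WD refreshes, deep fail is treated as latched, and the functions only report threshold crossings without modeling how FS0b is released. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t resets;        /* reset counter */
    uint8_t good_streak;   /* consecutive good WD refreshes so far */
    uint8_t good_needed;   /* programmable: 2, 3, 5 or 7 */
    uint8_t fs0b_at;       /* 3 by default, 1 in the stricter setting */
    uint8_t deep_fail_at;  /* 6 by default, 3 in the stricter setting */
} reset_counter;

static bool in_deep_fail(const reset_counter *rc)
{
    return rc->resets >= rc->deep_fail_at;
}

static bool fs0b_threshold_reached(const reset_counter *rc)
{
    return rc->resets >= rc->fs0b_at;
}

/* Every reset adds one and breaks any run of good refreshes. */
static void on_reset(reset_counter *rc)
{
    if (!in_deep_fail(rc)) {
        rc->resets++;
    }
    rc->good_streak = 0u;
}

/* A wrong refresh breaks the run of good refreshes. */
static void on_bad_refresh(reset_counter *rc)
{
    rc->good_streak = 0u;
}

/* A full run of good refreshes removes one count, unless in deep fail,
 * which needs a power cycle or IO_0 toggle to leave. */
static void on_good_refresh(reset_counter *rc)
{
    if (in_deep_fail(rc)) {
        return;
    }
    rc->good_streak++;
    if (rc->good_streak >= rc->good_needed) {
        rc->good_streak = 0u;
        if (rc->resets > 0u) {
            rc->resets--;
        }
    }
}
```

Initialized as `{ 0, 0, 2, 3, 6 }`, three resets reach the FS0b threshold, two good refreshes in a row bring the count back to 2, and four further resets put the model in deep fail, after which good refreshes no longer lower the count.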
Monitoring the monitor
Safety MCUs have internal monitors to collect errors and assess device safety when failures are detected. This function does not require CPU intervention to operate, which keeps it independent of software issues. In the MPC5643L, this module is called the fault collection and control unit (FCCU). It has two pins, FCCU_F[0:1], that report errors to the outside world. Connecting these pins to the MC33908 IO[2:3] pins facilitates this monitoring. When FCCU_F[0:1]=10, a fault condition is indicated, and the MC33908 asserts RSTb to reset the MCU and FS0b, which can power off the system.
These are a few of the safety features built into the MC33908. You can explore more by requesting the Safety Manual and Safety Application notes from the NXP website. TÜV SÜD, an independent certification provider, has also assessed the device as fit for ASIL D applications.
MC33908 Watchdog Routine Code example
/*--------------------------------------------------------------------------
 * Disclaimer - This is an example of a MC33908 Watchdog refresh routine.
 * Software demo code in this document is provided solely to enable software implementers to
 * use the MC33908. NXP makes no warranty, representation, or guarantee regarding
 * the suitability of this software for any particular purpose, nor does NXP assume
 * any liability arising out of the application using this software, and
 * specifically disclaims any and all liability, including without limitation,
 * consequential or incidental damages.
 * The term PwSBC below refers to the MC33908
 ---------------------------------------------------------------------------*/
/*---------------------------------------------------------------------------
 * WD_answer functions
 ----------------------------------------------------------------------------*/
/***************************************************************************//*!
* @brief The function PwSBC_RefreshWD refreshes the WD.
* @par Include
* PwSBC.h
* @par Description
* This function refreshes WD using the given WD answer. This is
* done by writing into the WD answer register.
* @param[in] answer - 8-bit WD answer to be sent.
* @return
* - "0" - WD was successfully refreshed. <br>
* - "10" - SPI disconnected or no SPI answer (in decimal). <br>
* - "11" - SPI_G error detected (in decimal).
* @remarks The 8-bit answer must consist of the control computations
* that have been done on the actual LFSR content.
* @par Code sample
* </para>
* uint32_t status; </para>
* status = PwSBC_RefreshWD(50); </para>
* - Command sends WD answer with value 50.
********************************************************************************/
uint32_t PwSBC_RefreshWD(uint32_t answer){
    WD_answer_Tx_32B_tag cmd;
    uint32_t resp = 0;

    cmd.R = 0;                     //clear the whole SPI frame
    cmd.B.RW = 1;                  //write command
    cmd.B.ADR = WD_ANSWER_ADR;     //set address
    cmd.B.WD_answer = answer;      //set answer
    resp = PwSBC_SendCmdRW(cmd.R); //send the frame over SPI
    return resp;                   //returns error status from the previous function
}
/***************************************************************************//*!
* @brief The function PwSBC_GenerateLFSR generates, stores and returns a new state
* of the LFSR from the previous one, that is stored in memory.
* @par Include
* PwSBC.h
* @par Description
* This function evolves LFSR implemented in the MCU into the
* following state and stores the new value in a global
* structure.
* @return New LFSR state.
* @remarks If this function is used, then the synchronization between LFSR
* implemented in the MCU and the one in PwSBC must be guaranteed.
* @par Code sample
* </para>
* uint32_t new_lfsr; </para>
* new_lfsr = PwSBC_GenerateLFSR(); </para>
* - Command evolves actual LFSR content and returns the new one.
********************************************************************************/
uint32_t PwSBC_GenerateLFSR(void){
    register32_struct gate;

    gate.R = 0;
    //feedback bit = XNOR of LFSR bits 7, 5, 4 and 3, built up in three steps
    gate.B.bit0 = ~(PITstruct.currentLFSR.B.bit7 ^ PITstruct.currentLFSR.B.bit5);
    gate.B.bit0 = ~(PITstruct.currentLFSR.B.bit4 ^ gate.B.bit0);
    gate.B.bit0 = ~(PITstruct.currentLFSR.B.bit3 ^ gate.B.bit0);
    PITstruct.currentLFSR.R <<= 1;               //shift left by one
    PITstruct.currentLFSR.B.bit0 = gate.B.bit0;  //insert feedback bit
    PITstruct.currentLFSR.R &= 0xFF;             //mask out only the lowest Byte
    return PITstruct.currentLFSR.R;
}
/***************************************************************************//*!
* @brief The function PwSBC_ComputeLFSR computes and returns the WD answer based on
* the actual LFSR.
* @par Include
* PwSBC.h
* @par Description
* This function makes control computations with the given LFSR and
* returns the result of the computation in the lowest 8 bits.
* @param[in] actualLFSR - 8-bit LFSR value on which will be applied control
* computations.
* @return Result of the control computations. Computations are made as
* follows:
* (~((lfsr*4 + 6) - 4) & 0xFFFF) / 4, truncated to the lowest 8 bits
* @remarks Control computations are made in assembler using instructions
* to prove ALU using basic operations (*, /, +, -).
* @par Code sample
* </para>
* uint32_t result; </para>
* result = PwSBC_ComputeLFSR(50); </para>
* - Command makes control computations with the given LFSR value (50)
* and returns result.
********************************************************************************/
uint32_t PwSBC_ComputeLFSR(uint32_t actualLFSR){
    //actualLFSR is passed in r3 and the result is returned in r3
    asm{
#if __option(vle)
        se_li r25,0x04 //load nb.4 -> r25
        mullw r3,r25,r3 //lfsr * 4 -> r3
        e_add16i r3,r3,6 //r3 + 6 -> r3
        se_subi r3,4 //r3 - 4 -> r3
        se_not r3 //NOT r3 -> r3
        e_li r24,0xFFFF //mask -> r24
        se_and r3,r24 //r24 & r3 -> r3
        divwu r3,r3,r25 //r3 / 4 -> r3 -> as a return value
        e_li r24,0xFF //mask -> r24
        se_and r3,r24 //store only lower 8 bits -> r3
#else
        li r25,0x04 //load nb.4 -> r25
        mullw r3,r25,r3 //lfsr * 4 -> r3
        addi r3,r3,6 //r3 + 6 -> r3
        li r25,4 //load nb.4 -> r25
        subf r3,r25,r3 //r3 - r25 -> r3
        li r25,0xFFFF //mask for negation -> r25
        nand r3,r3,r25 //r25 NAND r3 -> r3
        li r25,4 //load nb.4 -> r25
        divwu r3,r3,r25 //r3 / r25 -> r3 -> as a return value
        clrlwi r3,r3,24 //store only 8 lower bits
#endif
    }
}
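For host-side unit testing it can help to mirror the two LFSR routines in portable C. The sketch below is derived only from the listing above, not from NXP documentation, and the function names `lfsr_next` and `wd_answer` are hypothetical. It reproduces the same feedback taps (XNOR of bits 7, 5, 4 and 3) and the same sequence of arithmetic steps as the assembly.

```c
#include <stdint.h>

/* One LFSR step, mirroring PwSBC_GenerateLFSR: shift left by one and
 * insert the XNOR of bits 7, 5, 4 and 3 as the new bit 0. */
static uint8_t lfsr_next(uint8_t lfsr)
{
    uint8_t b7 = (uint8_t)((lfsr >> 7) & 1u);
    uint8_t b5 = (uint8_t)((lfsr >> 5) & 1u);
    uint8_t b4 = (uint8_t)((lfsr >> 4) & 1u);
    uint8_t b3 = (uint8_t)((lfsr >> 3) & 1u);
    uint8_t feedback = (uint8_t)((b7 ^ b5 ^ b4 ^ b3) ^ 1u);

    /* the cast drops the bit shifted out of the 8-bit register */
    return (uint8_t)(((uint32_t)lfsr << 1) | feedback);
}

/* WD answer, mirroring the steps of PwSBC_ComputeLFSR one by one. */
static uint8_t wd_answer(uint8_t lfsr)
{
    uint32_t x = (uint32_t)lfsr * 4u;   /* lfsr * 4          */
    x += 6u;                            /* + 6               */
    x -= 4u;                            /* - 4               */
    x = ~x & 0xFFFFu;                   /* NOT, 16-bit mask  */
    x /= 4u;                            /* / 4               */
    return (uint8_t)(x & 0xFFu);        /* keep lower 8 bits */
}
```

Following the steps for an LFSR value of 50 gives 202, then 0xFF35 after the masked inversion, 16333 after the division, and 0xCD (205) in the lowest byte. For 8-bit inputs this arithmetic reduces to the bitwise complement of the LFSR value, which suggests the longer sequence exists to exercise the multiply, add, subtract, and divide paths of the ALU, as the listing's remarks indicate.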
Conclusion
The need for safer systems is increasing every day as we move towards self-driving cars, autonomous factories, and even the highly interconnected Internet of Things (IoT). Designers are learning the new methods of ensuring safe design and will require more ICs with embedded safety features. While functional safety has historically been associated with MCUs and software, analog and power components are now increasingly a key part of the safety equation.