

# Fault Pathologies caused by Moore's Law, and Remedies

2011/07/03

Nobuyasu Kanekawa, Hitachi Research Laboratory, Hitachi, Ltd.





#### Contents

- 1. Fault Pathologies caused by Moore's Law
- 2. Intra-Board Redundancy
- 3. On-chip Redundancy
- 4. Conclusions








#### Trends in Failure Cause





## Moore's Law





#### Terrestrial Neutron induced SEU's



#### Semiconductor Integration Causes:

- → Decrease in Funnel Area (Reduces Error Rate)
- → Decrease in Critical Charge (Increases Error Rate)

→ Overall Increase in Error Rate

- Error Rate: ×1.5-2.0 per Generation
- Memory Volume × $n$ → Number of Upsets × $n$
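As a rough illustration of the two scaling figures above, the following sketch multiplies a baseline upset count by the per-generation factor and by the memory growth factor. The function name, the baseline value, and the assumption that the two effects simply multiply are illustrative and are not taken from the slides.

```python
def projected_upsets(base_upsets, generations, rate_factor, memory_factor=1.0):
    """Scale a baseline upset count across process generations.

    rate_factor   -- error-rate multiplier per generation (the slide quotes 1.5-2.0)
    memory_factor -- memory-volume multiplier n (number of upsets scales by n)
    Assumption: the two effects are independent and multiply.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return base_upsets * (rate_factor ** generations) * memory_factor


if __name__ == "__main__":
    # Hypothetical case: baseline of 1 upset per unit time, three generations
    # later, with 4x the memory volume.
    low = projected_upsets(1.0, 3, 1.5, memory_factor=4.0)
    high = projected_upsets(1.0, 3, 2.0, memory_factor=4.0)
    print(f"{low:.1f} to {high:.1f} upsets per unit time")
```

Even at the lower factor of 1.5, the count grows by more than an order of magnitude in this hypothetical case, which is why the SEU countermeasures discussed later become necessary.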

#### Cosmic Neutron Induced SEU Mechanism





#### Synchronization Problem



#### Semiconductor Integration Causes:

→ Higher Clock Frequency

Consideration of Synchronization among Redundant Subsystems will be Indispensable. There is a trade-off:

- Propagation Delay among Redundant Subsystems, versus
- Maintainability (Replaceability) of Redundant Subsystems

## Synchronization and Overhead










[Figure: Delay vs. Frequency. Horizontal axis: Clock Frequency (33, 50, 100, 200 MHz). Vertical axis: delay in ns (ticks at 10 and 30). Labels: Allowable Device Delay, Overhead in Checking, Signal Propagation Delay, 7 ns/m.]
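The relationship between delay and clock frequency can be sketched numerically. The code below assumes a simple budget in which one clock period must cover device delay, checking overhead, and signal propagation delay. The 7 ns/m figure and the 33 to 200 MHz range come from the slide. The additive model, the function names, the 0.5 m distance, and the 2 ns checking overhead are hypothetical.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def allowable_device_delay_ns(freq_mhz, distance_m, check_overhead_ns,
                              prop_delay_ns_per_m=7.0):
    """Time left for device delay within one clock cycle (may be negative).

    Assumed model: period = device delay + checking overhead + propagation delay.
    """
    propagation_ns = prop_delay_ns_per_m * distance_m
    return clock_period_ns(freq_mhz) - check_overhead_ns - propagation_ns


if __name__ == "__main__":
    # Hypothetical values: 0.5 m between subsystems, 2 ns checking overhead.
    for f in (33, 50, 100, 200):
        budget = allowable_device_delay_ns(f, distance_m=0.5, check_overhead_ns=2.0)
        print(f"{f:>3} MHz: period {clock_period_ns(f):5.1f} ns, budget {budget:5.1f} ns")
```

Under these assumed values the budget turns negative at 200 MHz, which illustrates why shortening the distance between redundant subsystems becomes attractive as clock frequency rises.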

#### Electro-Magnetic Disturbance Sensitivity



#### Semiconductor Integration Causes:

- Higher Clock Frequency → Larger Noise Intensity
- Lower Power Supply Voltage → More Noise Sensitive

Consideration of Electro-Magnetic Disturbances (EMC, Power Integrity) will be Indispensable

#### Pathologies and Remedies



Pathologies:

- Terrestrial Neutron induced SEU's
- Synchronization Problem

Remedies:

- Intra-Board Redundancy
- On-chip Redundancy






## Synchronization and Overhead





## Intra-Board Redundancy





## Immediate/Deferred Reconfiguration

Deferred Reconfiguration: Complex Reconfiguration is Performed by Software

→ Simplifies Hardware





#### **TPR Architecture**



\*TPR: Triple Processor & check Redundancy



## Immediate/Deferred Reconfiguration







**TPR Architecture** 



3500/FT **QPR** Architecture






## On-chip Redundancy



Integration of LSI

- → Needs for SEU Countermeasures
- → Seeds for Multi-Core MPU

Electronic Control (Train, Automotive)

- → Needs for Dependability

→ Safety Micro-controller (On-Chip Redundancy)

## Safety Micro-Controller Prototype (FUJINE)







| Item | Specification |
| --- | --- |
| Process | 0.35 μm, 5-Metal CMOS |
| Hard Macros | PLL × 2, RAM (40 KB) |
| Random Logic | 740k gates |
| Chip Size | 14.75 mm |
| Operating Frequency | 60 MHz |
| Power Dissipation | 2.6 W @ 60 MHz |
| Package | 479-pin BGA |

## Safety Micro-Controller

[Figure: development of the safety micro-controller, ordered by logic size from small to large: 1996 Frequency Logic (ATC-LSI); 1999 Prototype (FUJINE); Soft IP Core; 2006 Production Model. Block-diagram labels: CPU-A, CPU-B, RAM, Fail-Safe Controller, Safety Control Protocol, Control Room, Optical Network System.]

## **Self-Checking Processor**



#### Error Detection by Lock-Step Dual

- Comparator Failure → Mis-Detection
- Correlated Error → Mis-Detection
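The detection principle and its correlated-error weakness can be sketched in a few lines. In this minimal model, two processing units are plain Python functions fed the same input, and a comparator checks their outputs. The names `lockstep_step`, `healthy`, and `faulty` are hypothetical, as is the choice to withhold the output on a mismatch.

```python
def lockstep_step(mpu_a, mpu_b, inputs):
    """Run both MPUs on the same inputs and compare their outputs.

    Returns (output, error_detected). On a mismatch the output is withheld
    (None), which is one possible fail-safe reaction and an assumption here.
    """
    out_a = mpu_a(inputs)
    out_b = mpu_b(inputs)
    if out_a != out_b:
        return None, True
    return out_a, False


def healthy(x):
    return x + 1


def faulty(x):
    # Model of a fault in one unit: one output bit is flipped.
    return (x + 1) ^ 0x4


if __name__ == "__main__":
    print(lockstep_step(healthy, healthy, 10))  # fault-free: (11, False)
    print(lockstep_step(healthy, faulty, 10))   # independent fault: (None, True)
    print(lockstep_step(faulty, faulty, 10))    # correlated fault: (15, False)
```

The last call shows the correlated-error case: both units produce the same wrong value, so the comparator sees agreement and the error passes undetected. A fault in the comparison itself would have a similar effect, which motivates a self-checking comparator.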



MPU: Micro-Processing Unit, SC: Self-Checking, FS: Fail-Safe, CMP: Comparator

## Self-checking Comparator





#### **Effect Of Time Diversity**















#### **Conclusions**



#### Semiconductor Integration along with Moore's Law Causes:

- Terrestrial Neutron induced SEU's
- Synchronization Problem
- Electro-Magnetic Disturbance Sensitivity

#### Prospective Remedies:

- Intra-Board Redundancy
- On-chip Redundancy
- Consideration of Electro-Magnetic Disturbances (EMC, Power Integrity)

## Reference (includes Consideration of Electro-Magnetic Disturbances)

Kanekawa et al., "Dependability in Electronic Systems: Mitigation of Hardware Failures, Soft Errors, and Electro-magnetic Disturbances," Springer (2010) ISBN-13: 978-1441967145

## Hitachi's Expertise in Dependability





#### Hitachi's Approach for Dependability and Safety





**Background Technology** 

Computer (Mainframe to Micro-controller)

Semiconductor

## HITACHI Inspire the Next

"Hitachi: Peace of Mind and Trust"

in Taiwan