

The 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks

Second Workshop on Dependable and Secure Nanocomputing Friday June 27, 2008 — Anchorage, AK, USA

# Fault Tolerance of the Input/Output Ports in Massively Defective Multicore Processor Chips

#### Piotr Zając, Jacques Henri Collet, Jean Arlat, and Yves Crouzet {firstname.lastname}@laas.fr







### From Multi-Core Architectures to Multi-Multi-Core Architectures



- Multi-Core: performance while coping with power dissipation issues (very high clock frequency)
- Still, reducing transistor size to include many such cores → significant % of defective cores (more than 10%?)
- Current context:
  - Chips are sorted according to frequency
  - Single core processor = "Downgraded" dual core circuits ...
- How to go further: On-line reconfiguration to cope with faults?

### **Example Target Architecture**

(5x9-node Network — Connectivity: 4)



#### ■ The I/O Interface (IOP) is a Hardcore and a "Bottleneck"

## **Preliminary Analysis of Several Options**

Increase the number of I/O ports

Consider redundant IOPs

Extend IOP connectivity with grid (adjacent nodes)

#### **Increasing the Number of IOPs** Example of a 4-IOP Grid Including 14 Defective Cores
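The benefit of adding IOPs can be illustrated with a small reachability sketch: a breadth-first search over fault-free nodes of a 4-connected grid, starting from the nodes where the IOPs attach. Everything concrete here is an assumption made for illustration and is not taken from the slides: the function name `reachable_cores`, the 5x9 grid size, the defect positions, and the modelling of each IOP as attached to one grid node.

```python
from collections import deque

def reachable_cores(rows, cols, defective, iop_nodes):
    """Return the fault-free nodes reachable from any IOP attachment node.

    Assumption: each IOP attaches to one grid node, and messages travel
    only through fault-free nodes of a 4-connected grid.
    """
    queue = deque(n for n in iop_nodes if n not in defective)
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and nxt not in defective and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical defect pattern that walls in the node at (2, 0).
defects = {(1, 0), (3, 0), (2, 1)}
one_iop = reachable_cores(5, 9, defects, [(2, 0)])
two_iops = reachable_cores(5, 9, defects, [(2, 0), (2, 8)])
print(len(one_iop), len(two_iops))
```

With a single IOP, three defective neighbours are enough to cut the chip off in this toy case; a second IOP restores access to the remaining fault-free nodes.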



# **Redundant IOP Architecture**

#### **Example**:

4-connect RIOP

with R = 3 Redundant I/O Modules (Mi)



#### Chip Validation Criteria? At least *r* out of *R* modules are fault-free at start-up in each *R*IOP

#### Validation probability

$$P_{W,IOP} = \left[\sum_{i=3}^{R} \binom{R}{i} \left(1 - p_{f,M}\right)^{i} p_{f,M}^{R-i}\right]^{N_{IO}}$$

Example: Case of a 4-port Chip for R = 5, 6, 7 and r = 3
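A minimal sketch of how the validation probability could be evaluated numerically for the 4-port case. The lower summation bound is written as a parameter `r` (the formula above is the case r = 3). The function names and the module failure probability `p_f_m = 0.2` are assumptions chosen for illustration, not values from the slides.

```python
from math import comb

def at_least(k, n, p_fail):
    """Probability that at least k of n independent units are fault-free."""
    ok = 1.0 - p_fail
    return sum(comb(n, i) * ok**i * p_fail**(n - i) for i in range(k, n + 1))

def riop_validation_probability(R, r, p_f_m, n_io):
    """P_W,IOP: every one of the n_io RIOPs has at least r of R good modules."""
    return at_least(r, R, p_f_m) ** n_io

# Illustrative sweep: 4 ports, r = 3, assumed module failure probability 0.2.
for R in (5, 6, 7):
    print(R, riop_validation_probability(R, 3, 0.2, 4))
```

For a fixed r, raising R can only increase the probability, which is the dependability gain that has to be weighed against the extra area.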



## Modification of Grid Topology around each RIOP



Connectivity *nc* = 4



Connectivity *nc* = 6



**RIOP**

Probability that at least *k* out of *nc* nodes adjacent to each *R*IOP are fault-free

$$P_{L}(k, n_{C}, p_{f,N}) = \left[\sum_{i=k}^{n_{C}} \binom{n_{C}}{i} (1 - p_{f,N})^{i} p_{f,N}^{n_{C}-i}\right]^{N_{IO}}$$
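The effect of connectivity on this probability can be sketched as follows. The function name `p_l` and the node failure probability of 0.2 are assumptions for illustration; the connectivities 4, 6 and 8 and the threshold k = 3 are the ones appearing on the slides.

```python
from math import comb

def p_l(k, n_c, p_f_n, n_io):
    """P_L: each of the n_io RIOPs has at least k fault-free nodes among its n_c neighbours."""
    ok = 1.0 - p_f_n
    one_riop = sum(comb(n_c, i) * ok**i * p_f_n**(n_c - i)
                   for i in range(k, n_c + 1))
    return one_riop ** n_io

# Illustrative comparison for 4 RIOPs, k = 3, assumed node failure probability 0.2.
for n_c in (4, 6, 8):
    print(n_c, p_l(3, n_c, 0.2, 4))
```

For a fixed k, a larger n_c leaves more spare neighbours, so the probability grows with connectivity.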

Example of Overhead Analysis: $N = 300$; $N_{IO} = 4$; $n_C = 8$

VC1: To protect the communication bandwidth of each *R*IOP, at least 3 out of its 8 neighboring nodes must be fault-free.

VC2: Validation yield threshold: $P_{W,IOP} \ge P_L(k, n_C, p_{f,N}) \ge 80\%$.

$$Q = \frac{\left(R - 1\right) \ N_{IO} \cdot A_{IO}}{N \cdot A}$$
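A short sketch of how the overhead Q scales with the redundancy R for a 300-core, 4-port chip. The slides do not define the area symbols; reading $A_{IO}$ as the area of one I/O module and $A$ as the area of one core is an assumption, as are the function name `area_overhead` and the area ratio of 1.0 used in the loop.

```python
def area_overhead(R, n_io, n_cores, area_ratio):
    """Q = (R - 1) * N_IO * A_IO / (N * A), with area_ratio = A_IO / A."""
    return (R - 1) * n_io * area_ratio / n_cores

# Illustrative values: N = 300 cores, N_IO = 4 ports, assumed A_IO / A = 1.0.
for R in (5, 6, 7):
    print(R, area_overhead(R, 4, 300, 1.0))
```

Under these assumptions Q stays at a few percent, since only (R - 1) extra modules per port are added to a chip of 300 cores.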





## **Concluding Remarks**

- Study of the protection of the IOPs in multiport grid architectures
- Analysis of the dependability gain and overhead induced: redundancy, connectivity and chip area

- Grid topology and connectivity
- Self-diagnosis and coverage
- Application reconfiguration