

# SER Characterization of an Advanced Network Processor using Accelerated Neutron Beam

#### Nelson Tam<sup>1</sup>, ShiJie Wen<sup>2</sup>, Noam Lewis<sup>3</sup>, Richard Wong<sup>2</sup>, Armen Karapetov<sup>4</sup>, Oded Rozenstein<sup>4</sup>, Haim Boot<sup>3</sup>, Reuven Cohen<sup>3</sup>, Usama Nassir<sup>1</sup>

<sup>1</sup>Marvell Semiconductor, Inc., 5488 Marvell Lane, Santa Clara, CA
<sup>2</sup>Cisco Systems, Inc., 170 West Tasman Drive, San Jose, CA
<sup>3</sup>Marvell Israel Ltd, 6 Hamada Street, Mordot HaCarmel Industrial Park, Yokneam, Israel
<sup>4</sup>EZchip Technologies Ltd., 1 Hatamar Street, Yokneam, Israel

June 27<sup>th</sup>, 2008

**WDSN 2008** 


#### Agenda

- Motivation
- 98NX3C2 SER Analysis
- ANITA Set Up at TSL
- Memory Test Data
- Application Test Results
  - MTTF
  - ECC Error
  - Parity Error
- Conclusion



### **Motivation**

- SER is now a major reliability requirement in Cisco's component qualifications and for consideration in product/software designs.
- Developing highly Reliable & Available Systems (RAS) requires careful consideration of SEU occurrences, and enhancements must be added during the product definition phase.
- Designing 98NX3C2, a wire speed network processor with high RAS capability, posed a significant challenge because of the distributed nature of the embedded memories used to facilitate high speed packet processing.
- A collaborative effort was formed to establish a methodology for experimentally verifying the SER of the final design.



#### 98NX3C2 Features

- 10 Gigabit full-duplex processing
- Classification search engine
- Three 10-G traffic managers
- Twelve 1-G or a single 10-G Ethernet MACs with XAUI interfaces for network link
- Two 10-G Ethernet MACs with XAUI interfaces
- Two 1-G Ethernet & a single PCI Express external host interface
- Internal capabilities for OAM (operations & management) offload
- TSMC 90nm process in HFCBGA package
- ~10 Mbits of embedded SRAM



## 98NX3C2 SER Challenge

- High-performance general-purpose microprocessors have much more memory than 98NX3C2
  - Dual-Core Intel® Itanium® Processor 9000 and 9100 series have up to 24MByte of cache
  - Most SRAM cells are in caches that are ECC or parity protected
- 98NX3C2 has only about 10 Mbits of SRAM
  - Over 140 different arrays
  - Sizes range from 256 bits to 2 Mbits
  - Median size is 5 kbits
- Too costly to add protection and detection to all the memory arrays
  - ECC and parity protection are judiciously added only to the critical arrays until the estimated SER becomes acceptable



## 98NX3C2 SER Estimation

- Only embedded memories are taken into account
  - SRAM FIT rate provided by TSMC
- Architectural derating is applied
- Random logic and analog components are not included
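
The estimation described above amounts to a derated sum over the embedded arrays. A minimal sketch of that arithmetic follows. The array names, sizes, derating factors, and the per-Mbit FIT rate are all hypothetical placeholders, not TSMC data or 98NX3C2 values. Treating protected arrays as contributing no failures is also a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class MemoryArray:
    name: str         # hypothetical array name
    bits: int         # array size in bits
    derating: float   # assumed architectural derating factor, 0.0 to 1.0
    protected: bool   # True if ECC or parity covers the array


def estimate_chip_fit(arrays, fit_per_mbit):
    """Sum the derated FIT contribution of every unprotected array."""
    total = 0.0
    for array in arrays:
        if array.protected:
            continue  # simplification: protected arrays add no failures
        mbits = array.bits / 1e6  # assumes 1 Mbit = 1e6 bits
        total += mbits * fit_per_mbit * array.derating
    return total


# Hypothetical inventory; the real design has over 140 arrays.
arrays = [
    MemoryArray("frame_buffer", 2_000_000, 0.7, True),
    MemoryArray("lookup_table", 500_000, 0.5, False),
    MemoryArray("small_fifo", 256, 0.1, False),
]
PLACEHOLDER_FIT_PER_MBIT = 1000.0  # placeholder, not a foundry number

chip_fit = estimate_chip_fit(arrays, PLACEHOLDER_FIT_PER_MBIT)
print(f"Estimated chip FIT (illustrative only): {chip_fit:.4f}")
```

In a loop like this, adding protection to one more array simply removes its term from the sum, which is how protection can be added array by array until the total is acceptable.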



## **Accelerated Neutron Testing**

- Location
  - Atmospheric-like Neutrons from thIck TArget (ANITA) facility at The Svedberg Laboratory (TSL) in Uppsala
- Test Objectives
  - Memory testing to verify intrinsic memory SER
    - Memory test algorithm targeting 2 Mbits of internal memories
  - Application tests to characterize MTTF
    - Search Random Tests (SRT) targeting specific interfaces



## 98NX3C2 SER Test Platform




## **Experimental Set Up at TSL**

- Test platform is mounted on a frame for better positional control
- Beam acceleration factor is 1.7×10<sup>8</sup>
- Beam diameter is about 1.5 cm





#### **Memory Test Flow Chart**



#### **Memory Test Data**

- Each data point is one run where ~50 errors are captured
  - Memory test data in FIT/Mbit is normalized to TSMC SRAM data
- 98NX3C2 memory FIT rate is in good agreement with the TSMC FIT rate, which was obtained at LANL
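
As a rough illustration of how one run's error count becomes a FIT/Mbit figure, the sketch below scales beam time by the acceleration factor and divides by the tested memory size. The acceleration factor and the 2 Mbit target come from the slides. The run duration is a hypothetical placeholder, so the printed rate is not a measured result. The Poisson counting uncertainty is a standard approximation and is not stated in the slides.

```python
import math

ACCELERATION_FACTOR = 1.7e8  # beam acceleration factor from the set-up slide
TESTED_MBITS = 2.0           # memory test targets 2 Mbits


def fit_per_mbit(errors, beam_hours,
                 accel=ACCELERATION_FACTOR, mbits=TESTED_MBITS):
    """Convert an error count under beam to failures per 1e9 hours per Mbit."""
    field_equivalent_hours = beam_hours * accel
    return errors / (field_equivalent_hours * mbits) * 1e9


def poisson_relative_sigma(errors):
    """Relative one-sigma counting uncertainty for a Poisson count."""
    return 1.0 / math.sqrt(errors)


errors = 50        # about 50 errors are captured per run
beam_hours = 0.1   # hypothetical run duration, not from the slides

rate = fit_per_mbit(errors, beam_hours)
sigma = poisson_relative_sigma(errors)
print(f"Illustrative rate: {rate:.1f} FIT/Mbit, +/- {sigma:.1%} counting error")
```

With about 50 errors per run the counting uncertainty is roughly 14%, which gives a sense of the statistical scatter to expect between data points.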





### **Application Mode Test Overviews**

- Three pre-defined search random tests (SRT), taken from the functionality test suite
  - Packets are generated internally and routed via loopback with no need for external traffic generator
    - SearchMemoryTcam\_SRT
    - StatisticsMemory\_SRT
    - WideMemoryTcam\_SRT
- Packet rate is about 6 Gbps, or 2.5M packets per sec
  - Each packet accesses different memories at least once and usually many more times
  - Memory access rate is between 2.5M and 25M accesses per sec
- Frame memory (2Mb) buffers are 65-70% in use
- Instruction memories (2Mb) are 80% in use (code space)
- Rest of internal memories (5Mb) (TOP's + TM) is 40-50% in use
- Pass/Fail Criteria
  - Packet count
  - Specific events



## SRT Overview (Cont'd)

- Only one SRT is used per test run
- Errors are recorded via a register polling program
  - 100ms sampling time, set to sample once every sec
- Test has init phase and execution phase
  - Duration of execution phase can be changed
  - At the end of execution phase, final check for that SRT is performed
- Based on the acceleration factor and the estimated FIT rate, the MTTF is expected to be about 4 a.u. in accelerated time
  - Used 1, 2, and 5 a.u. exposure times
  - Beam is turned on only during execution phase
  - 5 repetitions per condition were run in semi-random order
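
The expected MTTF in accelerated time follows from the definition of FIT (failures per 10<sup>9</sup> device-hours) and the beam acceleration factor. A minimal sketch of the conversion is below. The chip-level FIT value is a hypothetical placeholder, since the slides report the result only in arbitrary units.

```python
ACCELERATION_FACTOR = 1.7e8  # beam acceleration factor from the set-up slide


def field_mttf_hours(fit):
    """FIT is failures per 1e9 device-hours, so MTTF = 1e9 / FIT hours."""
    return 1e9 / fit


def accelerated_mttf_seconds(fit, accel=ACCELERATION_FACTOR):
    """MTTF under beam: field MTTF divided by the acceleration factor."""
    return field_mttf_hours(fit) / accel * 3600.0


HYPOTHETICAL_CHIP_FIT = 1000.0  # placeholder, not the 98NX3C2 estimate

mttf_s = accelerated_mttf_seconds(HYPOTHETICAL_CHIP_FIT)
print(f"Expected MTTF under beam (illustrative): {mttf_s:.2f} s")
```

Choosing exposure times that bracket the expected value, as done with the 1, 2, and 5 a.u. runs, produces a mix of failed and surviving runs.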



## **Application Test MTTF**



- Right-censored data are fitted to an exponential distribution using the maximum likelihood method.
  - Shortest MTTF is 4.44 a.u. from SearchMemoryTcam\_SRT, which is very close to the predicted value from the SER analysis of 98NX3C2.
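
For an exponential distribution with right-censored observations, the maximum likelihood estimate of the MTTF is the total exposure time divided by the number of observed failures. The sketch below shows that estimator. The run data are hypothetical and only mimic the 1, 2, and 5 a.u. exposure scheme, so they are not the recorded results.

```python
def exponential_mttf_mle(observations):
    """MLE of MTTF for exponential lifetimes with right censoring.

    observations: iterable of (time, failed) pairs, where `failed` is False
    for runs that survived the full exposure (right-censored).
    """
    observations = list(observations)
    total_time = sum(t for t, _ in observations)
    failures = sum(1 for _, failed in observations if failed)
    if failures == 0:
        raise ValueError("no failures observed; the estimate is unbounded")
    return total_time / failures


# Hypothetical runs in arbitrary units: (time under beam, failed?)
runs = [
    (1.0, False), (1.0, False), (0.6, True),
    (2.0, False), (1.4, True),
    (5.0, False), (3.2, True), (4.8, True),
]

mttf = exponential_mttf_mle(runs)
print(f"Estimated MTTF (illustrative): {mttf:.2f} a.u.")
```

Runs that survive still add their full exposure time to the numerator, so even the short 1 a.u. runs contribute information.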



### **ECC Incident Rate**

- Based on the amount of ECC protected memories, 950 errors are expected from the total application test exposure
- Average probability of an ECC error being captured is 14%
- WMT\_SRT (WideMemoryTcam\_SRT) has the highest percentage
- ECC errors are reported from 3 different sources:
  - GENERAL\_CTRL\_IN\_ARB\_ECC\_CNT
  - GENERAL\_RX\_FRAME\_ARB\_ECC\_CNT
  - GENERAL\_TX\_FRAME\_ARB\_ECC\_CNT
- 2-bit errors are the most common, and some 5-bit errors are reported
  - Mixture of spatial and temporal effects
  - Interleaving is effective

| Test                 | ECC Errors | % of ECC Recorded |
|----------------------|------------|-------------------|
| SearchMemoryTcam_SRT | 34         | 11%               |
| StatisticsMemory_SRT | 4          | 1%                |
| WideMemoryTcam_SRT   | 91         | 29%               |
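
The percentages in the table can be cross-checked against the 950 expected errors. The sketch below reproduces the 14% average directly from the totals. The per-test figures are reproduced only under the assumption that the expected errors split equally across the three tests. The slides do not state that split, but it is consistent with the numbers shown.

```python
EXPECTED_ECC_TOTAL = 950  # expected ECC errors over the whole exposure

ecc_counts = {
    "SearchMemoryTcam_SRT": 34,
    "StatisticsMemory_SRT": 4,
    "WideMemoryTcam_SRT": 91,
}

# Average capture probability: recorded errors over expected errors.
total_recorded = sum(ecc_counts.values())
average_capture = total_recorded / EXPECTED_ECC_TOTAL

# Assumption: each test receives an equal share of the expected errors.
expected_per_test = EXPECTED_ECC_TOTAL / len(ecc_counts)
per_test_capture = {
    name: count / expected_per_test for name, count in ecc_counts.items()
}

print(f"Recorded {total_recorded} of {EXPECTED_ECC_TOTAL}: {average_capture:.0%}")
for name, fraction in per_test_capture.items():
    print(f"{name}: {fraction:.0%}")
```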





## **Parity Error Incident Rate**

- 22 parity errors were recorded
  - 5 out of 9 parity protected arrays showed errors
- Expected number of parity errors is 272
- Some parity errors are not considered as critical failures because they are recoverable.

| Test                 | Parity Error Count | % of Parity Error Recorded |
|----------------------|--------------------|----------------------------|
| SearchMemoryTcam_SRT | 4                  | 1.5%                       |
| StatisticsMemory_SRT | 5                  | 1.8%                       |
| WideMemoryTcam_SRT   | 13                 | 4.8%                       |



#### Conclusion

- 98NX3C2 application MTTF was characterized at ANITA using register polling and a right-censored data collection methodology
- Estimated FIT based on SRAM analysis is in agreement with experimental results
- Memory test program successfully validated SRAM FIT rate
- ECC error log shows that in the worst case, up to 30% of upset bits will be read by the application in the ECC memory arrays under test
- Parity error log shows that in the worst case, 5% of upset bits will be read by the application in the parity protected array under test
- SM\_SRT (StatisticsMemory\_SRT) is the least SER-sensitive test and SMT\_SRT (SearchMemoryTcam\_SRT) is the most sensitive
- WMT\_SRT shows the highest test coverage on the ECC and parity protected arrays.





#### Back Up



# **Test system**



#### Memory Tests – General

- Memory is accessed in lines of 64 bytes each
- A total of 2 Mbit (256 KByte) is accessed
- Current and Voltage measurements are manual
- Timestamp accuracy = +/-2 seconds
- Logged:
  - Address location of error
  - Expected data
  - Received data
  - Time stamp
  - Counter of total errors
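
The logged fields above suggest a write and read-compare pass over 64-byte lines. The sketch below simulates such a pass on a host-side buffer. It is a generic illustration under that assumption, not the actual 98NX3C2 test program, and the data pattern, the record layout, and the injected bit flip are all hypothetical.

```python
import time
from dataclasses import dataclass

LINE_BYTES = 64            # memory is accessed in 64-byte lines
TOTAL_BYTES = 256 * 1024   # 2 Mbit under test


@dataclass
class ErrorRecord:
    address: int        # start address of the failing line
    expected: bytes     # data written
    received: bytes     # data read back
    timestamp: float    # host time when the mismatch was seen
    total_errors: int   # running error counter


def fill(memory, pattern):
    """Write the reference pattern to every line."""
    for addr in range(0, len(memory), LINE_BYTES):
        memory[addr:addr + LINE_BYTES] = pattern


def scan(memory, pattern, log, total_errors=0):
    """Read back every line, log mismatches, and rewrite the pattern."""
    for addr in range(0, len(memory), LINE_BYTES):
        line = bytes(memory[addr:addr + LINE_BYTES])
        if line != pattern:
            total_errors += 1
            log.append(ErrorRecord(addr, pattern, line, time.time(), total_errors))
            memory[addr:addr + LINE_BYTES] = pattern  # restore for the next pass
    return total_errors


memory = bytearray(TOTAL_BYTES)          # stand-in for the device memory
pattern = bytes([0x55]) * LINE_BYTES     # hypothetical data pattern
fill(memory, pattern)
memory[0x1234] ^= 0x04                   # simulate a single-bit upset

log = []
count = scan(memory, pattern, log)
print(f"{count} error(s); first at line address {log[0].address:#x}")
```

Comparing the expected and received bytes of a record shows which bits flipped within the line.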



