An integrated method for implementing online fault detection in NoC-based MPSoCs

The continuing development of silicon technology leads to systems with hundreds of processors interconnected by a network on chip (NoC-based MPSoCs). On the one hand, nanotechnology makes it possible to develop such complex systems; on the other hand, their vulnerability to faults increases. The literature presents partial fault-tolerant approaches that target specific parts of the system, such as high-level methods, the router level, the link level, and routing algorithms. An important gap remains in the literature: an integrated method that spans from fault detection at the router level up to fault recovery and correct execution of applications in a real MPSoC.

The goal of the present work is to fill this gap by presenting a method with fault-tolerant techniques from the physical to the transport layers. The MPSoC is modeled at the register-transfer level (RTL) in VHDL. A fault injection campaign with 5 simultaneously injected faults resulted in 2,000 simulated scenarios. The results demonstrate the effectiveness of the proposal: most scenarios execute correctly with routers operating in degraded mode, with an impact on execution time below 1%.
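
As an illustration of how such a campaign can be organized, the sketch below enumerates 2,000 scenarios with 5 distinct fault sites each. It is not the authors' injection tool. The mesh size, the choice of router ports as fault sites, the random seed, and all function names are assumptions made only for this example; the faults themselves would be injected in the VHDL simulation, which is not shown.

```python
import random

# Values taken from the text.
FAULTS_PER_SCENARIO = 5
NUM_SCENARIOS = 2000

# Assumptions for illustration only: a 6x6 2D mesh, with faults
# injected at router ports. Neither is stated in the text.
MESH_X, MESH_Y = 6, 6


def router_ports(x, y):
    """Return the ports that exist on the router at (x, y) in a 2D mesh."""
    ports = ["local"]
    if x < MESH_X - 1:
        ports.append("east")
    if x > 0:
        ports.append("west")
    if y < MESH_Y - 1:
        ports.append("north")
    if y > 0:
        ports.append("south")
    return ports


def all_fault_sites():
    """Enumerate every (x, y, port) triple that can receive a fault."""
    return [
        (x, y, port)
        for x in range(MESH_X)
        for y in range(MESH_Y)
        for port in router_ports(x, y)
    ]


def generate_campaign(seed=0):
    """Build the scenario list; each scenario holds 5 distinct fault sites."""
    rng = random.Random(seed)  # fixed seed so the campaign is reproducible
    sites = all_fault_sites()
    return [rng.sample(sites, FAULTS_PER_SCENARIO) for _ in range(NUM_SCENARIOS)]


campaign = generate_campaign()
```

Each entry of `campaign` would then drive one simulation run, and the run would be classified by whether the application finishes correctly and by how much its execution time changes relative to a fault-free run.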

