Extreme-scale computer systems take advantage of large arrays of general-purpose multicore processors coupled with specialized manycore accelerators. In order to support complex applications and correctly feed such processing elements, increasingly larger memory cores are integrated at different levels of the hierarchy. However, the adoption of increasingly aggressive manufacturing processes makes the memory sub-system particularly sensitive to faults. Error correcting codes (ECCs) allow the memory to recover from faults at run-time without interfering with the application execution. However, due to the loss of performance introduced every time an error must be corrected, the persistence of faults requires a more radical repair approach in which faulty cells are physically replaced by spare ones. Memory redundancy analysis (MRA) algorithms are used to drive the allocation process of spare resources. Many one-dimensional and two-dimensional MRAs have been proposed, but tools for evaluating their recovering capability are still not well established.
From a collaboration between TestGroup (Politecnico di Torino), TU Delft, University of Siena and Istituto Superiore Mario Boella (ISMB), we have created SIERRA, a simulation environment for precisely evaluating the repair efficiency of an MRA considering different fault signatures and faulty memory configurations. Our simulation engine provides a precise estimation of the MRA quality by analyzing the behavior of the MRA on several faulty memory configurations. To this end, different parameters such as the area of the memory blocks and the defect density are taken into account. The evaluation of the quality of an MRA takes into account its repairing capability, the power consumption derived from its execution, and the area overhead. Thanks to the use of a database for storing information, our tool is able to speed-up the simulation process by distributing it among several nodes. All these features make SIERRA essential in supporting the design of next-generation high-performance computers.
Enjoy reading information about SIERRA our new published paper:
Alberto Scionti, Somnath Mazumdar, Stefano Di Carlo, Said Hamdioui, SIERRA—Simulation environment for memory redundancy algorithms, Simulation Modelling Practice and Theory, Volume 69, December 2016, Pages 14-30, ISSN 1569-190X,