Implementing System Fault Tolerance by Hw/Sw Codesign

Implementation of new fault tolerance techniques with minimal impact on the final system. In particular, a strong correlation between hardware and software approaches will be investigated to find optimal solutions.

  • Duration: 2003-2004
  • Coordinator: Politecnico di Torino
  • Partners: Politecnico di Torino, University of Stuttgart
  • Funded by: CRUI, DAAD under the Programma Vigoni

The growth of space activities and of the satellite communications market has been driven by the fast evolution of semiconductor technologies, which allows the production of the highly sophisticated computer-based environments on which satellites and ground installations critically depend. In space, nuclear, and high-energy physics applications, digital systems must function in the presence of high levels of radiation. Under the effects of radiation, the probability that soft errors (transient faults) occur is not negligible. More recently, soft errors have also become a concern for the consumer products community, and a review of recent papers and conferences clearly reveals growing concern in the semiconductor industry.

High dependability and availability are therefore two main requirements in critical applications. In this context, “dependability” is defined as the ability of a computer-based system to detect, locate, and, if possible, correct a fault. Traditional hardware redundancy, or hard-wired techniques, can double the circuit’s size, which, taking yield curves into account, may triple the device’s cost. In addition, performance is significantly impaired by slower system clocks and increased gate count. More sophisticated approaches are therefore required to provide a high degree of dependability with low overhead in both performance and cost.

The present project aims to implement new fault tolerance techniques with minimal impact on the final system. In particular, a strong correlation between hardware and software approaches will be investigated to find optimal solutions. The results of the project will be made available to customers wishing to design and apply such solutions in the future. The innovative techniques developed in the project are expected to lead to the following main outcomes:

  • In the short term: advanced commercial technologies will be open to use in space applications (such as telecommunications). Low-cost, high-performance SoCs will be usable in space applications thanks to the dependability achieved using software/hardware fault tolerance solutions.
  • In the long term: the techniques will make it possible to produce low-cost commercial products that tolerate the soft errors expected to affect technologies below 0.13 µm, and even the 100 nm processes that are foreseen to go into production in 2005.

The project relies on research collaboration between the team of Prof. Paolo Prinetto, of the Politecnico di Torino (Torino, Italy), and the team of Prof. Hans-Joachim Wunderlich, of the University of Stuttgart (Stuttgart, Germany).

Both project leaders have a significant background in the research area of digital systems dependability. In particular, Prof. Paolo Prinetto and the team from the Politecnico di Torino have wide experience in fault tolerance of digital systems based on software approaches, while Prof. Wunderlich and the team from the University of Stuttgart can make a major contribution in the field of hardware-based system fault tolerance. The joint research activity is therefore an opportunity to exchange know-how in the respective fields and to integrate different aspects of a solution to a single problem.
