Health Monitoring and Life-Time Prognostics to Enable Dependable Many-Processor SoCs
Yong Zhao is a PhD student in the research group Computer Architecture Design and Test for Embedded Systems. His supervisor is dr.ir. H.G. Kerkhoff from the Faculty of Electrical Engineering, Mathematics & Computer Science.
Driven by the demand for more powerful data processing and the availability of advanced IC technologies, a growing number of complex Many-Processor System-on-Chip (MP-SoC) designs have been proposed. They are increasingly used in life- or mission-critical applications in the automotive, military and aerospace domains, where they endure much more severe external stress in terms of temperature, shock and radiation than conventional consumer applications do. Furthermore, shrinking transistor dimensions to enable more complexity has accelerated the wear-out of devices, circuits and the associated electronic systems. Together, these factors pose serious dependability challenges.
Based on the aging mechanisms and a dependability analysis of our target MP-SoCs, and because our actual application requires the mean downtime to be close to zero, this thesis takes a prognostic health-monitoring approach. Such an approach typically combines health monitors (HMs) with life-time prediction software. Using these, a potentially faulty processor core can be repaired via remapping, so that the system acts before a failure occurs, resulting in a full-time available system.
The health-monitoring approach proposed in this thesis comprises an embedded hardware HM and a software-based HM. The former carries out voltage and temperature measurements as well as delay-time monitoring, while the latter covers critical-path delay monitoring, IDDQ monitoring and unit-based IDDT monitoring. The software-based HM, consisting of supporting hardware and the accompanying software program, was implemented in our Xentium-based MP-SoCs. An accelerated testing experiment is presented, with measurement results for the critical-path delay, IDDQ and IDDT of our MP-SoCs. The correlation coefficients between these results were modelled and are provided.
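As an illustration of the correlation coefficients mentioned above, the following minimal Python sketch computes the Pearson correlation between two monitored quantities. The choice of Pearson correlation, the function name `pearson` and the sample readings are assumptions made for illustration only; they are not taken from the thesis, and the numbers are not measurement data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical readings taken at the same stress intervals
# (made-up values, NOT results from the thesis).
delay_ns = [2.00, 2.03, 2.05, 2.08, 2.10]
iddq_ua = [15.0, 14.6, 14.3, 13.9, 13.7]

r = pearson(delay_ns, iddq_ua)
print(f"correlation between delay and IDDQ: {r:.3f}")
```

A coefficient close to +1 or -1 would suggest that one monitored quantity tracks the other, which is what would make a faster-to-measure quantity usable as a stand-in for a slower one.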
Based on the health-monitoring information, the remaining lifetime can be predicted. A genetic-algorithm-based optimization model for the critical-path delay results was proposed. In addition, an alternative lifetime-prediction method based on the IDDX monitoring results was developed; it achieves good accuracy and also reduces the measurement time compared to the critical-path delay approach.
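To make the idea of lifetime prediction from delay measurements concrete, the sketch below fits a degradation curve to critical-path delay samples with a small genetic algorithm and then extrapolates to a failure threshold. It is not the optimization model from the thesis: the power-law form `d0 * (1 + a * t**n)`, the parameter bounds, the 20% delay-increase threshold and all names are assumptions, and the samples are synthetic.

```python
import random

def delay_model(t, d0, a, n):
    """Assumed power-law aging: delay grows as d0 * (1 + a * t**n)."""
    return d0 * (1.0 + a * t ** n)

def sse(params, d0, samples):
    """Sum of squared errors between the model and the measured delays."""
    a, n = params
    return sum((delay_model(t, d0, a, n) - d) ** 2 for t, d in samples)

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def fit_ga(d0, samples, bounds, pop_size=40, generations=150, seed=0):
    """Tiny real-coded GA: truncation selection, blend crossover,
    Gaussian mutation and elitism. Returns (best_params, error_history)."""
    rng = random.Random(seed)
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda p: sse(p, d0, samples))
        history.append(sse(pop[0], d0, samples))
        parents = pop[: pop_size // 2]   # keep the better half as parents
        children = [pop[0]]              # elitism: the best always survives
        while len(children) < pop_size:
            p1 = parents[int(rng.random() * len(parents))]
            p2 = parents[int(rng.random() * len(parents))]
            w = rng.random()             # blend weight for crossover
            child = []
            for g1, g2, (lo, hi) in zip(p1, p2, bounds):
                g = w * g1 + (1.0 - w) * g2
                g += rng.gauss(0.0, 0.05 * (hi - lo))  # mutation
                child.append(clamp(g, lo, hi))
            children.append(tuple(child))
        pop = children
    best = min(pop, key=lambda p: sse(p, d0, samples))
    history.append(sse(best, d0, samples))
    return best, history

def time_to_threshold(d0, a, n, d_fail):
    """Solve d0 * (1 + a * t**n) = d_fail for t (needs d_fail > d0, a > 0)."""
    return ((d_fail / d0 - 1.0) / a) ** (1.0 / n)

# Synthetic samples (hours, ns) generated from the assumed model itself;
# these are NOT measurements from the thesis.
D0 = 1.0
TIMES = (10, 50, 100, 200, 400, 800)
SAMPLES = [(t, delay_model(t, D0, 0.02, 0.3)) for t in TIMES]
BOUNDS = [(1e-4, 0.1), (0.05, 1.0)]      # assumed search ranges for a and n

(a_fit, n_fit), errors = fit_ga(D0, SAMPLES, BOUNDS)
t_fail = time_to_threshold(D0, a_fit, n_fit, 1.2 * D0)  # assumed 20% limit
print(f"fitted a={a_fit:.4f}, n={n_fit:.3f}, final SSE={errors[-1]:.3e}")
print(f"estimated remaining lifetime: {t_fail - TIMES[-1]:.0f} hours")
```

Because `a` and `n` are strongly correlated in a power-law fit, different parameter pairs can give similar errors over a short observation window, so the extrapolated lifetime should be read as an estimate whose uncertainty grows with the extrapolation distance.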