Exploiting Error Resilience For Hardware Efficiency - Targeting Iterative and Accumulation Based Algorithms
Due to the COVID-19 measures, the PhD defence of Ghayoor Gillani will take place online, without the presence of an audience.
The PhD defence can be followed by a live stream.
Ghayoor Gillani is a PhD student in the research group Radio Systems (RS). His supervisor is dr.ir. A.B.J. Kokkeler from the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS).
Computing devices are constantly challenged by resource-hungry applications such as scientific computing. These applications demand high hardware efficiency and thus pose the challenge of reducing the energy/power consumption, latency, and chip area needed to process a given task. An increase in hardware efficiency is therefore one of the major goals in innovating computing devices. Improvements in process technology have long played an important role in tackling such challenges by increasing the performance and transistor density of integrated circuits while keeping their power density constant. In the last couple of decades, however, the efficiency gains from process technology improvements have been approaching the fundamental limits of computing. For instance, power density no longer scales as well as transistor density, which poses a further challenge in controlling the power and thermal budgets of integrated circuits.
Keeping in view that many applications/algorithms are error-resilient, emerging paradigms like approximate computing come to the rescue by offering promising efficiency gains, especially in terms of power efficiency. An application/algorithm can be regarded as error-resilient, or error-tolerant, when it provides an outcome with the required accuracy while utilizing processing components that do not always compute accurately. There can be multiple reasons why an algorithm tolerates errors; for instance, it may have noisy or redundant inputs and/or a range of acceptable outcomes. Examples of such applications are machine learning, scientific computing, and search engines.
Approximate computing techniques exploit the intrinsic error tolerance of such applications to optimize computing systems at the software, architecture, and circuit levels and thereby achieve efficiency gains. However, state-of-the-art approximate computing methodologies do not sufficiently address accelerator designs for iterative and accumulation-based algorithms. Given the wide range of such algorithms in digital signal processing, this thesis investigates approximation methodologies to achieve high-efficiency accelerator architectures for iterative and accumulation-based algorithms.
Error resilience analysis tools assess an algorithm to determine whether it is a promising candidate for approximate computing. Statistical approximation (error) models are applied to an algorithm to quantify its intrinsic error resilience and to identify promising approximate computing techniques. In the context of iterative algorithms, we demonstrate that the state-of-the-art statistical model is not effective in revealing opportunities for approximate computing. We therefore propose an adaptive statistical approximation model, which quantifies the number of iterations that can be processed on an approximate core while complying with the quality constraints.
Targeting energy efficiency, we further propose an accelerator design for iterative algorithms. Our design is based on a heterogeneous architecture, where heterogeneity is introduced by employing a combination of accurate and approximate cores. The proposed methodology exploits the intrinsic error resilience of an iterative algorithm: a number of initial iterations are run on the approximate core and the rest on the accurate core to reduce energy consumption. Our accelerator design does not increase the number of iterations beyond that of the conventional accurate counterpart and provides sufficient precision to converge to an acceptable solution.
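The effect of running only the initial iterations approximately can be illustrated with a minimal Python sketch. The solver (Newton's method for a square root), the bounded-relative-noise error model, and all parameter values below are illustrative assumptions, not the thesis's actual accelerator or error model:

```python
import math
import random

def newton_sqrt(a, iters=20, approx_iters=0, noise=1e-2, seed=0):
    """Newton's method for sqrt(a): x <- (x + a/x) / 2.
    The first `approx_iters` iterations emulate an approximate core
    by perturbing the update with bounded relative noise (hypothetical
    error model); the remaining iterations run accurately."""
    rng = random.Random(seed)
    x = a  # initial guess
    for i in range(iters):
        x = 0.5 * (x + a / x)
        if i < approx_iters:
            x *= 1.0 + rng.uniform(-noise, noise)
    return x

# Sweep: how many initial iterations tolerate approximation while the
# final result still meets a 1e-6 relative-error quality constraint?
target = math.sqrt(2.0)
for k in range(0, 21, 5):
    rel_err = abs(newton_sqrt(2.0, approx_iters=k) - target) / target
    print(k, rel_err)
```

As long as enough accurate iterations remain after the approximate phase, the solver re-converges, which is the intuition behind bounding the number of approximate iterations rather than forbidding approximation outright.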
Conventional approximate designs follow error-restricted techniques, which restrict the approximations based on the error magnitudes and error rates they introduce, so as to avoid an unacceptable quality loss during processing. However, error-restricted techniques also limit the hardware-efficiency benefits that can be exploited within error-resilient applications. In the context of accumulation-based algorithms, we propose a Self-Healing (SH) methodology for designing approximate accelerators such as square-accumulate (SAC), wherein the approximations are not restricted by error metrics but are instead given an opportunity to cancel out the errors within the processing units. SAC refers to a hardware accelerator that computes the inner product of a vector with itself, i.e., it accumulates the squares of the vector's elements.
In the SH methodology, the squarer is regarded as an approximation stage and the accumulator as a healing stage. We propose to deploy an approximate squarer mirror pair, such that the error introduced by one approximate squarer mirrors the error introduced by the other, i.e., the errors generated by the two approximate squarers are approximately additive inverses of each other. This helps the healing stage (the accumulator) to automatically cancel out the errors originating in the approximation stage, and thereby to minimize the quality loss. Our quality-efficiency analysis of an approximate SAC shows that the proposed SH methodology provides a more effective trade-off than conventional error-restricted techniques.
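A toy numerical sketch of the mirror-pair idea follows. The specific squarers below (one truncating the exact square downward, its mirror rounding upward) are hypothetical stand-ins for illustration, not the approximate squarer circuits designed in the thesis:

```python
def approx_square_down(x, t=4):
    """Hypothetical approximate squarer: drop the t LSBs of the exact
    square, so it always underestimates."""
    return (x * x) >> t << t

def approx_square_up(x, t=4):
    """Mirror squarer: round the exact square up to a multiple of 2**t,
    so it always overestimates; its error roughly mirrors (is the
    additive inverse of) approx_square_down's error."""
    s = x * x
    return ((s + (1 << t) - 1) >> t) << t

def sac(vec, t=4):
    """Self-healing square-accumulate: alternate the mirrored squarers
    across the vector elements so the accurate accumulator (healing
    stage) cancels most of the approximation error."""
    acc = 0
    for i, x in enumerate(vec):
        acc += approx_square_down(x, t) if i % 2 == 0 else approx_square_up(x, t)
    return acc

vec = list(range(1, 101))
exact = sum(x * x for x in vec)
down_only = sum(approx_square_down(x) for x in vec)
print(exact, sac(vec), down_only)
```

Because the two squarers err in opposite directions, the accumulated result lands much closer to the exact sum of squares than accumulating either approximate squarer alone, even though each individual square is still approximate.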
Nonetheless, the SH methodology is limited to parallel implementations with similar modules (or parts of a datapath) in multiples of two to achieve error cancellation. To overcome this shortcoming, we propose an Internal-Self-Healing (ISH) methodology that exploits self-healing within a computing element internally, without requiring a paired, parallel module. We employ the ISH methodology to design an approximate multiply-accumulate (MAC) accelerator, wherein the multiplier is regarded as an approximation stage and the accumulator as a healing stage. We propose to approximate a recursive multiplier in such a way that a near-to-zero average error is achieved for a given input distribution, so that the errors cancel out in an accurate accumulation stage. Our experiments show that the ISH methodology removes the multiples-of-two restriction and enables error cancellation within a single computing element.
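The near-zero-mean-error idea can be sketched with a hypothetical 8-bit recursive multiplier that splits each operand into 4-bit halves, drops the low partial product xL*yL, and adds that product's expected value for uniformly distributed operands. This construction is an illustrative assumption for the sketch, not the thesis's actual multiplier design:

```python
import random

BIAS = 56  # ~ E[xL * yL] = 7.5 * 7.5 for independent uniform 4-bit halves

def approx_mul8(x, y):
    """Hypothetical approximate 8-bit recursive multiplier:
    x*y = (xh*2^4 + xl)*(yh*2^4 + yl); the xl*yl partial product is
    dropped and replaced by its expected value BIAS, giving a
    near-zero mean error for uniformly distributed inputs."""
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    return (xh * yh << 8) + ((xh * yl + xl * yh) << 4) + BIAS

def mac(xs, ys):
    """Approximate multiply-accumulate: the accurate accumulator acts
    as the internal healing stage, averaging out the near-zero-mean
    per-product errors."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += approx_mul8(x, y)
    return acc

rng = random.Random(1)
xs = [rng.randrange(256) for _ in range(10000)]
ys = [rng.randrange(256) for _ in range(10000)]
exact = sum(x * y for x, y in zip(xs, ys))
print(exact, mac(xs, ys))
```

Note that the cancellation happens inside a single MAC element: no paired mirror module is needed, because the multiplier's error distribution itself is centred near zero for the assumed input distribution.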
As a case study of iterative and accumulation-based algorithms, we apply the proposed approximate computing methodologies to radio astronomy calibration processing, which results in a more effective quality-efficiency trade-off than state-of-the-art approximate computing methodologies.