Approximate MAC unit using Static Segmentation / Di Meo, Gennaro; Saggese, G.; Strollo, A. G. M.; De Caro, D. - In: IEEE Transactions on Emerging Topics in Computing. - ISSN 2168-6750. - (2023), pp. 1-12. [10.1109/TETC.2023.3315301]
Approximate MAC unit using Static Segmentation
Di Meo Gennaro; Saggese G.; Strollo A. G. M.; De Caro D.
2023
Abstract
In this paper we investigate a novel approximate multiply-and-accumulate (MAC) unit that computes Y = A×B+C using static segmentation. The proposed architecture uses a single carry-propagate adder and performs segmentation on the three operands A, B, and C to reduce hardware cost. The circuit can be configured at design time through two parameters: the first controls the segmentation of A and B, while the second controls the segmentation of C and the adder length. An error compensation technique is also employed to reduce the approximation error. Error analysis and implementation results in 28 nm CMOS are presented for an 8-bit multiplier with 20-bit and 24-bit addition. The proposed approximate MACs outperform the state of the art, showing the largest power saving when the mean relative error distance (MRED) is larger than 2×10⁻³ and 4×10⁻⁵ for 20-bit and 24-bit addition, respectively. For an MRED of about 6×10⁻³, the proposed approximate MAC with 20-bit addition exhibits a power reduction larger than 60% compared to the exact MAC and larger than 27% compared to state-of-the-art approximate MACs. Application examples in image filtering and template matching show that the proposed approximate circuits are good candidates for applications where their error performance is acceptable.
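As a rough illustration of the idea summarized in the abstract, the following Python sketch models a generic two-segment static segmentation applied to A, B, and C, and estimates the MRED by random sampling. It is not the authors' architecture: the function names, the segment widths `m` and `p`, and the two-segment selection rule are assumptions made for illustration, and the error compensation and shared-adder structure of the paper are not modeled.

```python
import random

def segment(x, n, m):
    """Keep an m-bit segment of an n-bit unsigned value (assumed two-segment rule).

    Returns (segment, shift) so that x is approximated by segment << shift.
    """
    if x >> m == 0:
        # Upper n-m bits are zero: the low m bits represent x exactly.
        return x, 0
    shift = n - m
    # Otherwise keep the m most significant bits and drop the rest.
    return x >> shift, shift

def approx_mac(a, b, c, n=8, m=4, k=20, p=12):
    """Approximate Y = a*b + c with segmented operands (illustrative parameters)."""
    sa, sha = segment(a, n, m)
    sb, shb = segment(b, n, m)
    sc, shc = segment(c, k, p)
    return ((sa * sb) << (sha + shb)) + (sc << shc)

def mred(samples, **kw):
    """Mean relative error distance: mean of |Y' - Y| / Y over samples with Y != 0."""
    errs = []
    for a, b, c in samples:
        exact = a * b + c
        if exact:
            errs.append(abs(approx_mac(a, b, c, **kw) - exact) / exact)
    return sum(errs) / len(errs)

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [(rng.randrange(256), rng.randrange(256), rng.randrange(1 << 20))
               for _ in range(10000)]
    print("MRED estimate:", mred(samples))
```

In this model, small operands pass through exactly (for example, `approx_mac(3, 5, 7)` equals 22), while larger ones are truncated, so the result never exceeds the exact value. Widening `m` or `p` lowers the estimated MRED at the cost of a wider multiplier or adder, which mirrors the design-time trade-off the abstract describes.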