DESIGN AND IMPLEMENTATION OF A RECONFIGURABLE 64-BIT VEDIC MULTIPLY-AND-ACCUMULATE UNIT WITH DYNAMIC PRECISION SCALING FOR DSP AND AI APPLICATIONS

GOLLOLLU SOMESH1, T MOHAN DAS2

doi:10.62643/

Abstract

The exponential growth of Artificial Intelligence (AI), Deep Learning, and high-throughput Digital Signal Processing (DSP) applications has significantly increased the demand for high-performance and energy-efficient arithmetic hardware. Multiply-and-Accumulate (MAC) units represent the fundamental computational building blocks in these systems, accounting for the majority of execution time in convolution, matrix multiplication, and filtering operations. However, conventional fixed-precision 64-bit MAC architectures suffer from underutilization, excessive power consumption, and long carry propagation delays, especially when executing lower-precision workloads such as 16-bit or 8-bit inference tasks. This paper presents the design and implementation of a Reconfigurable 64-bit Vedic Multiply-and-Accumulate (RMAC) unit based on the Urdhva Tiryagbhyam sutra of Vedic mathematics. The proposed architecture employs a hierarchical bottom-up design methodology, constructing the 64×64 multiplier from optimized 2×2 Vedic primitives scaled recursively to 4×4, 8×8, 16×16, and 32×32 blocks. A novel dynamic reconfiguration mechanism utilizing a 2-bit mode-select control and segmented carry-break logic enables the hardware to operate in three distinct precision modes: 1×64-bit, 2×32-bit parallel, and 4×16-bit parallel configurations. The architecture integrates a segmented combinational adder and a 136-bit accumulator with guard bits to prevent overflow during iterative accumulation cycles. By physically isolating carry propagation across predefined bit boundaries, the design ensures complete arithmetic independence in SIMD modes while preserving full-precision operation when required. The RMAC is modeled using synthesizable Verilog HDL and verified through comprehensive simulation across all operational modes.
Compared to conventional static Booth-based MAC architectures, the proposed design demonstrates improved energy proportionality, enhanced hardware utilization, and reduced critical path delay due to the parallel nature of Vedic multiplication. The proposed architecture is highly suitable for modern AI accelerators, reconfigurable DSP cores, and energy-aware System-on-Chip (SoC) platforms.
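
The hierarchical composition and the mode-dependent carry isolation described in the abstract can be illustrated with a small behavioral model in Python. This is only a sketch under stated assumptions, not the authors' Verilog: the function names (`vedic_2x2`, `vedic_mul`, `rmac_step`, `lanes_of`), the 2-bit mode encoding in `LANES`, and the equal split of the 136-bit accumulator into 68-bit or 34-bit lanes in the SIMD modes are hypothetical choices that the abstract does not specify. Ordinary integer addition stands in for the adder trees that combine partial products in hardware.

```python
def vedic_2x2(a: int, b: int) -> int:
    """2x2 Urdhva Tiryagbhyam primitive: four AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0                 # vertical product of the low bits
    x, y = a1 & b0, a0 & b1      # crosswise partial products
    p1, c1 = x ^ y, x & y        # first half adder: sum and carry
    z = a1 & b1                  # vertical product of the high bits
    p2, p3 = z ^ c1, z & c1      # second half adder
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def vedic_mul(a: int, b: int, n: int) -> int:
    """n x n unsigned multiply (n a power of two, n >= 2) from four n/2 blocks."""
    if n == 2:
        return vedic_2x2(a, b)
    h = n // 2
    mask = (1 << h) - 1
    al, ah = a & mask, a >> h
    bl, bh = b & mask, b >> h
    ll = vedic_mul(al, bl, h)    # low  x low
    lh = vedic_mul(al, bh, h)    # low  x high (crosswise)
    hl = vedic_mul(ah, bl, h)    # high x low  (crosswise)
    hh = vedic_mul(ah, bh, h)    # high x high
    return ll + ((lh + hl) << h) + (hh << n)


# Hypothetical mode encoding (assumption): mode value -> number of SIMD lanes.
LANES = {0b00: 1, 0b01: 2, 0b10: 4}
ACC_BITS = 136  # accumulator width stated in the abstract


def rmac_step(acc: int, a: int, b: int, mode: int) -> int:
    """One multiply-accumulate cycle on packed 64-bit operands a and b."""
    lanes = LANES[mode]
    op_w = 64 // lanes           # operand bits per lane: 64, 32, or 16
    acc_w = ACC_BITS // lanes    # assumed lane width: 136, 68, or 34 bits
    op_mask = (1 << op_w) - 1
    acc_mask = (1 << acc_w) - 1
    out = 0
    for i in range(lanes):
        ai = (a >> (i * op_w)) & op_mask
        bi = (b >> (i * op_w)) & op_mask
        prev = (acc >> (i * acc_w)) & acc_mask
        # Carry-break: the sum wraps inside its own lane and never
        # propagates into lane i + 1.
        lane_sum = (prev + vedic_mul(ai, bi, op_w)) & acc_mask
        out |= lane_sum << (i * acc_w)
    return out


def lanes_of(acc: int, mode: int) -> list[int]:
    """Unpack the accumulator into per-lane values, lowest lane first."""
    lanes = LANES[mode]
    acc_w = ACC_BITS // lanes
    return [(acc >> (i * acc_w)) & ((1 << acc_w) - 1) for i in range(lanes)]


if __name__ == "__main__":
    # Four independent 16-bit products accumulated in one call.
    a = 3 | (5 << 16) | (7 << 32) | (9 << 48)
    b = 2 | (4 << 16) | (6 << 32) | (8 << 48)
    print(lanes_of(rmac_step(0, a, b, 0b10), 0b10))
```

In this model, each SIMD lane keeps a few guard bits above its full product width (for example, 34 bits for a 32-bit product in the 4×16 mode), which mirrors the role the abstract assigns to the guard bits of the 136-bit accumulator in the full-precision mode.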