# IMPLEMENTATION AND DESIGN OF MAC UNIT WITH REVERSIBLE LOGIC GATES

1. Dr.HENRY 2. Dr.SAI

<sup>1,2</sup>Asst. Professors, Dept. of ECE, CJITS, Jangaon, Telangana, India.

<sup>1</sup> damera.ritafaria@gmail.com,

### **ABSTRACT**

Reversible circuits do not lose information: they generate a unique output for each specified input and vice versa, so there is a one-to-one mapping between inputs and outputs. Quantum computing and reversible circuits are used to achieve low-power designs. In the majority of digital signal processing (DSP) applications, the critical operations are multiplication and accumulation. Real-time signal processing requires a high-speed, high-throughput Multiplier-Accumulator (MAC) unit that consumes low power, which is always key to achieving a high-performance digital signal processing system. The main aim of the proposed system is to design a MAC unit using reversible logic with the least number of gates, garbage outputs, delay and quantum cost, in order to show that it is an efficient design. Reversible computing is a model of computing in which the computational process is to some extent reversible, i.e., time-invertible. A necessary condition for reversibility of a computational model is that the transition function mapping states to their successors at a given later time be one-to-one. Reversible computing is generally considered an unconventional form of computing. Two major, closely related types of reversibility are of particular interest for this purpose: physical reversibility and logical reversibility.

Keywords: Reversible logic, Feynman gate, Peres gate, HNG gate, garbage outputs, Quantum cost, Quantum implementation

# LITERATURE SURVEY

The design of high-speed, low-power VLSI architectures needs efficient arithmetic processing units that are optimized for the performance parameters of speed and power consumption. Adders are key components in general-purpose microprocessors and digital signal processors. They are also used in many other functions such as subtraction, multiplication and division. As a result, the performance of the adder largely determines the speed of these operations. Furthermore, in applications such as RISC processor design, where single-cycle execution of instructions is the key measure of circuit performance, an efficient adder circuit is necessary to realize efficient system performance. Area is also an essential factor to be taken into account in the design of fast adders. High-speed, low-power and area-efficient addition and multiplication have therefore always been a fundamental requirement of high-performance processors and systems. The major speed limitation of adders arises from the large carry propagation delay encountered in conventional adder circuits, such as the ripple carry adder and the carry save adder.

Power dissipation is one of the most important design objectives in integrated circuits, after speed. The main building block of digital signal processing (DSP) circuits is the Multiplier-Accumulator (MAC) unit. A high-speed, low-power MAC unit is desirable for any DSP processor, because speed and throughput rate are always the concerns of a DSP system. Owing to the rapid growth of portable electronic systems such as laptops, calculators and mobile phones, low-power devices have become very important in today's world. Low-power, high-throughput circuit design is a challenge for the VLSI designer. For real-time signal processing, a high-speed, high-throughput MAC unit is always key to achieving a high-performance digital signal processing system. A regular MAC unit consists of multipliers and accumulators that contain the sum of the previous consecutive products. The main motivation of this work is to investigate various multiplier and adder architectures that are suitable for implementing a low-power, area-efficient and high-speed MAC unit. This chapter begins with the basic building blocks used for addition and multiplication, and then goes through the surveys of different researchers on adders, multipliers and MAC units.

# **REVERSIBLE GATES**

The simplest reversible gate is the NOT gate, which is a 1\*1 gate. The Controlled NOT (CNOT) gate is an example of a 2\*2 gate. There are many 3\*3 reversible gates, such as the F, TG, PG and TR gates. The quantum cost of a 1\*1 reversible gate is zero, and the quantum cost of a 2\*2 reversible gate is one. Any reversible gate can be realized using 1\*1 NOT gates and 2\*2 reversible gates such as V, V+ (V is the square root of NOT gate and V+ is its Hermitian conjugate) and the FG gate, which is also known as the CNOT gate. The V and V+ quantum gates have the properties given in Equations 1, 2 and 3.

$$V \cdot V = \text{NOT} \qquad (1)$$

$$V \cdot V^{+} = V^{+} \cdot V = I \qquad (2)$$

$$V^{+} \cdot V^{+} = \text{NOT} \qquad (3)$$

The Quantum Cost of a Reversible gate is calculated by counting the number of V, V+ and CNOT gates.
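Equations 1 to 3 can be checked numerically. The sketch below assumes the commonly used 2×2 matrix form of the square-root-of-NOT gate, which is not given in the text; the helper names `matmul`, `dagger` and `close` are illustrative only.

```python
# Assumed (standard) matrix for V, the square root of NOT; not taken from the paper.
V = [[(1 + 1j) / 2, (1 - 1j) / 2],
     [(1 - 1j) / 2, (1 + 1j) / 2]]

NOT = [[0, 1], [1, 0]]
I2 = [[1, 0], [0, 1]]


def matmul(x, y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(m):
    """Hermitian conjugate (conjugate transpose) of a 2x2 matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def close(x, y, tol=1e-12):
    """Element-wise comparison with a small tolerance."""
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(2) for j in range(2))


V_PLUS = dagger(V)

print(close(matmul(V, V), NOT))            # Equation (1)
print(close(matmul(V, V_PLUS), I2))        # Equation (2)
print(close(matmul(V_PLUS, V_PLUS), NOT))  # Equation (3)
```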

#### **NOT GATE:**

$$A \;\longrightarrow\; P = \overline{A}$$

Figure 1. The reversible 1\*1 NOT gate, with zero quantum cost.

### **FEYNMAN/CNOT GATE:**

The Feynman gate is a reversible 2\*2 gate with a quantum cost of one. It maps the inputs (A, B) to the outputs (P = A, Q = A ⊕ B), as shown in Figure 2. Its quantum implementation is as shown in Figure 3.

$$\begin{array}{ccc}
A & \longrightarrow & P = A \\
B & \longrightarrow & Q = A \oplus B
\end{array}$$

Figure 2. Feynman gate / CNOT gate
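As a minimal sketch of the mapping above, the Feynman gate can be modelled as a function on bits. The function name `feynman` is illustrative. The check confirms that the four input pairs map to four distinct output pairs, and that setting B = 0 copies A to both outputs.

```python
from itertools import product


def feynman(a, b):
    """Feynman (CNOT) gate: P = A, Q = A xor B."""
    return a, a ^ b


# Reversibility: every input pair must map to a distinct output pair.
outputs = {feynman(a, b) for a, b in product((0, 1), repeat=2)}
is_reversible = len(outputs) == 4

# With B = 0 the gate duplicates A on both outputs.
copies = [feynman(a, 0) for a in (0, 1)]

print(is_reversible, copies)
```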

# **TOFFOLI GATE:**

The Toffoli gate is a 3\*3 reversible gate with three inputs and three outputs. The inputs (A, B, C) are mapped to the outputs (P = A, Q = B, R = A·B ⊕ C), as shown in Figure 3; its quantum implementation is shown in Figure 4.



Figure 3. Toffoli gate

Figure 4. Quantum implementation of Toffoli gate
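The Toffoli mapping can be sketched in the same way. The function name `toffoli` is illustrative. The check shows that the gate is its own inverse and that it computes A·B on the third output when C = 0.

```python
from itertools import product


def toffoli(a, b, c):
    """Toffoli gate: P = A, Q = B, R = (A and B) xor C."""
    return a, b, (a & b) ^ c


inputs = list(product((0, 1), repeat=3))

# Applying the gate twice returns the original inputs, so it is reversible.
self_inverse = all(toffoli(*toffoli(*bits)) == bits for bits in inputs)

# With the constant input C = 0, the third output is the AND of A and B.
and_outputs = [toffoli(a, b, 0)[2] for a, b in product((0, 1), repeat=2)]

print(self_inverse, and_outputs)
```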

ISSN NO: 2249-3034

### PERES GATE:

The Peres gate is a 3\*3 reversible gate with three inputs and three outputs. It maps the inputs (A, B, C) to the outputs (P = A, Q = A ⊕ B, R = (A·B) ⊕ C). Since it requires 2 V+, 1 V and 1 CNOT gate, it has a quantum cost of 4. The Peres gate and its quantum implementation are shown in Figures 5 and 6 respectively.



Figure 5. Peres gate

Figure 6. Quantum implementation of Peres gate
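The following sketch models the Peres mapping given above. The function name `peres` is illustrative. With the constant input C = 0, output Q is the sum and output R is the carry of A + B, so the gate behaves as a half adder.

```python
from itertools import product


def peres(a, b, c):
    """Peres gate: P = A, Q = A xor B, R = (A and B) xor C."""
    return a, a ^ b, (a & b) ^ c


# Reversibility: the eight input triples map to eight distinct output triples.
distinct_outputs = len({peres(*bits) for bits in product((0, 1), repeat=3)})

# Half-adder behaviour with C = 0: (sum, carry) for each (A, B).
half_adder = {(a, b): peres(a, b, 0)[1:] for a, b in product((0, 1), repeat=2)}

print(distinct_outputs, half_adder)
```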

### **HNG GATE:**

The HNG gate is shown in Figure 7, where each output is annotated with the corresponding logic expression. For more information about reversible logic gates see. One of the prominent features of the HNG gate is that it can work on its own as a reversible full adder unit.



Figure 7. Reversible HNG gate as a reversible full adder
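The full-adder behaviour can be illustrated with a short sketch. It assumes the output expressions usually quoted for the HNG gate (P = A, Q = B, R = A ⊕ B ⊕ C, S = (A ⊕ B)·C ⊕ A·B ⊕ D); these are not reproduced in the text, so they should be checked against Figure 7. The function name `hng` is illustrative.

```python
from itertools import product


def hng(a, b, c, d):
    """HNG gate under the assumed definition:
    P = A, Q = B, R = A xor B xor C, S = ((A xor B) and C) xor (A and B) xor D.
    """
    return a, b, a ^ b ^ c, ((a ^ b) & c) ^ (a & b) ^ d


# Reversibility: 16 distinct outputs for the 16 possible inputs.
distinct_outputs = len({hng(*bits) for bits in product((0, 1), repeat=4)})

# Full adder: with D = 0, R is the sum bit and S is the carry-out of A + B + C.
full_adder_ok = all(
    a + b + cin == hng(a, b, cin, 0)[2] + 2 * hng(a, b, cin, 0)[3]
    for a, b, cin in product((0, 1), repeat=3)
)

print(distinct_outputs, full_adder_ok)
```

In this configuration the outputs P and Q are not used, which is consistent with the two garbage outputs per HNG gate reported for the accumulator.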

# IMPLEMENTATION OF MULTIPLY AND ACCUMULATE (MAC) UNIT:

In the majority of digital signal processing (DSP) applications, the critical operations usually involve many multiplications and/or accumulations. For real-time signal processing, a high-speed, high-throughput Multiplier-Accumulator (MAC) is always key to achieving a high-performance digital signal processing system. In the last few years, the main consideration in MAC design has been to enhance its speed, because speed and throughput rate are always the concerns of a digital signal processing system. Pipelined multiplier/accumulator architectures and circuit design techniques are suitable for implementing high-throughput signal processing algorithms while achieving low power consumption.

A conventional MAC unit consists of a fast multiplier and an accumulator that contains the sum of the previous consecutive products. The function of the MAC unit is given by the following equation:

$$F = \sum_{i} A_i B_i$$



Figure 8: Basic structure of MAC
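The equation above corresponds to a simple multiply-then-accumulate loop. The sketch below is a purely behavioural model with no bit-width limit; the function name `mac` is illustrative and does not refer to the hardware design.

```python
def mac(a_values, b_values):
    """Behavioural MAC: F = sum of A_i * B_i over all input pairs."""
    accumulator = 0
    for a, b in zip(a_values, b_values):
        product = a * b          # multiplier stage
        accumulator += product   # accumulator stage
    return accumulator


print(mac([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```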

The main goal of DSP processor design is to enhance the speed of the MAC unit while limiting its power consumption. In a pipelined MAC circuit, the delay of a pipeline stage is the delay of a 1-bit full adder (Jou, Chen, Yang and Su, 1995). Estimating this delay helps to identify the overall delay of the pipelined MAC. In this work, a 1-bit full adder is designed. Area, power and delay are calculated for the full adder, and the pipelined MAC unit is then designed for low power on that basis.

# **Multiplier and Accumulator Unit**

A MAC is composed of an adder, a multiplier and an accumulator. The adders used are usually Carry-Select or Carry-Save adders, as speed is of utmost importance in DSP (Chandrakasan, Sheng, & Brodersen, 1992 and Weste & Harris, 3rd Ed). The multiplier can be implemented as a parallel array multiplier. The inputs to the MAC are fetched from memory and fed to the multiplier block, which performs the multiplication and passes the result to the adder; the adder accumulates the result and stores it in a memory location. This entire process is to be completed in a single clock cycle (Weste & Harris, 3rd Ed). Figure 12 shows the architecture of the MAC unit designed in this work. The design consists of one 9-bit register, one 4-bit. The product Ai x Bi is always fed into the 9-bit Ripple Carry accumulator and then added to the next product Ai x Bi. This MAC unit is capable of multiplying and adding to the previous product consecutively up to eight times.

Operation: Output = $\sum_{i} A_i B_i$

In this paper, a 4x4 MAC unit is designed that can perform accumulation on 8-bit numbers. This MAC unit has a 9-bit output, and its operation is to repeatedly add the multiplication results. The total design area is also assessed by observing the total transistor count. The power-delay product is calculated by multiplying the power consumption by the time delay.
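The bit widths quoted above can be checked with a few lines of arithmetic. The sketch confirms that a 4x4 product fits in 8 bits and that the sum of two such products fits in 9 bits. The function `mac_step`, which keeps only the low 9 bits of the running sum, is an assumption about how a longer accumulation would behave and is not taken from the design.

```python
MAX_OPERAND = 2**4 - 1                       # largest 4-bit operand
max_product = MAX_OPERAND * MAX_OPERAND      # largest 4x4 product

product_bits = max_product.bit_length()      # bits needed for one product
sum_bits = (2 * max_product).bit_length()    # bits needed for two products added


def mac_step(acc, a, b, width=9):
    """One accumulate step; the result is truncated to `width` bits (assumed)."""
    return (acc + a * b) & ((1 << width) - 1)


print(max_product, product_bits, sum_bits)
print(mac_step(mac_step(0, 15, 15), 15, 15))
```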

# **Multiplication Concepts**

There are two types of multipliers: sequential and parallel. A sequential multiplier computes the final product iteratively and needs feedback and loops to implement the iteration. This design is too slow and is not suitable for reversible implementation. A parallel multiplier conventionally consists of two main steps:

- Partial product generation
- Multi-operand addition

Partial products are computed independently in parallel. Consider two binary numbers A and B, of m and n bits, respectively.

There are mn summands, which are produced in parallel by a set of mn AND gates. An n x n multiplier requires n(n-2) full adders, n half adders and n<sup>2</sup> AND gates. The basic cell of the parallel array multiplier is shown in Figure 9. In this project, a 4x4 parallel array multiplier is designed using reversible logic gates: a Peres gate in place of each AND gate and a PFAG gate in place of each full adder.
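As a worked example of the counts above, the sketch below evaluates the resource formulas for n = 4 and checks that summing the weighted partial products reproduces ordinary multiplication. The function names are illustrative, and the model is behavioural rather than gate-level.

```python
def array_multiplier_resources(n):
    """Component counts for an n x n array multiplier, as stated in the text."""
    return {"full_adders": n * (n - 2), "half_adders": n, "and_gates": n * n}


def multiply_by_partial_products(a, b, n=4):
    """Sum the n*n partial products a_i AND b_j, each weighted by 2**(i + j)."""
    total = 0
    for i in range(n):
        for j in range(n):
            partial = ((a >> i) & 1) & ((b >> j) & 1)
            total += partial << (i + j)
    return total


print(array_multiplier_resources(4))
print(multiply_by_partial_products(13, 11))
```

For n = 4 this gives 8 full adders, 4 half adders and 16 AND gates.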




Figure 9. Array Multiplier

# **Design of Reversible Multiplier**

The proposed reversible multiplier is designed in two phases.

Part I: Partial Product Generation (PPG)

Part II: Multi-Operand Addition (MOA)

The operation of a 4\*4 reversible multiplier is shown in Figure 10. It consists of 16 partial product bits of the X and Y inputs, which together perform a 4\*4 multiplication; the design can be extended to any other n\*n reversible multiplier. The reversible gates used in the multiplier design are the Peres gate and the Peres full adder gate (PFAG).



Figure 10. The operation of the 4×4 parallel multiplier


# PARTIAL PRODUCT GENERATION:

Partial products can be generated in parallel using 16 Peres gates, as shown in Figure 11. This is a better circuit because the Peres gate has less hardware complexity and quantum cost than other gates. An important point is that in a reversible n×n parallel multiplier, generating the partial products in parallel requires n copies of each operand bit. Since reversible logic does not allow fan-out, additional fan-out gates are needed. The number of fan-out gates needed for the reversible 4×4 multiplier is 24.
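The following sketch illustrates both points. It models a Peres gate with its third input tied to 0, so that its third output is the AND of the two operand bits, and it derives the fan-out count under the assumption that each extra copy of an operand bit costs one Feynman gate: 2n operand bits, each needing n - 1 extra copies. The function names are illustrative.

```python
def peres(a, b, c):
    """Peres gate: P = A, Q = A xor B, R = (A and B) xor C."""
    return a, a ^ b, (a & b) ^ c


def partial_products(x, y, n=4):
    """Generate the n*n partial products x_i AND y_j using Peres gates with C = 0."""
    bits = []
    for j in range(n):
        for i in range(n):
            _, _, r = peres((x >> i) & 1, (y >> j) & 1, 0)
            bits.append(r)
    return bits


def fanout_gates(n):
    """Extra copies needed: 2n operand bits, n - 1 additional copies of each."""
    return 2 * n * (n - 1)


print(len(partial_products(0b1011, 0b0110)))
print(fanout_gates(4))
```

For n = 4 the formula gives 24, which agrees with the figure quoted above.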

# REVERSIBLE MULTIPLIER AND ACCUMULATOR:

The operation of the 4x4 multiplier is depicted in Figure 10. It consists of 16 partial product bits of the form x<sub>i</sub>·y<sub>j</sub>. The reversible 4x4 multiplier circuit has two parts. First, the partial products are generated in parallel using Peres gates, as shown in Figure 11. Then, the addition is performed.



Figure 11. Partial Product generation circuit using Peres gates

The basic cell for such a multiplier is a full adder (FA) accepting three bits and one constant input. We use the PFAG gate as the reversible full adder. The proposed reversible multiplier circuit uses eight reversible PFAG full adders. In addition, it needs four reversible half adders. It is possible to use the PFAG gate as a half adder, as mentioned earlier in this study, but we use the Peres gate as the reversible half adder because it has less hardware complexity and quantum cost than the PFAG gate (the quantum cost of the Peres gate is 4, whereas that of the PFAG gate is 8).
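The gate counts and quantum costs quoted so far can be tallied as a rough estimate. The sketch below counts only the 16 Peres gates for partial products, the 8 PFAG full adders and the 4 Peres half adders; it leaves out the fan-out gates, and the resulting quantum cost is a back-of-the-envelope figure rather than a result reported in the paper.

```python
# (number of gates, quantum cost per gate), using the values stated in the text.
GATES = {
    "Peres (partial products)": (16, 4),
    "PFAG (full adders)": (8, 8),
    "Peres (half adders)": (4, 4),
}

total_gates = sum(count for count, _ in GATES.values())
total_quantum_cost = sum(count * cost for count, cost in GATES.values())

print(total_gates)         # 16 + 8 + 4
print(total_quantum_cost)  # 16*4 + 8*8 + 4*4
```

The gate total of 28 agrees with the gate count quoted for the proposed multiplier in the conclusion.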

#### **Accumulator unit:**

The circuit of figure 15, built from Peres gates, is a bit-wise multiplier that generates the partial products PP0 to PP15 for a 4x4 multiplication; these partial products are supplied to the multiplier circuit shown in figure 16. The multiplier's construction concept is shown in figure 17, which was developed from the multiplication shown in figure 14. The circuit of figure 16 (using FA, HA) uses 4 half adders and 8 full adders. The circuit of the multiplier



Figure 12. The concept of product generation

The accumulator and buffer are both shown in figure 18. This circuit is constructed using HNG, PG and FG gates. The HNG gate is used as a full adder to serve as the accumulator, and the FG gates serve as the buffer circuits. Each HNG gate produces 2 garbage outputs, since the two outputs P and Q are not used, as shown in figure 19.



Figure 13. Proposed 4x4 reversible multiply and accumulate circuit using HNG gates and Feynman gates.



Figure 14. Product generation with the help of HNG and Peres gates

contains 9 bits including the carry generated during accumulation. The role of the FG gate is to serve as a buffer that can be cleared. Referring to figure 20, the first input (A) of the FG gate is the SUM output of the HNG gate, which is brought out unchanged since the other input of the gate is set to '0'. The other output, which is also A, is fed back to the HNG gate to serve as the previous output. The FG gate is used here since there is no fan-out in reversible logic. Further, it does not produce any garbage outputs.
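The accumulate-and-buffer behaviour described above can be sketched at bit level. The model below is an assumption about the structure, not the circuit in the figures: it chains one HNG gate per bit as a ripple-carry adder over 9 bits, uses the HNG definition commonly quoted in the literature, and passes each sum bit through a Feynman gate with its second input at 0 to obtain the output bit and the feedback copy. The carry out of the top bit is discarded.

```python
def hng(a, b, c, d):
    """HNG gate (assumed definition): R is the sum, S the carry when D = 0."""
    return a, b, a ^ b ^ c, ((a ^ b) & c) ^ (a & b) ^ d


def feynman(a, b):
    """Feynman gate: P = A, Q = A xor B; with B = 0 both outputs equal A."""
    return a, a ^ b


def accumulate(acc, product, width=9):
    """Add `product` to `acc` one bit at a time and return the fed-back value."""
    carry = 0
    feedback = 0
    for i in range(width):
        x = (acc >> i) & 1
        y = (product >> i) & 1
        _, _, sum_bit, carry = hng(x, y, carry, 0)   # full-adder cell
        _out_bit, copy_bit = feynman(sum_bit, 0)     # buffer and fan-out copy
        feedback |= copy_bit << i
    return feedback


total = 0
for a, b in [(3, 5), (7, 9), (15, 15)]:
    total = accumulate(total, a * b)
print(total)  # 15 + 63 + 225 = 303
```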

# **RTL DIAGRAMS**

# **MAC UNIT:**



### PRODUCT USING REVERSIBLE RTL DIAGRAM:





# **DETAILED PRODUCT RTL DIAGRAM:**



# **SIMULATION RESULTS / WAVEFORMS:**





Figure 15. Simulation Results For Partial Product Term Generator Of Multiplier

### **CONCLUSION**

The MAC unit is a basic arithmetic cell in computer processing units. Furthermore, a reversible implementation of this unit is necessary for quantum computers. Various designs targeting this purpose can be found in the literature.

We designed a novel 4x4 bit reversible multiplier circuit using Peres gates and HNG gates. Table 4.1 demonstrates that the proposed reversible multiplier circuit is better than the existing designs in terms of hardware complexity, number of gates, garbage outputs and constant inputs. Furthermore, the restrictions of reversible circuits were largely avoided. The proposed reversible 4x4 multiplier circuit can be generalized for N x N bit multiplication.

The prospects for further research include the reversible implementation of more complex arithmetic circuits with fewer garbage outputs and lower quantum cost.

Our design approach is better than all the existing designs in terms of the number of constant inputs. Comparing our proposed reversible multiplier circuit with the existing circuits, we find that the proposed design requires 28 reversible logic gates, whereas one existing design requires 40 reversible gates and another requires 29.

#### References

- [1] Darjn Esposito, Antonio G. M. Strollo, Massimo Alioto, "Low-power approximate MAC unit", 2017 13th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME).
- [2] P. Jagadeesh, S. Ravi, Kittur Harish Mallikarjun, "Design of high performance 64 bit MAC unit", 2013 International Conference on Circuits, Power and Computing Technologies (ICCPCT).
- [3] M. Priyanka, V. Balamurugan, "Design and Performance Analysis of a High Speed MAC Using Different Multipliers", 2015 Fifth International Conference on Advances in Computing and Communications (ICACC).
- [4] S. Deepak, Binsu J. Kailath, "Optimized MAC unit design", 2012 IEEE International Conference on Electron Devices and Solid State Circuit (EDSSC).
- [5] S. Swettha, S. Rashmi, N.S.S. Reddy, R. Hemalatha, "Area and power efficient MAC unit", 2018 Conference on Signal Processing And Communication Engineering Systems (SPACES).
- [6] Md. M. H. Azad Khan, "Design of Full-adder With Reversible Gates", International Conference on Computer and Information Technology, Dhaka, Bangladesh, 2002, pp. 515-519.
- [7] Hafiz Md. Hasan Babu, Md. Rafiqul Islam, Syed Mostahed Ali Chowdhury and Ahsan Raja Chowdhury, "Reversible Logic Synthesis for Minimization of Full Adder Circuit", Proceedings of the EuroMicro Symposium on Digital System Design (DSD'03), 3-5 September 2003, Belek-Antalya, Turkey, pp. 50-54.
- [8] Hafiz Md. Hasan Babu, Md. Rafiqul Islam, Syed Mostahed Ali Chowdhury and Ahsan Raja Chowdhury, "Synthesis of Full-Adder Circuit Using Reversible Logic", Proceedings of the 17<sup>th</sup> International Conference on VLSI Design (VLSI Design 2004), January 2004, Mumbai, India, pp. 757-760.
- [9] J.W. Bruce, M.A. Thornton, L. Shivakumariah, P.S. Kokate and X. Li, "Efficient Adder Circuits Based on a Conservative Logic Gate", Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI'02), April 2002, Pittsburgh, PA, USA, pp. 83-88.