# Introduction

Multiplication is one of the most widely used operations in computing systems.

In principle, multiplication is nothing but repeated addition: adding the multiplicand to itself as many times as the value of the multiplier yields the product. However, this kind of implementation requires huge hardware resources and the circuit operates at very low speed. To address this, many ideas have been presented over the last three decades, each aimed at a particular requirement: one may target high clock speed, another low power or small area. Either way, the ultimate goal is an efficient architecture that addresses the three VLSI constraints of speed, area, and power. Among these three constraints, speed is the most vital and requires the most attention. On closer inspection, multiplication involves two steps: generating the partial products and adding them together [3].
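The two steps can be illustrated with a minimal Python sketch for unsigned operands. The function names are hypothetical and the sketch is not a hardware model: it only contrasts the number of additions needed by repeated addition with the scheme that forms one partial product per multiplier bit and then adds them.

```python
def multiply_repeated_addition(multiplicand: int, multiplier: int) -> tuple[int, int]:
    """Naive multiplication: add the multiplicand `multiplier` times.

    Returns (product, number_of_additions)."""
    product, additions = 0, 0
    for _ in range(multiplier):
        product += multiplicand
        additions += 1
    return product, additions


def multiply_partial_products(multiplicand: int, multiplier: int, n: int) -> tuple[int, list[int]]:
    """Two-step multiplication of n-bit unsigned operands.

    Step 1: one partial product per multiplier bit (multiplicand AND bit i, shifted by i).
    Step 2: add the partial products together."""
    partial_products = [
        (multiplicand if (multiplier >> i) & 1 else 0) << i for i in range(n)
    ]
    return sum(partial_products), partial_products


if __name__ == "__main__":
    print(multiply_repeated_addition(13, 11))    # (143, 11): 11 additions
    print(multiply_partial_products(13, 11, 4))  # (143, [13, 26, 0, 104]): 4 partial products
```

The number of additions in the naive version grows with the value of the multiplier, while the partial-product version needs only n rows for n-bit operands; the rest of the paper is about generating and adding those rows quickly.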

Thus, the speed of a multiplier depends heavily on how fast the partial products are generated and how fast they can be added together. Since multipliers have a significant impact on the performance of the entire system, many high-performance algorithms and architectures have been proposed [1][2][3][4][5][6][7][8][9][10][11][12]. Very high-speed dedicated multipliers are used in pipelined and vector computers.

The Residue Number System (RNS) reduces carry-propagation delay, because arithmetic is performed independently on small residues with no carries passed between residue channels; this offers a significant speed-up over the conventional binary system. This characteristic is advantageous when repetitive arithmetic operations on long operands have to be performed. RNS has been adopted in the design of Digital Signal Processors (DSPs). The lower power consumption of RNS compared with conventional arithmetic circuits in Finite Impulse Response (FIR) filter implementations has inspired considerable work in this area. Therefore, RNS may be an interesting candidate for building processing circuits in deep submicron technologies.
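A minimal Python sketch of RNS multiplication follows. The moduli set {7, 8, 9} (the common {2^n - 1, 2^n, 2^n + 1} form with n = 3) and the function names are assumptions chosen only for illustration; the paper does not specify a moduli set here. Each residue channel is multiplied independently, and the result is converted back with the Chinese Remainder Theorem. Operands are assumed non-negative, and the true product must stay below the dynamic range 7 · 8 · 9 = 504.

```python
from math import prod

# Hypothetical moduli set {2^n - 1, 2^n, 2^n + 1} with n = 3; pairwise coprime.
MODULI = (7, 8, 9)


def to_rns(x: int, moduli=MODULI) -> tuple[int, ...]:
    """Forward conversion: one small residue per modulus."""
    return tuple(x % m for m in moduli)


def rns_multiply(a, b, moduli=MODULI) -> tuple[int, ...]:
    """Channel-wise multiplication. Each channel is independent, so no carry
    propagates from one residue channel to another."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI) -> int:
    """Reverse conversion with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        total += r * mi * pow(mi, -1, m)  # pow(mi, -1, m) is the modular inverse (Python 3.8+)
    return total % big_m


if __name__ == "__main__":
    a, b = to_rns(17), to_rns(23)
    print(a, b)                          # (3, 1, 8) (2, 7, 5)
    print(from_rns(rns_multiply(a, b)))  # 391 == 17 * 23
```

In hardware each channel is a narrow (here 3- or 4-bit) modular multiplier, which is why the choice of an efficient signed multiplier per channel matters for RNS designs.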

The rest of the paper is organized as follows: Section II describes Baugh-Wooley multiplication, Section III provides a detailed explanation of Modified Booth Encoding techniques, Section IV presents comparative results and their analysis, and Section V concludes the paper.


# II. Baugh-Wooley Multiplier

Baugh-Wooley multiplication is one of the efficient methods for handling sign bits; the approach was developed to design regular multipliers [2] suited to 2's complement numbers.

Investigating the VLSI Characterization of Parallel Signed Multipliers for RNS Applications using FPGAs

Let us consider two n-bit signed numbers, X (multiplicand) and Y (multiplier), to be multiplied:

$$X = -x_{n-1}2^{n-1} + \sum_{i=0}^{n-2} x_i 2^{i} \tag{1}$$

$$Y = -y_{n-1}2^{n-1} + \sum_{j=0}^{n-2} y_j 2^{j} \tag{2}$$
where the $x_i$ and $y_j$ are the bits of X and Y, respectively, and $x_{n-1}$ and $y_{n-1}$ are the sign bits. The product, $P = X \cdot Y$, is then given by the following equation:
$$
\begin{aligned}
P = X \cdot Y &= \left(-x_{n-1}2^{n-1} + \sum_{i=0}^{n-2} x_i 2^{i}\right)\left(-y_{n-1}2^{n-1} + \sum_{j=0}^{n-2} y_j 2^{j}\right) \\
&= x_{n-1}y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} - 2^{n-1}\sum_{i=0}^{n-2} x_i y_{n-1} 2^{i} - 2^{n-1}\sum_{j=0}^{n-2} x_{n-1} y_j 2^{j}
\end{aligned}
\tag{3}
$$
The final product is obtained by subtracting the last two terms, which are sums of non-negative bit products, from the first two terms.

Instead of performing the subtraction, the 2's complement of the last two terms can be formed and all terms added to obtain the final product. The final product in (3), $P = X \cdot Y$, then becomes:
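$$
\begin{aligned}
P = X \cdot Y = {}& x_{n-1}y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} \\
&+ 2^{n-1}\sum_{i=0}^{n-2} \overline{x_i y_{n-1}}\, 2^{i} + 2^{n-1}\sum_{j=0}^{n-2} \overline{x_{n-1} y_j}\, 2^{j} + 2^{n} - 2^{2n-1}
\end{aligned}
\tag{4}
$$

where the overline denotes the complemented bit product. Since the product is kept to 2n bits, the constant $-2^{2n-1}$ can equally be added as $+2^{2n-1}$ (modulo $2^{2n}$).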
A simple 4 × 4 Baugh-Wooley multiplication is shown in Figure 1.
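The sketch below evaluates equation (4) bit by bit for n-bit operands, in the spirit of the 4 × 4 array of Figure 1. The function names are hypothetical, and the sketch models only the arithmetic of the partial-product bits, not the adder array. It shows that summing the AND/NAND bits plus the two correction constants reproduces X · Y, and that there are exactly n² partial-product bits; in this formulation 2(n - 1) of them are complemented (NAND) terms.

```python
def to_bits(value: int, n: int) -> list[int]:
    """n-bit two's complement bits of `value`, least significant bit first."""
    return [(value >> k) & 1 for k in range(n)]


def baugh_wooley_terms(x: int, y: int, n: int) -> list[tuple[int, int]]:
    """Partial-product bits of eq. (4) as (weight exponent, bit value) pairs.

    Each entry corresponds to one AND or NAND gate, so there are n * n of them."""
    xb, yb = to_bits(x, n), to_bits(y, n)
    # x_i y_j for i, j <= n - 2 (AND gates)
    terms = [(i + j, xb[i] & yb[j]) for i in range(n - 1) for j in range(n - 1)]
    terms.append((2 * n - 2, xb[n - 1] & yb[n - 1]))        # sign-bit product (AND)
    for k in range(n - 1):
        terms.append((n - 1 + k, 1 - (xb[k] & yb[n - 1])))  # complemented x_k y_{n-1} (NAND)
        terms.append((n - 1 + k, 1 - (xb[n - 1] & yb[k])))  # complemented x_{n-1} y_k (NAND)
    return terms


def baugh_wooley_multiply(x: int, y: int, n: int = 4) -> int:
    """Sum the eq. (4) bits plus correction constants, then read a 2n-bit signed result."""
    width = 2 * n
    raw = sum(bit << w for w, bit in baugh_wooley_terms(x, y, n))
    # Constants 2^n and -2^(2n-1); modulo 2^(2n), -2^(2n-1) == +2^(2n-1).
    raw = (raw + (1 << n) + (1 << (width - 1))) % (1 << width)
    return raw - (1 << width) if raw >> (width - 1) else raw


if __name__ == "__main__":
    print(baugh_wooley_multiply(-3, 5))       # -15
    print(len(baugh_wooley_terms(-3, 5, 4)))  # 16 gate outputs (n * n)
```

Because every bit in the array is added with a positive weight, the array needs no subtractors, which is what makes the Baugh-Wooley structure regular.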

Figure 1

The same multiplication logic can be extended to other multiplier strengths (bit lengths), such as 4, 8, 16, 32, and 64 bits, and the efficiency of each is analyzed with simulation and synthesis tools. A Baugh-Wooley implementation requires n² AND gates and n(n-1) adders, as shown in Figure 2.

Figure 2


# III. Booth Multiplier

The modified Booth algorithm [1] is widely preferred for high-speed multiplier circuits and is one of several techniques for signed multiplication. To improve the architecture of this multiplier, we made two enhancements, as in [14]. The first is to use Wen-Chang's efficient Modified Booth Encoder (MBE), since it has been shown to be the fastest scheme for generating partial products.

# a) Algorithm of the Modified Booth Multiplier

Booth multiplication consists of three steps [10][11][12][13][14]:

1. Generate the partial products.
2. Add the generated partial products until only two rows remain.
3. Compute the final multiplication result by adding the last two rows.

The modified Booth algorithm halves the number of partial products in the first step. We used the MBE scheme proposed in [1], which is known as the most efficient Booth encoding and decoding scheme. Multiplying M by N with the modified Booth algorithm starts by grouping the bits of N into overlapping three-bit groups and encoding each group into one of {-2, -1, 0, 1, 2}. Figure 3 shows the general architecture of the MBE. In this scheme, the multiplicand is offset one bit to the left to enter the adder, while a 0 is added at the low-order multiplicand position. Each time, the partial product is shifted two bit positions to the right and its sign is extended to the left.
Figure 3
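The three steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' design: the helper names are hypothetical, step 2 uses a generic carry-save (3:2 full-adder) reduction, and the partial products are formed arithmetically from the recoded digits instead of by the MBE decoder logic of Figure 3.

```python
def booth_digits(y: int, n: int) -> list[int]:
    """Radix-4 (modified) Booth recoding of an n-bit two's complement multiplier.

    Bits are taken in overlapping triples (y[2k+1], y[2k], y[2k-1]) with an
    implicit y[-1] = 0, and each triple becomes a digit in {-2, -1, 0, 1, 2}.
    """
    if n % 2:
        raise ValueError("this sketch assumes an even operand width n")

    def bit(k: int) -> int:
        return 0 if k < 0 else (y >> k) & 1

    return [-2 * bit(2 * k + 1) + bit(2 * k) + bit(2 * k - 1) for k in range(n // 2)]


def carry_save_add(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """One row of full adders (3:2 compressor): three rows in, sum and carry rows out."""
    total = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return total & mask, carry & mask


def modified_booth_multiply(x: int, y: int, n: int = 8) -> int:
    """Multiply two n-bit signed integers in the three steps listed above."""
    width = 2 * n
    mask = (1 << width) - 1
    # Step 1: generate n/2 partial products (d * X shifted by 2k) as 2n-bit patterns.
    rows = [((d * x) << (2 * k)) & mask for k, d in enumerate(booth_digits(y, n))]
    # Step 2: add partial products with carry-save adders until two rows remain.
    while len(rows) > 2:
        total, carry = carry_save_add(rows[0], rows[1], rows[2], mask)
        rows = rows[3:] + [total, carry]
    # Step 3: one carry-propagate addition of the last two rows.
    raw = sum(rows) & mask
    # Reinterpret the 2n-bit result as a two's complement number.
    return raw - (1 << width) if raw >> (width - 1) else raw


if __name__ == "__main__":
    print(booth_digits(5, 4))                 # [1, 1]
    print(modified_booth_multiply(-3, 5, 4))  # -15
```

An 8-bit multiplier thus produces 4 partial-product rows instead of 8, which is the halving the text refers to.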
During each add-shift cycle, a different multiple of the multiplicand is added to the new partial product, depending on the equation derived from the bit-pair recoding table. Some examples are given in Figure 4.

Figure 4

The new MBE recoder [14] is designed in accordance with Wen-Chang's area-efficient Modified Booth Encoder (MBE), since it has been shown to be an efficient architecture, and Table 1 presents the truth table of the new encoding scheme. Its application and calculation procedure are illustrated in the following examples.
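Table 1 is only available as an image, so the sketch below does not reproduce Wen-Chang's encoder. It assumes a conventional radix-4 Booth encoder with hypothetical outputs `neg`, `one`, and `two`, and shows how each partial-product row is selected from 0, X, or 2X and inverted for negative digits.

```python
from itertools import product


def booth_encode(y2: int, y1: int, y0: int) -> tuple[int, int, int]:
    """Conventional radix-4 Booth encoder for one triple (y[2k+1], y[2k], y[2k-1]).

    Returns (neg, one, two): neg selects subtraction, one selects X, two selects 2X.
    """
    neg = y2
    one = y1 ^ y0
    two = (y2 & (1 - y1) & (1 - y0)) | ((1 - y2) & y1 & y0)
    return neg, one, two


def partial_product_row(x: int, triple: tuple[int, int, int], width: int) -> int:
    """Decoder/selector: pick 0, X or 2X, invert it when neg = 1, then add neg.

    In hardware the final '+ neg' is usually a separate correction bit placed in
    the partial-product array rather than an adder inside each row.
    """
    neg, one, two = booth_encode(*triple)
    mask = (1 << width) - 1
    if one:
        magnitude = x & mask
    elif two:
        magnitude = (x << 1) & mask
    else:
        magnitude = 0
    selected = magnitude ^ mask if neg else magnitude  # bitwise inversion when neg = 1
    return (selected + neg) & mask


if __name__ == "__main__":
    # Truth table: triple -> Booth digit -> (neg, one, two)
    for triple in product((0, 1), repeat=3):
        digit = -2 * triple[0] + triple[1] + triple[2]
        print(triple, f"{digit:+d}", booth_encode(*triple))
```

For the triple (1, 1, 1) this encoder gives neg = 1 with a zero magnitude; the inverted all-ones row plus the +1 correction still wraps to a zero row.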

For ease of understanding, the two main categories of signed multiplication are considered: a negative multiplicand with a positive multiplier in Example 1, and a negative multiplicand with a negative multiplier in Example 2.

Example 1: One negative and one positive number.


Consider -3 × 5

Step-1: Binary conversion and 2's complement

Step-2: Multiplication by modified Booth recoding

Example 2: Both negative numbers.


Consider -3 × -4

Step-1: Binary conversion and 2's complement

Step-2: Multiplication by modified Booth recoding

Once the partial products are generated, the addition process is very similar to that of the array multiplier.
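Since the step-by-step figures for Examples 1 and 2 are not reproduced here, the following worked derivation recomputes both with 4-bit operands. It is an independent reconstruction, not the authors' figures, and assumes the standard radix-4 digit rule below with $y_{-1} = 0$ and 8-bit two's complement partial products.

```latex
\begin{aligned}
d_k &= -2y_{2k+1} + y_{2k} + y_{2k-1}, \qquad y_{-1} = 0 \\[6pt]
\text{Example 1: } X &= -3 = 1101_2, \quad Y = 5 = 0101_2 \\
(y_1, y_0, y_{-1}) &= (0, 1, 0) \Rightarrow d_0 = +1, \qquad (y_3, y_2, y_1) = (0, 1, 0) \Rightarrow d_1 = +1 \\
P &= d_0 X + 4 d_1 X = -3 + (-12) = -15 \\
  &\equiv 11111101_2 + 11110100_2 \equiv 11110001_2 \pmod{2^8} \\[6pt]
\text{Example 2: } X &= -3 = 1101_2, \quad Y = -4 = 1100_2 \\
(y_1, y_0, y_{-1}) &= (0, 0, 0) \Rightarrow d_0 = 0, \qquad (y_3, y_2, y_1) = (1, 1, 0) \Rightarrow d_1 = -1 \\
P &= d_0 X + 4 d_1 X = 0 + 12 = 12 \\
  &\equiv 00000000_2 + 00001100_2 = 00001100_2
\end{aligned}
```

In both cases only two partial products are needed for a 4-bit multiplier, and the carry out of the 8-bit addition in Example 1 is simply discarded.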

# IV. Results and Analysis

The multipliers taken for analysis were described in structural Verilog HDL and synthesized into gate-level netlists using two different synthesizers, namely Xilinx ISE Design Suite 14.3 and Altera Quartus II 12.0, targeting the Virtex-7 XCV2000T-2FLG1925 and Cyclone II EP2C35F672C6 FPGAs, respectively. The multipliers were simulated and analyzed at different strengths, 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, as shown in Tables 2-4.

# a) Area Analysis

In FPGA-based design, the area requirement of a design is proportional to its logic utilization: the number of slice LUTs required in Xilinx, and the number of logic elements required in Altera. At 16 × 16 bit strength, the Booth multiplier consumes 20.5% less area than the Baugh-Wooley multiplier.

# b) Delay Analysis

In FPGA-based design, EDA tools have a built-in capability to predict the delay of a design. The Xilinx Timing Analyzer and the Altera TimeQuest Timing Analyzer were used for delay analysis. The delay analyses show that the modified Booth multiplier is about 43% more performance-efficient than the Baugh-Wooley multiplier.

# c) Power Analysis

Power evaluation of the designs was done at several levels: total thermal power dissipation, core dynamic thermal power dissipation, core static thermal power dissipation, and I/O thermal power dissipation, all in milliwatts (mW). Among these, dynamic power varies from design to design, so it decides which architecture is more efficient.

The dynamic power requirement of a design is determined by the number of signal transitions (switching activity) during the simulation time. The analysis was performed with Altera's PowerPlay Power Analyzer, which requires a signal-activity input file and a Value Change Dump (VCD) file to evaluate the power of the design. Signal-activity counts were measured for 20 different samples over a 100 ns simulation, and the same samples were applied to the other design in order to evaluate the exact power difference between the designs. PowerPlay power analysis of the 4 × 4 bit multipliers shows that the modified Booth multiplier consumes 46.90% less than the Baugh-Wooley multiplier, and this was found to be consistent for all strengths.

Figure 5

The Xilinx simulation result for the 32 × 32 bit Booth multiplier is shown in Figure 5; the structural port-map model is then synthesized into a gate-level netlist for signal-transition calculation. The 64 × 64 bit simulation result of the modified Booth multiplier in Altera Quartus II is illustrated in Figure 6, and the synthesis summary is depicted in Figures 7-11.

![Table 1: Modified Booth Encoder Logic [1]](image-2.png "Table 1")

![](image-3.png "")
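Dynamic power in this flow is driven by switching activity. The following simplified Python sketch uses hypothetical helper names; it does not parse a real VCD file or model the PowerPlay analyzer. It counts bit toggles between consecutive bus values and computes a relative reduction. With the 4 × 4 transition counts from Table 4 (1857 for Baugh-Wooley, 986 for modified Booth), the relative reduction evaluates to 46.90%, which matches the percentage quoted in the power analysis.

```python
def count_transitions(samples: list[int], width: int) -> int:
    """Count bit toggles (0->1 or 1->0) between consecutive values of a bus.

    A simplified stand-in for the switching activity recorded in a VCD file."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(samples, samples[1:]))


def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    print(count_transitions([0b0000, 0b0011, 0b0010, 0b1010], width=4))  # 4
    # 4 x 4 signal-transition counts from Table 4 (Baugh-Wooley vs. modified Booth)
    print(f"{percent_reduction(1857, 986):.2f}%")  # 46.90%
```

Fewer transitions mean less switched capacitance per simulated interval, which is why the transition count is used here as the basis for comparing dynamic power.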
![Figure 6](image-4.png "Figure 6")
![Figure 9: Graph plot of Altera area, multiplier strength versus number of LUTs. Figure 10: Altera delay, multiplier strength versus delay time (ns). Figure 11: Altera PowerPlay power, multiplier strength versus power dissipation (mW).](image-5.png "Figure 9")
![Figure 7](image-6.png "Figure 7, Figure 8, Figure 9")
![Figure 10](image-7.png "Figure 10")
![Figure 11](image-8.png "Figure 11")
![](image-9.png "")
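Table 2: Area and delay results on Altera Cyclone II EP2C35F672C6

| Multiplier strength | Multiplier name | No. of IOBs | No. of logic elements required | Delay (ns) |
|---|---|---|---|---|
| 4 × 4 | BAUGH | 16 | 30 | 15.650 |
| 4 × 4 | BOOTH | 16 | 28 | 10.173 |
| 8 × 8 | BAUGH | 32 | 164 | 36.994 |
| 8 × 8 | BOOTH | 32 | 150 | 25.082 |
| 16 × 16 | BAUGH | 64 | 698 | 99.377 |
| 16 × 16 | BOOTH | 64 | 538 | 42.826 |
| 32 × 32 | BAUGH | 128 | 2,874 | 325.172 |
| 32 × 32 | BOOTH | 128 | 2,284 | 87.473 |
| 64 × 64 | BAUGH | 256 | 10,122 | 956.214 |
| 64 × 64 | BOOTH | 256 | 9,542 | 189.886 |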
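Table 3: Area and delay results on Xilinx Virtex-7 XCV2000T-2FLG1925

| Multiplier strength | Multiplier name | No. of IOBs | No. of slice LUTs required | Delay (ns) |
|---|---|---|---|---|
| 4 × 4 | BAUGH | 16 | 20 | 15.91 |
| 4 × 4 | BOOTH | 16 | 18 | 10.14 |
| 8 × 8 | BAUGH | 32 | 104 | 55.93 |
| 8 × 8 | BOOTH | 32 | 96 | 22.15 |
| 16 × 16 | BAUGH | 64 | 452 | 191.84 |
| 16 × 16 | BOOTH | 64 | 354 | 40.87 |
| 32 × 32 | BAUGH | 128 | 1851 | 670.46 |
| 32 × 32 | BOOTH | 128 | 1595 | 81.19 |
| 64 × 64 | BAUGH | 256 | 7392 | 1838.32 |
| 64 × 64 | BOOTH | 256 | 6480 | 159.28 |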
Table 4: Power estimation on Altera Cyclone II EP2C35F672C6

| Multiplier strength | Multiplier name | No. of signal transitions during 100 ns simulation | Total thermal power dissipation (mW) | Core dynamic thermal power dissipation (mW) | Core static thermal power dissipation (mW) | I/O thermal power dissipation (mW) |
|---|---|---|---|---|---|---|
| 4 × 4 | BAUGH | 1857 | 169.92 | 1.13 | 80.12 | 86.67 |
| 4 × 4 | BOOTH | 986 | 166.13 | 1.01 | 80.01 | 86.59 |
| 8 × 8 | BAUGH | 20911 | 223.47 | 4.81 | 80.30 | 138.36 |
| 8 × 8 | BOOTH | 10291 | 223.39 | 5.28 | 80.30 | 138.30 |
| 16 × 16 | BAUGH | 498261 | 351.24 | 27.12 | 80.74 | 243.39 |
| 16 × 16 | BOOTH | 51942 | 345.25 | 19.86 | 80.72 | 244.67 |
| 32 × 32 | BAUGH | 9606019 | 642.20 | 115.05 | 81.75 | 445.40 |
| 32 × 32 | BOOTH | 469336 | 601.67 | 82.31 | 81.61 | 437.74 |
| 64 × 64 | BAUGH | 19212038 | 1302.34 | 331.53 | 83.13 | 887.68 |
| 64 × 64 | BOOTH | 1877344 | 1278.88 | 360.30 | 83.24 | 836.34 |
© 2015 Global Journals Inc. (US)
		
		
# References

[1] K. N. Vijeyakumar, V. Sumathy, and S. Elango, "VLSI Implementation of Area-Efficient Truncated Modified Booth Multiplier for Signal Processing Applications," The Arabian Journal for Science and Engineering, vol. 39, no. 11, 2014.

[2] P. Mohanty, "An Efficient Baugh-Wooley Architecture for both Signed & Unsigned Multiplication," International Journal of Computer Science & Engineering Technology (IJCSET), vol. 3, 2012.

[3] G. W. Bewick, "Fast Multiplication: Algorithms and Implementation," Ph.D. dissertation, Stanford University, Stanford, CA, 1994.

[4] R. Muralidharan and C. H. Chang, "Hard multiple generator for higher radix modulo multiplication," in Proceedings of the 12th International Symposium on Integrated Circuits, Singapore, 2009.

[5] A. Dandapat, S. Ghosal, P. Sarkar, and D. Mukhopadhyay, "A 1.2-ns 16×16-Bit Binary Multiplier Using High Speed Compressors," International Journal of Electrical, Computer, and Systems Engineering, 2009.

[6] K.-S. Yeo and K. Roy, Low-Voltage, Low-Power VLSI Subsystems, Tata McGraw-Hill.

[7] S. L. Freeny, "Special-purpose hardware for digital filtering," Proceedings of the IEEE, 1975.

[8] C. S. Wallace, "A suggestion for parallel multipliers," IEEE Transactions on Electronic Computers, 1964.

[9] O. Hasan and S. Kort, "Automated formal synthesis of Wallace tree multipliers," in Proceedings of the 50th Midwest Symposium on Circuits and Systems, 2007.

[10] J. Fadavi-Ardekani, "M × N Booth encoded multiplier generator using optimized Wallace trees," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1993.

[11] F. Elguibaly, "A fast parallel multiplier-accumulator using the modified Booth algorithm," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 2000.

[12] K. Choi and M. Song, "Design of a high performance 32 × 32-bit multiplier with a novel sign select Booth encoder," in Proceedings of the IEEE International Symposium on Circuits and Systems, 2001.

[13] Y. E. Kim, J. O. Yoon, K. J. Cho, J. G. Chung, S. I. Cho, and S. S. Choi, "Efficient design of modified Booth multipliers for predetermined coefficients," in Proceedings of the IEEE International Symposium on Circuits and Systems, 2006.

[14] W.-C. Yeh and C.-W. Jen, "High-speed Booth encoded parallel multiplier design," IEEE Transactions on Computers, 2000.

[15] J.-Y. Kang and J.-L. Gaudiot, IEEE Transactions on Computers, 2006.

[16] O. Salomon, J.-M. Green, and H. Klar, "General algorithms for a simplified addition of 2's complement numbers," IEEE Journal of Solid-State Circuits, 1995.

[17] E. Angel and E. E. Swartzlander, Jr., "Low power parallel multipliers," in Workshop on VLSI Signal Processing IX, 1996.

[18] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and its Application to Computer Technology, McGraw-Hill, New York, 1967.

[19] M. A. Soderstrand, Residue Number System Arithmetic: Modern Applications in Digital Signal Processing, IEEE Press, 1986.

[20] T. Paliouras and Stouraitis, "Novel High-Radix Residue Number System Multipliers and Adders," in Proceedings of the IEEE International Symposium on Circuits and Systems, 1999.