# Introduction

Multiplication is one of the most widely used operations in computing systems. In principle, multiplication is repeated addition: adding the multiplicand to itself as many times as the multiplier specifies yields the product. A direct implementation of this idea, however, consumes substantial hardware resources and operates at very low speed. Many ideas have been presented over the last three decades to address this, each aimed at a particular requirement: some target high clock speeds, others low power or small area. Either way, the ultimate goal is an efficient architecture that addresses the three constraints of VLSI design: speed, area, and power. Among these three constraints, speed is the one that requires the most attention. Multiplication involves two steps: generating the partial products and adding these partial products [3]. The speed of a multiplier therefore depends heavily on how fast the partial products can be generated and how fast they can be added together.

Since multipliers have a significant impact on the performance of the entire system, many high-performance algorithms and architectures have been proposed [1][2][3][4][5][6][7][8][9][10][11][12]. Very high-speed, dedicated multipliers are used in pipeline and vector computers.

The Residue Number System (RNS) reduces the delay of carry propagation, thus offering a significant speed-up over the conventional binary system. This characteristic is advantageous when repetitive arithmetic operations on long operands have to be performed. RNS has been adopted in the design of Digital Signal Processors (DSPs). The low power consumption of RNS compared with conventional arithmetic circuits for the implementation of Finite Impulse Response (FIR) filters has inspired a large body of work on it.
Therefore, RNS may be an interesting candidate for building processing circuits in deep-submicron technologies.

The rest of the paper is organized as follows: Section II describes Baugh-Wooley multiplication, Section III explains the Modified Booth Encoding technique, Section IV presents the comparative results and their analysis, and Section V concludes the paper.

# II. Baugh-Wooley Multiplier

Baugh-Wooley multiplication is one of the efficient methods for handling sign bits. The approach was developed in order to design regular multipliers suited to 2's complement numbers [2]. Consider two n-bit signed numbers, X (multiplicand) and Y (multiplier), to be multiplied:

$$X = -x_{n-1}2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i \tag{1}$$

$$Y = -y_{n-1}2^{n-1} + \sum_{i=0}^{n-2} y_i 2^i \tag{2}$$

where the $x_i$ and $y_i$ are the bits of X and Y, respectively, and $x_{n-1}$ and $y_{n-1}$ are the sign bits. The product, P = X * Y, is then given by the following equation (using j as the summation index for Y):

$$
\begin{aligned}
P = X \cdot Y &= \left(-x_{n-1}2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i\right)\left(-y_{n-1}2^{n-1} + \sum_{j=0}^{n-2} y_j 2^j\right) \\
&= x_{n-1}y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} \\
&\quad - 2^{n-1}\sum_{i=0}^{n-2} x_i y_{n-1} 2^i - 2^{n-1}\sum_{j=0}^{n-2} x_{n-1} y_j 2^j
\end{aligned}
\tag{3}
$$

The final product can be obtained by subtracting the last two terms from the first two terms. Instead of performing a subtraction, it is possible to take the 2's complement of the last two terms and add all terms to obtain the final product. The final product (3), P = X * Y, then becomes:

$$
\begin{aligned}
P = X \cdot Y &= x_{n-1}y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} \\
&\quad + 2^{n-1}\sum_{i=0}^{n-2} \overline{x_i y_{n-1}}\, 2^i + 2^{n-1}\sum_{j=0}^{n-2} \overline{x_{n-1} y_j}\, 2^j - 2^{2n-1} + 2^n
\end{aligned}
\tag{4}
$$

where the overbar denotes the complement of the one-bit AND term. A simple 4 x 4 Baugh-Wooley multiplication is shown in Figure 1.
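As a quick check of the Baugh-Wooley formulation, the terms of Equation (4) can be evaluated bit by bit in software and compared with ordinary integer multiplication. The following is a minimal Python sketch under stated assumptions, not the hardware description used in this work: the function name `baugh_wooley` and its argument layout are illustrative only.

```python
def baugh_wooley(x: int, y: int, n: int) -> int:
    """Multiply two n-bit two's complement integers term by term as in Eq. (4).

    Hypothetical helper for illustration; it mirrors the equation, not a circuit.
    """
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= x <= hi and lo <= y <= hi):
        raise ValueError("operands must fit in n-bit two's complement")

    # Bits of the two's complement patterns, least significant bit first.
    # Python's >> is an arithmetic shift, so this also works for negatives.
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]

    # Term 1: product of the two sign bits, weight 2^(2n-2).
    p = (xb[n - 1] & yb[n - 1]) << (2 * n - 2)

    # Term 2: ordinary AND partial products of the magnitude bits.
    for i in range(n - 1):
        for j in range(n - 1):
            p += (xb[i] & yb[j]) << (i + j)

    # Terms 3 and 4: complemented AND terms that involve one sign bit.
    for i in range(n - 1):
        p += (1 - (xb[i] & yb[n - 1])) << (n - 1 + i)
    for j in range(n - 1):
        p += (1 - (xb[n - 1] & yb[j])) << (n - 1 + j)

    # Constant correction: + 2^n - 2^(2n-1).
    p += (1 << n) - (1 << (2 * n - 1))
    return p


print(baugh_wooley(-3, 5, 4))   # -15
print(baugh_wooley(-8, -8, 4))  # 64
```

In a 2n-bit hardware result the constant $-2^{2n-1}$ is congruent to $+2^{2n-1}$ modulo $2^{2n}$, so it can be realized by adding a 1 at the most significant bit position rather than by a subtractor.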
Figure 1

The same multiplication logic can be extended to different multiplier strengths, such as 4-, 8-, 16-, 32-, and 64-bit operands, and its efficiency is analyzed with simulation and synthesis tools. A Baugh-Wooley implementation requires $n^2$ AND gates and $n(n-1)$ adders, as shown in Figure 2.

Figure 2

# III. Booth Multiplier

The modified Booth algorithm [1] is widely preferred and extensively used for high-speed multiplier circuits. The Modified Booth multiplier is one of several techniques for signed multiplication. In order to improve the architecture, we have made two enhancements as in [14]. The first is to use the efficient Wen-Chang Modified Booth Encoder (MBE), since it has been shown to be the fastest scheme for generating a partial product.

a) Algorithm of the Modified Booth Multiplier

Booth multiplication consists of three steps [10][11][12][13][14]:

1. Generate the partial products.
2. Add the generated partial products until only the last two rows remain.
3. Compute the final multiplication result by adding the last two rows.

The modified Booth algorithm reduces the number of partial products by half in the first step. We used the modified Booth encoding (MBE) scheme proposed in [1], which is known as the most efficient Booth encoding and decoding scheme. Multiplying M by N with the modified Booth algorithm starts by grouping the bits of N into overlapping groups of three and encoding each group into one of {-2, -1, 0, 1, 2}. Figure 3 shows the general architecture of the MBE. In this case, the multiplicand is offset one bit to the left to enter the adder, while a 0 is added at the low-order multiplicand position. Each time, the partial product is shifted two bit positions to the right and the sign is extended to the left.

Figure 3

During each add-shift cycle, a different version of the multiplicand is added to the new partial product, depending on the equation derived from the bit-pair recoding table above.
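The recoding step can be sketched in software as follows. This is a minimal Python illustration that assumes the standard radix-4 rule, in which each overlapping three-bit group $(y_{k+1}, y_k, y_{k-1})$ maps to the digit $y_{k-1} + y_k - 2y_{k+1}$; it is not the gate-level encoder of [14], and the function names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's complement multiplier into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant digit first.
    """
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not lo <= y <= hi:
        raise ValueError("multiplier must fit in n-bit two's complement")
    if n % 2:
        n += 1  # sign-extend by one bit so the bits split into whole pairs

    # bits[0] is the appended y_(-1) = 0; bits[k + 1] holds y_k.
    bits = [0] + [(y >> k) & 1 for k in range(n)]

    digits = []
    for k in range(0, n, 2):
        y_prev, y_cur, y_next = bits[k], bits[k + 1], bits[k + 2]
        digits.append(y_prev + y_cur - 2 * y_next)
    return digits


def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum the Booth partial products; digit k carries the weight 4^k."""
    digits = booth_radix4_digits(y, n)
    partial_products = [(d * x) << (2 * k) for k, d in enumerate(digits)]
    return sum(partial_products)


print(booth_radix4_digits(5, 4))   # [1, 1]   since 5 = 1 + 1*4
print(booth_multiply(-3, 5, 4))    # -15
print(booth_radix4_digits(-4, 4))  # [0, -1]  since -4 = 0 + (-1)*4
print(booth_multiply(-3, -4, 4))   # 12
```

For a 4-bit multiplier only two partial products are formed instead of four, which is the halving described above. The operands -3 x 5 and -3 x -4 are the two cases used as worked examples in this paper.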
Here are some examples to aid understanding:

Figure 4

The new MBE recoder [14] is designed in accordance with the area-efficient Wen-Chang Modified Booth Encoder (MBE), since it has proved to be an efficient architecture, and Table 1 presents the truth table of the new encoding scheme.

![Table 1: Modified Booth Encoder Logic [1]](image-2.png "Table 1")

The application and calculation procedure is illustrated in the following examples. For ease of understanding, two categories of signed multiplication are considered: a negative multiplicand with a positive multiplier (Example 1), and a negative multiplicand with a negative multiplier (Example 2).

Example 1: One negative and one positive number.

# Consider -3 x 5

Step 1: Binary conversion and 2's complement

Step 2: Multiplication by modified Booth recoding

Example 2: Both numbers negative.

# Consider -3 x -4

Step 1: Binary conversion and 2's complement

Step 2: Multiplication by modified Booth recoding

Once the partial products are generated, the addition process is very similar to that of the array multiplier.

# IV. Results and Analysis

The multipliers taken for analysis were described in structural Verilog HDL and synthesized to produce a gate-level netlist using two different synthesizers, namely Xilinx ISE Design Suite 14.3 and Altera Quartus II 12.0, targeting the Virtex7 XCV2000T-2FLG1925 and Cyclone II EP2C35F672C6 FPGAs, respectively. The multipliers were simulated and analyzed at different strengths, namely 4 x 4, 8 x 8, 16 x 16, 32 x 32, and 64 x 64, as shown in Tables 2-4.

a) Area Analysis

In FPGA-based design, the area requirement of a design is proportional to its logic utilization, i.e., the number of slice LUTs required in Xilinx devices and the number of logic elements required in Altera devices. At 16 x 16 bit strength, Booth consumes 20.5% less area than the Baugh-Wooley multiplier.

b) Delay Analysis

In FPGA-based design, EDA tools have a built-in capability to predict the delay of a design. The Timing Analyzer tool in Xilinx and the TimeQuest Timing Analyzer tool in Altera were used for delay analysis. The delay analyses show that Modified Booth performs about 43% better than Baugh-Wooley.

c) Power Analysis

Power evaluation of the design is done at various levels: total thermal power dissipation (mW, milliwatts), core dynamic thermal power dissipation (mW), core static thermal power dissipation (mW), and I/O thermal power dissipation (mW). Among these, dynamic power varies from design to design and therefore decides which architecture is more efficient. The dynamic power requirement of a design is determined by the number of signal transitions (activity) during the simulation time. The analysis here was made using the PowerPlay Power Analyzer from Altera. The Power Analyzer requires an input file of signal activities and a Value Change Dump (VCD) file to evaluate the power of the design. We measured the signal activity counts for 20 different samples over a 100 ns simulation, and the same samples were applied to the other design in order to evaluate the exact power difference between the designs. Power analysis with the PowerPlay analyzer for 4 x 4 bits shows that Modified Booth consumes 46.90% less than the Baugh-Wooley multiplier (a figure that matches the reduction in signal transitions from 1857 to 986 in Table 4), and this was found to be consistent for all strengths.

The Xilinx simulation result for the 32 x 32 bit Booth multiplier is shown in Figure 5, and the structure-level port-map model is then synthesized to a gate-level netlist for signal transition calculation. The simulation result for the 64 x 64 bit Modified Booth multiplier on Altera Quartus II is illustrated in Figure 6, and the synthesis summary is depicted in Figures 7-11.

Figure 5

![](image-3.png)

![Figure 6](image-4.png "Figure 6")

![Figure 9](image-5.png "Figure 9")

Figure 9 plots area on Altera (multiplier strength versus number of LUTs), Figure 10 plots delay on Altera (multiplier strength versus delay time in ns), and Figure 11 plots Altera PowerPlay power (multiplier strength versus power dissipation in mW).

![Figure 7, Figure 8, Figure 9](image-6.png "Figure 7 Figure 8 Figure 9")

![Figure 10](image-7.png "Figure 10")

![Figure 11](image-8.png "Figure 11")

![](image-9.png)

Table 2: Altera Cyclone II EP2C35F672C6

| Multiplier strength | Multiplier name | No. of IOBs | No. of logic elements required | Delay (ns) |
|---|---|---|---|---|
| 4x4 | BAUGH | 16 | 30 | 15.650 |
| 4x4 | BOOTH | 16 | 28 | 10.173 |
| 8x8 | BAUGH | 32 | 164 | 36.994 |
| 8x8 | BOOTH | 32 | 150 | 25.082 |
| 16x16 | BAUGH | 64 | 698 | 99.377 |
| 16x16 | BOOTH | 64 | 538 | 42.826 |
| 32x32 | BAUGH | 128 | 2,874 | 325.172 |
| 32x32 | BOOTH | 128 | 2,284 | 87.473 |
| 64x64 | BAUGH | 256 | 10,122 | 956.214 |
| 64x64 | BOOTH | 256 | 9,542 | 189.886 |

Table 3: Xilinx Virtex7 XCV2000T-2FLG1925

| Multiplier strength | Multiplier name | No. of IOBs | No. of slice LUTs required | Delay (ns) |
|---|---|---|---|---|
| 4x4 | BAUGH | 16 | 20 | 15.91 |
| 4x4 | BOOTH | 16 | 18 | 10.14 |
| 8x8 | BAUGH | 32 | 104 | 55.93 |
| 8x8 | BOOTH | 32 | 96 | 22.15 |
| 16x16 | BAUGH | 64 | 452 | 191.84 |
| 16x16 | BOOTH | 64 | 354 | 40.87 |
| 32x32 | BAUGH | 128 | 1851 | 670.46 |
| 32x32 | BOOTH | 128 | 1595 | 81.19 |
| 64x64 | BAUGH | 256 | 7392 | 1838.32 |
| 64x64 | BOOTH | 256 | 6480 | 159.28 |

Table 4: Power estimation on Altera Cyclone II EP2C35F672C6

| Multiplier strength | Multiplier name | Number of signal transitions during 100 ns simulation | Total thermal power dissipation (mW) | Core dynamic thermal power dissipation (mW) | Core static thermal power dissipation (mW) | I/O thermal power dissipation (mW) |
|---|---|---|---|---|---|---|
| 4x4 | BAUGH | 1857 | 169.92 | 1.13 | 80.12 | 86.67 |
| 4x4 | BOOTH | 986 | 166.13 | 1.01 | 80.01 | 86.59 |
| 8x8 | BAUGH | 20911 | 223.47 | 4.81 | 80.30 | 138.36 |
| 8x8 | BOOTH | 10291 | 223.39 | 5.28 | 80.30 | 138.30 |
| 16x16 | BAUGH | 498261 | 351.24 | 27.12 | 80.74 | 243.39 |
| 16x16 | BOOTH | 51942 | 345.25 | 19.86 | 80.72 | 244.67 |
| 32x32 | BAUGH | 9606019 | 642.20 | 115.05 | 81.75 | 445.40 |
| 32x32 | BOOTH | 469336 | 601.67 | 82.31 | 81.61 | 437.74 |
| 64x64 | BAUGH | 19212038 | 1302.34 | 331.53 | 83.13 | 887.68 |
| 64x64 | BOOTH | 1877344 | 1278.88 | 360.30 | 83.24 | 836.34 |

© 2015 Global Journals Inc.
(US)

* K. N. Vijeyakumar, V. Sumathy, S. Elango. VLSI Implementation of Area-Efficient Truncated Modified Booth Multiplier for Signal Processing Applications. The Arabian Journal for Science and Engineering 39 (11), 2014.
* Pramodini Mohanty. An Efficient Baugh-Wooley Architecture for both Signed & Unsigned Multiplication. International Journal of Computer Science & Engineering Technology (IJCSET) 3, 2012.
* G. W. Bewick. Fast Multiplication: Algorithms and Implementation. Ph.D. dissertation, Stanford University, Stanford, CA, 1994.
* R. Muralidharan, C. H. Chang. Hard multiple generator for higher radix modulo multiplication. Proceedings of the 12th International Symposium on Integrated Circuits, Singapore, 2009.
* A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay. A 1.2-ns 16×16-Bit Binary Multiplier Using High Speed Compressors. International Journal of Electrical, Computer, and Systems Engineering, 2009.
* Kiat-Seng Yeo, Kaushik Roy. Low-Voltage, Low-Power VLSI Subsystems. Tata McGraw-Hill.
* S. L. Freeny. Special-purpose hardware for digital filtering. Proceedings of the IEEE, 1975.
* C. S. Wallace. A suggestion for parallel multipliers. IEEE Transactions on Electronic Computers, 1964.
* O. Hasan, S. Kort. Automated formal synthesis of Wallace tree multipliers. Proceedings of the 50th Midwest Symposium on Circuits and Systems, 2007.
* J. Fadavi-Ardekani. M × N Booth encoded multiplier generator using optimized Wallace trees. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1993.
* F. Elguibaly. A fast parallel multiplier-accumulator using the modified Booth algorithm. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 2000.
* K. Choi, M. Song. Design of a high performance 32 × 32-bit multiplier with a novel sign select Booth encoder. Proceedings of the IEEE International Symposium on Circuits and Systems, 2001.
* Y. E. Kim, J. O. Yoon, K. J. Cho, J. G. Chung, S. I. Cho, S. S. Choi. Efficient design of modified Booth multipliers for predetermined coefficients. Proceedings of the IEEE International Symposium on Circuits and Systems, 2006.
* W.-C. Yeh, C.-W. Jen. High-speed Booth encoded parallel multiplier design. IEEE Transactions on Computers, 2000.
* J.-Y. Kang, J.-L. Gaudiot. IEEE Transactions on Computers, 2006.
* O. Salomon, J.-M. Green, H. Klar. General algorithms for a simplified addition of 2's complement numbers. IEEE Journal of Solid-State Circuits, 1995.
* E. Angel, E. E. Swartzlander Jr. Low power parallel multipliers. Workshop on VLSI Signal Processing IX, 1996.
* N. S. Szabo, R. I. Tanaka. Residue Arithmetic and its Application to Computer Technology. McGraw-Hill, New York, 1967.
* M. A. Soderstrand. Residue Number System Arithmetic: Modern Applications in Digital Signal Processing. IEEE Press, 1986.
* V. Paliouras, T. Stouraitis. Novel High-Radix Residue Number System Multipliers and Adders. Proceedings of the IEEE International Symposium on Circuits and Systems, 1999.