
# Modified Distributive Arithmetic based 2D-DWT for Hybrid (Neural Network-DWT) Image Compression

Mr. Murali Mohan.S<sup>1</sup> and Dr. P.Satyanarayana<sup>2</sup>

Received: 16 December 2013 Accepted: 2 January 2014 Published: 15 January 2014

#### Abstract

Artificial Neural Networks (ANN) are widely used in signal and image processing techniques for pattern recognition and template matching. The Discrete Wavelet Transform (DWT) is combined with a neural network to achieve higher compression of 2D data such as images. Image compression using neural networks and DWT has shown superior results over classical techniques, with 70

**Index terms**: DWT, neural network, image compression, VLSI implementation, high speed, low power, modified DAA.

# 1 Introduction

Image compression is one of the most promising subjects in image processing. Captured images need to be stored or transmitted over long distances. A raw image occupies a large amount of memory and hence needs to be compressed. With the demand for high quality video on mobile platforms, there is a need to compress raw images and reproduce them without any degradation. Several standards such as JPEG2000 and MPEG-2/4 recommend use of Discrete Wavelet Transforms (DWT) for image transformation [1], which leads to compression when encoded. Wavelets are a mathematical tool for hierarchically decomposing functions into multiple sub bands with time scale resolutions. Image compression using wavelet transforms is a powerful method that is preferred by scientists to get compressed images at higher compression ratios with higher PSNR values [2]. It is a popular transform used for some of the image compression standards in lossy compression methods. Unlike the discrete cosine transform, the wavelet transform is not Fourier-based and therefore wavelets do a better job of handling discontinuities in data.

On the other hand, the use of Artificial Neural Networks (ANN) for image compression applications has increased marginally in recent years. Neural networks are inherently adaptive systems [3][4][5][6]; they are suitable for handling nonstationarities in image data. Artificial neural networks can be employed with success for image compression. Image Compression Using Neural Networks by Ivan Vilovic [7] reveals a direct solution method for image compression using neural networks. An experience of using a multilayer perceptron for image compression is also presented; the multilayer perceptron is used for transform coding of the image. Image compression with neural networks by J. Jiang [8] presents an extensive survey on the development of neural networks for image compression, which covers three categories: direct image compression by neural networks, neural network implementation of existing techniques, and neural network based technology which provides improvement over traditional algorithms. Neural Networks-based Image Compression System by H. Nait Charif and Fathi M. Salam [9] describes a practical and effective image compression system based on multilayer neural networks. The system consists of two multilayer neural networks that compress the image in two stages. The algorithms and architectures reported in these papers subdivide the images into sub blocks, and the sub blocks are reorganized for processing. Reordering of sub blocks leads to blocking artifacts. Hence it is required to avoid reorganization of sub blocks.

One of the methods was to combine neural networks with wavelets for image compression. Image compression using wavelet transform and a neural network was suggested previously [10]. Wavelet networks (WNs) were introduced by Zhang and Benveniste [11], [12] in 1992 as a combination of artificial neural networks and wavelet decomposition. Since then, however, WNs have received only little attention. In wavelet networks, the radial basis functions in some RBF networks are replaced by wavelets. Szu et al. [13], [14] have shown usage of WNs for signal representation and classification. They have explained how a set of WNs, "a super wavelet", can be produced and how the original ideas presented can be used for the assortment of models. Besides, they have mentioned the large data compression achieved by such a representation of WNs. Zhang [15] has proved that WNs can handle non-linear regression of moderately large input dimension with training data. Ramanaiah and Cyril [16] have reported the use of neural networks and wavelets for image compression. Murali et al. [17] report that the use of neural networks with DWT improves compression ratio by 70% and MSE by 20%. The complexities of hardware implementation on VLSI platform are not discussed in that paper. Murali et al. [18] report the use of FPGA for implementation of neural network and DWT architecture; the design operates at 127 MHz and consumes 0.45 mW on Virtex-5 FPGAs.

Sangyun et al. [19] have proposed a new logic for the distributive arithmetic algorithm and have designed it for FIR filters. The developed logic is optimized for low power applications. In their work, the LUT coefficients are computed based on a suitable number system and are stored in a LUT. Low power techniques such as block enabling logic, memory bank logic and clock gating logic have been used for optimization. However, the work does not consider the FPGA resources for power optimization; moreover, the developed architecture is suitable for higher order filter coefficients. Hence there is a need for a customized architecture for DWT filters that can efficiently utilize the FPGA resources. Cyril P. Raj et al. [20] have developed a parallel and pipelined distributive arithmetic architecture for DWT; the design achieves higher throughput and lower latency, but consumes large area on FPGA. The symmetric property of DWT coefficients has not been used to reduce hardware complexity. Chengjun Zhang, Chunyan Wang, and M. Omair Ahmad [21] propose a scheme for the design of a pipeline architecture for fast computation of the DWT. The goal of fast computation is achieved by minimizing the number and period of clock cycles. The main idea used for minimizing these two parameters is to optimally distribute the task of the DWT computation among the stages of the pipeline and to maximize the inter- and intra-stage parallelism of the pipeline.

In this paper, a 2D-DWT architecture is designed and implemented on VLSI platform for optimizing area, timing and power. Section II presents theoretical background on neural networks and DWT. Section III discusses the image compression architecture using DWT and ANN techniques, Section IV presents the VLSI implementation of the DWT architecture, and the conclusion is presented in Section V.

#### 2 II. Neural Networks and DWT

In this section, the neural network architecture for image compression is discussed. The feed forward neural network architecture and the back propagation algorithm for training are presented. DWT based image transformation and compression is also presented in this section.

Compression is one of the major subjects of research; the need for compression is discussed as follows [17]. Uncompressed video of 640 x 480 resolution, with each pixel of 8 bits (1 byte), at 24 fps occupies 307.2 Kbytes per image (frame), or 7.37 Mbytes per second, or 442 Mbytes per minute, or 26.5 Gbytes per hour. If the frame rate is increased from 24 fps to 30 fps, then 640 x 480 resolution with 24 bit (3 bytes) colour at 30 fps occupies 921.6 Kbytes per image (frame), or 27.6 Mbytes per second, or 1.66 Gbytes per minute, or 99.5 Gbytes per hour. A 100 Gigabyte disk can therefore store only about 1-4 hours of high quality video, and a channel data rate of 64 Kbits/s implies 40-438 seconds per frame of transmission. For HDTV with 720 x 1280 pixels/frame and progressive scanning at 60 frames/s, the raw rate is 1.3 Gb/s; with 20 Mb/s available, about 70x compression is required, that is, about 0.35 bpp. In this work we propose a novel architecture based on neural network and DWT [18].

### a) Feed forward neural network architecture for image compression

An Artificial Neural Network (ANN) is an information-processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information [16]. The key element of this paradigm is the novel structure of the information processing system. The basic architecture for image compression using a neural network is shown in Fig. 1. The network has an input layer, a hidden layer and an output layer. Inputs from the image are fed into the network and passed through the multi layered neural network. The input to the network is the original image and the output obtained is the reconstructed image. The output obtained at the hidden layer is the compressed image. The network is used for image compression by breaking it in two parts as shown in Fig. 1. The transmitter encodes and then transmits the output of the hidden layer (only 16 values as compared to the 64 values of the original image). The receiver receives and decodes the 16 hidden outputs and generates the 64 outputs. Since the network is trained to implement an identity map, the output at the receiver is a reconstruction of the original image.
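The storage figures quoted at the start of this section can be checked with a few lines of arithmetic. The sketch below is only a verification aid; the function name `raw_video_rates` is hypothetical, and the HDTV case assumes 24 bits per pixel, which the text does not state explicitly.

```python
def raw_video_rates(width, height, bytes_per_pixel, fps):
    """Return raw size in bytes per frame, per second, per minute and per hour."""
    frame = width * height * bytes_per_pixel
    second = frame * fps
    return frame, second, second * 60, second * 3600

gray = raw_video_rates(640, 480, 1, 24)     # 8-bit pixels at 24 fps
colour = raw_video_rates(640, 480, 3, 30)   # 24-bit colour at 30 fps

for name, (f, s, m, h) in (("8-bit, 24 fps", gray), ("24-bit, 30 fps", colour)):
    print(f"{name}: {f / 1e3:.1f} KB/frame, {s / 1e6:.2f} MB/s, "
          f"{m / 1e6:.0f} MB/min, {h / 1e9:.1f} GB/h")

# HDTV: 720 x 1280 pixels at 60 frames/s, assuming 24 bits per pixel,
# sent over a 20 Mb/s channel.
hdtv_bits_per_second = 720 * 1280 * 24 * 60
ratio = hdtv_bits_per_second / 20e6         # required compression factor
bpp = 24 / ratio                            # bits per pixel after compression
```

Under these assumptions the required factor comes out slightly below the rounded 70x quoted above, which is consistent with the order of magnitude given in the text.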

Three layers, one input layer, one output layer and one hidden layer, are designed. The input layer and output layer are fully connected to the hidden layer. Compression is achieved by designing the network such that the number of neurons at the hidden layer is less than the number of neurons at both the input and the output layers. The input image is split up into blocks or vectors of 8 x 8, 4 x 4 or 16 x 16 pixels. Back-propagation is one of the neural network techniques directly applied to image compression coding [20][21][22].
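The 64-16-64 bottleneck described above can be illustrated with a small forward pass. This is a minimal sketch, not the trained network of the paper: the weights are random placeholders (in practice they would come from back propagation), the names `encode` and `decode` are hypothetical, and the linear output layer follows the purelin choice mentioned in the conclusion.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HIDDEN = 64, 16          # 8 x 8 block in, 16 hidden values out (4:1)

# Untrained placeholder weights; training would replace these.
W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_IN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_IN, N_HIDDEN))
b2 = np.zeros(N_IN)

def encode(block):
    """Transmitter side: 8 x 8 block to 16 hidden-layer outputs."""
    return np.tanh(W1 @ block.reshape(-1) + b1)   # tansig is mathematically tanh

def decode(code):
    """Receiver side: 16 hidden outputs to a reconstructed 8 x 8 block."""
    return (W2 @ code + b2).reshape(8, 8)         # linear output layer

block = rng.random((8, 8))
code = encode(block)      # what would be transmitted
recon = decode(code)      # what the receiver rebuilds
```

Only `code` crosses the channel, so each 64-value block is represented by 16 values; the quality of `recon` depends entirely on how well the weights have been trained.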

In the previous sections theory on the basic structure of the neuron was considered. The essence of the neural networks lies in the way the weights are updated. The updating of the weights is through a definite algorithm. In this paper Back Propagation (BP) algorithm is studied and implemented.

### 3 b) DWT for Image Compression

The DWT represents the signal in dyadic sub-band decomposition. Generalization of the DWT in a wavelet packet allows sub-band analysis without the constraint of dyadic decomposition. The discrete wavelet packet transform (DWPT) performs an adaptive decomposition of the frequency axis. The specific decomposition is selected according to an optimization criterion. The Discrete Wavelet Transform (DWT), based on time-scale representation, provides efficient multi-resolution sub-band decomposition of signals. It has become a powerful tool for signal processing and finds numerous applications in various fields such as audio compression, pattern recognition, texture discrimination and computer graphics [24][25][26]. Specifically the 2-D DWT and its counterpart, the 2-D Inverse DWT (IDWT), play a significant role in many image/video coding applications. Fig. 2 shows the DWT architecture: the input image is decomposed into high pass and low pass components using HPF and LPF filters, giving rise to the first level of hierarchy. The process is continued until multiple hierarchies are obtained. A1 and D1 are the approximation and detail filters [16].

Several images are considered for training the network. The input image is resized to 256 x 256 and the resized image is transformed using a 2D DWT function. There are several wavelet functions; in this work Haar and db4 wavelet functions are used. The input image is decomposed to obtain the sub band components using several stages of DWT. The DWT process is continued until the sub band size is 8 x 8. The decomposed sub band components are rearranged into column vectors; the rearranged vectors are concatenated into a matrix and are applied at the input to the neural network. The hidden layer is realized using 4 neurons and the tansig function. The weights and biases obtained after training are used to compress the input to the required size, and the result is further processed using the weights and biases of the output layer to decompress. The decompressed output is converted from vectors to blocks of sub bands. The sub band components are grouped together and are transformed using the inverse DWT. The transformation is done over multiple hierarchies and the original image is reconstructed. The input image and the output image are used to compute MSE and PSNR. A detailed discussion on DWT with NN for image compression and the performance results are presented in [17][18]. One of the major challenges in this work is the hardware complexity of the DWT and NN architecture. In order to reduce the computation complexity on hardware platform, a modified architecture for DWT is proposed, designed, modeled and implemented on VLSI platform in this work. The next section discusses the modified architecture.
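The multi-stage decomposition of a 256 x 256 image down to 8 x 8 sub bands can be sketched with the Haar wavelet, which needs only sums and differences of neighbouring pixels. This is an illustrative NumPy sketch, not the toolbox function used in the paper; the function names are hypothetical and the LH/HL naming convention is an assumption, since conventions differ between references.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2-D Haar DWT; returns (LL, LH, HL, HH)."""
    s = np.sqrt(2.0)
    # Row transform: scaled sum and difference of horizontally adjacent pixels.
    lo = (img[:, 0::2] + img[:, 1::2]) / s
    hi = (img[:, 0::2] - img[:, 1::2]) / s
    # Column transform applied to both row outputs.
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh

def decompose(img, stop=8):
    """Apply the DWT to the LL band repeatedly until it is stop x stop."""
    details, ll = [], img
    while ll.shape[0] > stop:
        ll, lh, hl, hh = haar_dwt2(ll)
        details.append((lh, hl, hh))
    return ll, details

image = np.random.default_rng(0).random((256, 256))   # stand-in for a resized image
ll, details = decompose(image)
```

With a 256 x 256 input, five levels are needed before the approximation band reaches 8 x 8. Because this Haar transform is orthonormal, the total energy of the sub bands equals that of the input image.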

#### 4 IV. Distributive Arithmetic Architecture for FIR Filters

DWT is realized using low pass and high pass FIR filters. In an FIR filter, the incoming signal is processed by the filter coefficients to produce the output samples. The filter coefficients are designed or identified based on the required specifications and are used in the design of the filter architecture. Fig. 9 shows the basic block of an FIR filter. The input, output and filter coefficients are related by the convolution sum, equation (1).

The convolution operation is basically a sum of products. Thus the convolution operation in equation (1) can be expressed as in equation (2),

$$Y = \sum_{k=1}^{N} H_k X_k \tag{2}$$

where $X_k$ = input samples, $H_k$ = filter coefficients, $Y$ = output and $N$ = filter order length. In general $X_k$ and $Y$ are represented using the 2's complement number system, thus representing both positive and negative values of input and filter samples. In 2's complement format $X_k$ is represented as $X_k = \{b_{k0}, b_{k1}, b_{k2}, \ldots, b_{k,L-1}\}$, where $L$ is the number of bits or word length. In the 2's complement number system MSB = 1 implies a negative number, and thus sign extension is carried out. For analysis $X_k$ can be mathematically represented as in equation (3),

$$X_k = -b_{k0} + \sum_{n=1}^{L-1} b_{kn} 2^{-n} \tag{3}$$

where $b_{k0}$ = sign bit and $b_{kn}$ = binary bits representing magnitude. Substituting (3) in (2), the output can be expressed as in equation (4),

$$Y = \sum_{k=1}^{N} H_k \left[ -b_{k0} + \sum_{n=1}^{L-1} b_{kn} 2^{-n} \right] \tag{4}$$
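The fractional 2's complement representation of equation (3) can be checked numerically. The sketch below is an illustration only; the helper names `bits_to_value` and `value_to_bits` are hypothetical, and an 8-bit word length is assumed.

```python
def bits_to_value(bits):
    """Evaluate equation (3): X = -b0 + sum over n >= 1 of b_n * 2**(-n)."""
    return -bits[0] + sum(b * 2.0 ** -n for n, b in enumerate(bits) if n >= 1)

def value_to_bits(x, L=8):
    """Two's complement bits [b0, ..., b_(L-1)] of a fraction in [-1, 1)."""
    code = round(x * 2 ** (L - 1)) % (2 ** L)      # wrap negative values
    return [(code >> (L - 1 - n)) & 1 for n in range(L)]

bits = value_to_bits(-0.375)       # [1, 1, 0, 1, 0, 0, 0, 0]
value = bits_to_value(bits)        # -1 + 0.5 + 0.125 = -0.375
```

The sign bit carries weight -1 and every later bit carries a positive weight of 2 to the power -n, which is exactly the structure that equation (4) exploits.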

Interchanging the order of summation, equation (4) can be written as

$$Y = -\sum_{k=1}^{N} H_k b_{k0} + \sum_{n=1}^{L-1} \left[ \sum_{k=1}^{N} H_k b_{kn} \right] 2^{-n} \tag{5}$$

For a four-tap filter (N = 4), the contribution $Y_n$ of bit position n is

$$Y_1 = H_1 b_{11} 2^{-1} + H_2 b_{21} 2^{-1} + H_3 b_{31} 2^{-1} + H_4 b_{41} 2^{-1} \quad \text{for } n = 1 \tag{6}$$

$$Y_2 = H_1 b_{12} 2^{-2} + H_2 b_{22} 2^{-2} + H_3 b_{32} 2^{-2} + H_4 b_{42} 2^{-2} \quad \text{for } n = 2 \tag{7}$$

$$Y_3 = H_1 b_{13} 2^{-3} + H_2 b_{23} 2^{-3} + H_3 b_{33} 2^{-3} + H_4 b_{43} 2^{-3} \quad \text{for } n = 3 \tag{8}$$

The bits $b_{kn}$ of the input samples can be used as the address to a memory and can be used to access the memory contents, thus avoiding the multiplication process. The equation for n = 2 is similar to the equation for n = 1; the only difference is that the binary bits are the second MSB bits of the input samples. As before, there are 16 possible combinations of partial products that can be accessed. Comparing the equation for n = 3 to equation (7), each bit of the input samples is used in accessing the memory contents, and the result has to be added to the previous partial products. Before every addition is performed the partial products are to be right shifted by 1 bit position.

For each bit position, the term $\sum_{k=1}^{N} H_k b_{kn}$ has $2^N$ possible values. The coefficients are fixed and hence the $2^N$ combinations of coefficients can be pre-computed and stored in a LUT (ROM). The LUT depth is $2^N$, and the width of the LUT can be (L+1), where L is the maximum width of the filter coefficients. Fig. 10 shows the top level block diagram of the DA algorithm. The DA architecture consists of input registers, which are SISO registers that can be sequentially loaded with the input samples. The limitation of the DA architecture is that as the number of inputs increases from 8 to 16, the size of the LUT grows to $2^{16}$. In order to reduce the LUT size, the LUT is split into two, with one part of the input samples accessing the top LUT and the remaining input samples accessing the bottom LUT. The size of the top and bottom LUT is $2^4$ each, and thus the total size of the LUT is 32. As the LUTs are split into two sections, the output of each LUT is independent and the accumulated data is further added to compute the final output. The split DA architecture is shown in Fig. 11.

In this work, 9/7 filter based DWT is chosen for modulation and demodulation. Table 1 shows the 9/7 filter coefficients. As there exists symmetry in the 9/7 filter coefficients, modified equations for the high pass and low pass filters can be derived. From Table 1, as there are 9 low pass filter coefficients and 7 high pass filter coefficients, the output samples can be expressed as in equation (10) and equation (11) respectively,

$$Y_L = X_0 h_0 + X_1 h_1 + X_2 h_2 + X_3 h_3 + X_4 h_4 + X_5 h_5 + X_6 h_6 + X_7 h_7 + X_8 h_8 \tag{10}$$

$$Y_H = X_0 g_0 + X_1 g_1 + X_2 g_2 + X_3 g_3 + X_4 g_4 + X_5 g_5 + X_6 g_6 \tag{11}$$

where $X_k$ are the input samples, $h_k$ are the low pass filter coefficients and $g_k$ are the high pass filter coefficients.
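The LUT-based DA computation described earlier in this section can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the hardware design of the paper: it uses integer samples and coefficients instead of fractions, so bit n is weighted by a left shift rather than the right shift of the accumulator used in hardware, and the names `build_lut` and `da_dot` as well as the 4-tap coefficients are hypothetical.

```python
def build_lut(coeffs):
    """Pre-compute all 2**N coefficient sums; bit k of the address selects coeffs[k]."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(2 ** n)]

def da_dot(samples, lut, width=8):
    """Bit-serial sum of products for width-bit two's complement samples."""
    acc = 0
    for bit in range(width):
        # Gather bit number `bit` of every sample into one LUT address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> bit) & 1) << k
        term = lut[addr] << bit
        # The MSB is the sign bit and carries negative weight.
        acc += -term if bit == width - 1 else term
    return acc

coeffs = [3, -5, 7, 2]              # illustrative 4-tap filter
samples = [10, -20, 33, -128]       # 8-bit signed inputs
lut = build_lut(coeffs)             # 2**4 = 16 entries
result = da_dot(samples, lut)       # equals 3*10 + 5*20 + 7*33 - 2*128 = 105
```

No multiplication of a sample by a coefficient is performed inside `da_dot`; each cycle is one memory read, one shift and one addition, at the cost of a LUT whose depth doubles with every extra tap.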

In order to realize the low pass and high pass filters using DA logic, the depth of the low pass LUT will be $2^9$ and that of the high pass LUT will be $2^7$. In order to optimize the size of the LUT, the symmetric property of the filter coefficients is considered, and equation (10) and equation (11) are rewritten as equation (12) and equation (13), with $h_k$ and $g_k$ the low pass and high pass filter coefficients,

$$Y_L = X_0 h_0 + (X_1 + X_8) h_1 + (X_2 + X_7) h_2 + (X_3 + X_6) h_3 + (X_4 + X_5) h_4 \tag{12}$$

$$Y_H = X_0 g_0 + (X_1 + X_6) g_1 + (X_2 + X_5) g_2 + (X_3 + X_4) g_3 \tag{13}$$

Thus in order to realize the filter the low pass LUT depth is $2^5$ and the high pass LUT depth is $2^4$. The total depth of LUT for DWT computation is $(2^5 + 2^4) = 48$ compared to the original LUT depth of $(2^9 + 2^7) = 640$. Thus the memory size is reduced by 92.5%. However, it is observed that the number of adders required is 5 and 4 for the low pass and high pass filter respectively. In this research work, one of the major contributions is the design of a DA architecture that combines split DA logic with the symmetric property of the filters. The modified DA architecture is shown in Fig. 12. In the modified DA logic, the input samples are sequentially loaded into the SISO registers, which requires 8\*9 clock cycles (the input samples are considered to be 8 bits wide). After the initial load operations are performed, the input samples are added using the first stage adders and the output of each adder is stored in a second stage PISO register; the addition and loading of the second stage PISO registers requires one clock cycle. The PISO registers in the second stage are split into two halves, and are further used in accessing the LUTs. As two PISO registers access one LUT, its depth is 4. The bottom LUT is accessed by 3 PISO registers, and thus its depth is 8. The total LUT size (depth) is 12 (8 + 4). The output of each LUT is accumulated to compute the final output of the low pass filter used in DWT. Thus the number of adders required for low pass filter output computation is 7 and the LUT depth is 12. Similarly the architecture for the high pass filter using modified DA logic can be designed. Fig. 13 shows the modified DA logic for the high pass filter used in DWT. The depth of the LUT is 8 (4 + 4), and the number of adders required is 6. The PISO registers are used in accessing the LUTs and thus 9 clock cycles are required (the width of the input samples is 8 bits; after addition the width of each sample is 8+1 bits). Thus the first output from the low pass filter is available after 9\*8 + 1 + 9 clock cycles and the first output from the high pass filter after 7\*8 + 1 + 9 clock cycles. Hence the latency is 82 and 66 clock cycles respectively. The first stage and second stage adders are isolated by the SISO and PISO registers, thus the addition in the first stage and the accumulation in the second stage can be performed simultaneously. Thus the loading of the SISO registers can be done in parallel, reducing one clock cycle, and the throughput in low pass and high pass filter output computation is found to be 9 and 9 clock cycles respectively. Table 2 shows the comparison of various DA algorithms for DWT computation. From Table 2, it is found that the proposed DA logic reduces the LUT size from 512 to 12 in low pass and from 128 to 8 in high pass filter computation. The number of adders is increased; however, the throughput is 9 for both low pass and high pass computation.

The proposed architecture is modeled using Verilog HDL and is verified for its functionality using suitable test cases. A test environment is developed to test the logic correctness of the proposed DA logic. From the simulation results obtained in ModelSim, the developed HDL model is found to produce correct results for all possible test vectors. The proposed model is implemented using Xilinx ISE and is targeted on Virtex devices. The implementation results are discussed in detail later in this paper. Another approach for DWT computation is the multiplexer based approach. The next section discusses the multiplexer based approach with DA for DWT computation.
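The effect of exploiting coefficient symmetry can be illustrated with the integer high pass coefficients of Table 1. The sketch below is a software check, not the hardware architecture: the coefficient for orders 2 and 5 is taken as -29 (an assumption that matches the sign of -0.0575435), the function names `direct` and `folded` are hypothetical, and the input samples are arbitrary.

```python
# Integer high pass coefficients from Table 1, indexed by order 0..6.
# Orders 2 and 5 are taken as -29 (assumed sign, matching -0.0575435).
g = [569, -303, -29, 47, 47, -29, -303]

def direct(x):
    """Seven products, one per tap."""
    return sum(xk * gk for xk, gk in zip(x, g))

def folded(x):
    """Add the sample pairs that share a coefficient first, then four products."""
    return (x[0] * g[0] + (x[1] + x[6]) * g[1]
            + (x[2] + x[5]) * g[2] + (x[3] + x[4]) * g[3])

x = [12, -7, 100, 55, -128, 127, 3]           # arbitrary 8-bit samples

full_depth = 2 ** 9 + 2 ** 7                  # plain DA, low pass + high pass
folded_depth = 2 ** 5 + 2 ** 4                # after exploiting symmetry
saving = 100 * (1 - folded_depth / full_depth)

latency_low = 9 * 8 + 1 + 9                   # SISO load + first stage add + LUT access
latency_high = 7 * 8 + 1 + 9
```

Both functions return the same value for any input, while the folded form needs LUT addresses of only 4 bits instead of 7. The price is the extra first stage of adders and one additional bit of word length (8+1 bits) for the pre-added samples.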

# 5 b) Multiplexer based DA for DWT

The split DA logic discussed in the previous section uses a two LUT structure to store the precomputed partial products, which are accessed by the input samples. The modified DA logic based architecture is optimized for area and speed performance; however, when the design is implemented on FPGA, there are limitations. An FPGA consists of Configurable Logic Blocks (CLBs), dedicated RAM (block RAM), dedicated multipliers and routing resources. A CLB consists of LUTs, registers (flip flops), multiplexers, fast carry adders and buffers. As the modified DA logic uses LUTs and adders, the multiplexer logic and registers within a CLB are not utilized. Thus implementation of the modified DA logic uses a larger number of CLBs, and the resources of each CLB are not completely utilized. Hence in order to utilize the resources fully within a CLB, a novel FIR filter architecture is proposed and implemented. The filter output is

$$Y = \sum_{k=0}^{N-1} H_k X_k \tag{14}$$

where N = 9 (or) 7 for DWT filters. The above equation can be expanded and written as equation (15),

$$Y = \sum_{k=0}^{4} H_k X_k + \sum_{k=5}^{8} H_k X_k \tag{15}$$

Equation (15) can be realized using the DA algorithm and mux based logic in order to fully utilize the FPGA resources. Equation (15) consists of two terms; the first term is realized using mux based logic and the second term is realized using split DA logic. The term $\sum_{k=5}^{8} H_k X_k$ is realized using split DA logic as shown in Fig. 14. Expanding the first term, equation (16) is obtained,

$$Y_1 = H_0 X_0 + H_1 X_1 + H_2 X_2 + H_3 X_3 + H_4 X_4 \tag{16}$$

As the filter parameters (H) are fixed coefficients and $X_k$ is a binary number, the term $H_0 X_0$ can be expressed as $H_0 X_0 = H_0 [X_{0,7}\, X_{0,6}\, X_{0,5}\, X_{0,4}\, X_{0,3}\, X_{0,2}\, X_{0,1}\, X_{0,0}]$, where $X_{0,7}$ is the MSB and $X_{0,0}$ is the LSB. Multiplication of $H_0 X_0$ is performed by checking individual bits of $X_0$: if $X_{0,0}$ is '1' then $H_0$ is the first partial product, else if $X_{0,0}$ is '0' the partial product is all zeros. Similarly every bit of $X_0$ is checked for its weight and the $H_0$ coefficient is added to the previous bit partial product. Prior to addition the $H_0 X_{0,0}$ partial product should be shifted right by 1 bit and added to the $X_{0,1} H_0$ partial product. In order to realize equation (16) using multiplexers, as there are five terms, five 2:1 multiplexers are required. One input of each multiplexer is the corresponding filter coefficient ($H_0$, $H_1$, $H_2$, $H_3$, $H_4$) and the other input is all zeros. The $X_{k,n}$ bit forms the select line of the multiplexer. If the select bit is 1 the output of the mux is the corresponding filter coefficient, else the output of the mux is zero. After every output is chosen from the mux for every bit of the input sample, the outputs are accumulated and the final product is computed. Fig. 16 shows the mux based filter design for the first term of equation (15). The use of multiplexers and adders in computing the filter output eliminates the use of LUTs, and hence at the input of every multiplexer two registers are required: one stores the filter coefficient and the other is hardwired to ground, as shown in Fig. 15. The multiplexer based logic is combined with split DA logic in the computation of low pass filter outputs. Table 4 presents the performance parameters of the novel DWT architecture designed using mux with split DA logic. The advantage of the novel algorithm for DWT computation is that it fully utilizes the CLB resources and hence the area occupancy on FPGA is optimized. As the filter coefficients are biorthogonal, the IDWT processor can be realized just by interchanging the high pass and low pass filters used for DWT computation.

The designed 1D DWT architecture is used to compute the 2D DWT of the input image. The top level architecture for the 2D DWT processor is implemented using the modified 1D-DWT architecture discussed in Fig. 12 and Fig. 13. The 2D-DWT processor consists of input memory, output memory and three 1D-DWT processors as shown in Fig. 18.
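The multiplexer based evaluation of the five-term sum in equation (16) can be sketched as follows. This is a behavioural sketch only, with several assumptions: samples are treated as unsigned 8-bit values (sign handling is omitted), bit weights are applied with a left shift instead of right shifting the accumulator as the hardware does, and the names `mux` and `mux_mac` and the sample values are hypothetical.

```python
def mux(select, coeff):
    """2:1 multiplexer: the coefficient when the select bit is 1, all zeros otherwise."""
    return coeff if select else 0

def mux_mac(samples, coeffs, width=8):
    """Sum of products for unsigned width-bit samples using only muxes, shifts and adds."""
    acc = 0
    for bit in range(width):                 # LSB first
        # One mux per tap, each steered by the current bit of its own sample.
        row = sum(mux((x >> bit) & 1, h) for x, h in zip(samples, coeffs))
        acc += row << bit                    # weight of this bit position
    return acc

H = [569, -303, -29, 47, 47]                 # illustrative integer coefficients
X = [200, 17, 0, 255, 64]                    # unsigned 8-bit samples
y1 = mux_mac(X, H)
```

Five calls to `mux` per bit correspond to the five 2:1 multiplexers mentioned in the text. No LUT is involved, which is why this form maps onto the multiplexers and registers of a CLB that the LUT-only DA logic leaves unused.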

HDL code for the 1D DWT processor, input memory and output memory is developed and integrated into the top module. The top module is verified using a test bench written in Verilog with a known set of input vectors. The simulation and synthesis results are obtained using Xilinx ISE. The synthesis results are verified with the various constraint options provided in the tool; the default options produced the best results. The area report in terms of slices, the power report and the timing report have been generated and are reported in this work. The conventional DWT architecture was realized in [19] on a Spartan device, hence the results reported there have been used for comparison. In order to compare the performance improvements of the proposed architecture, the conventional DWT architecture is also modeled using HDL and implemented on a Virtex-5 device. The results obtained are reported in Table 5. From the comparison it is demonstrated that the proposed architecture consumes far fewer resources, as the multipliers are replaced with shift operations; the operating frequency is increased to 268 MHz and power dissipation is reduced by setting the low power constraints. One of the major challenges in the design is data synchronization in DWT computation: as shift operations are used for multiplication, the control unit must be carefully designed to keep track of the data output and read the data into registers for further computation, and hence there is a need for predesigned control logic to monitor the data flow.

#### 6 V. Conclusion

The use of NN for image compression has advantages over classical techniques; however, the NN architecture requires the image to be decomposed into several blocks of 8 x 8 each, and hence introduces blocking artifacts and checker box errors in the reconstructed image. In order to overcome the checker box errors, in this work DWT is used for image decomposition prior to image compression using the NN architecture. We proposed a hybrid architecture that combines NN with DWT, and the input image is used to train the network. The network architecture is used to compress and decompress several images and is shown to achieve better MSE compared with the reference design. The hybrid technique uses a hidden layer with the tansig function and an output layer with the purelin function to achieve better MSE. In order to reduce the computation complexity of the DWT architecture, two different architectures for DWT computation are proposed, designed and implemented on FPGA. The modified DA algorithm and the multiplexer based DA algorithm are designed to reduce the number of logic gates and to improve throughput on the FPGA platform. The 2D DWT architecture is designed with the proposed 1D DWT architecture and the design is implemented on FPGA, where it operates at a maximum speed of 268 MHz with power consumption less than 1 W. The proposed design can be integrated with the NN architecture to form a hybrid architecture for image compression.








Table 1: 9/7 filter coefficients

| Order | Coefficient | Integer value |
|-------|-------------|---------------|
| 3     | 0.0912717   | 47            |
| 2     | -0.0575435  | -29           |
| 1     | -0.5912717  | -303          |
| 0     | 1.11150870  | 569           |
| 6     | -0.5912717  | -303          |
| 5     | -0.0575435  | -29           |
| 4     | 0.0912717   | 47            |


Table 2: Comparison of DA algorithms for DWT computation

|            | High Pass: DA | High Pass: Split DA | High Pass: Modified DA | Low Pass: DA | Low Pass: Split DA | Low Pass: Modified DA |
|------------|---------------|---------------------|------------------------|--------------|--------------------|-----------------------|
| LUT size   | $2^7 = 128$ | $2^4 + 2^3 = 24$ | 8 | $2^9$ | $2^5 + 2^4$ | 12 |
| Latency    | $8 \times 9 + 8$ | $8 \times 7 + 8$ | $7 \times 8 + 1 + 9 = 66$ | $8 \times 9 + 8$ | $8 \times 9 + 8$ | $9 \times 8 + 1 + 9$ |
| Throughput | 16 | 16 | 9 | 16 | 16 | 9 |
| Adders     | 1 | 3 | 6 | 1 | 3 | 7 |
| SISO       | Required | Required | Required | Required | Required | Required |
| PISO       | Not required | Not required | Required | Not required | Not required | Required |

Table 4: Performance parameters of the DWT architecture designed using mux with split DA logic

| Performance parameters | Low pass filter | High pass filter |
|------------------------|-----------------|------------------|
| LUT size               | 2 LUTs of size 4x9 | 2 LUTs of size 4x9 |
| Number of multiplexers | 4 | 3 |
| Number of adders       | 5 | 3 |
| Number of accumulators | 6 | 4 |
| Throughput             | 17 | 17 |
| Latency                | $9 \times 8 + 8 + 1$ | $7 \times 8 + 8 + 1$ |
| CLB utilization        | 100% | 100% |

| Parameter         | Conventional DWT [19] (on Spartan) | Conventional DWT         | Proposed design         |
|-------------------|------------------------------------|--------------------------|-------------------------|
| Number of slices  | 566 out of 768                     | 31105 out of 69120 (45%) | 7235 out of 69120 (12%) |
| Number of gates   | 37K                                | 31105 out of 69120 (45%) | 7235 out of 69120 (42%) |
| Clock speed       | 36 MHz                             | 237 MHz                  | 268 MHz                 |
| Power dissipation | 51 mW                              | 1.37 W                   | 0.9 W                   |

Table 5:
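The relative gains implied by the table can be worked out directly from its entries. The sketch below compares only the "Conventional DWT" and "Proposed design" columns, since both are reported against the same 69120-slice device; the [19] column targets a different device and is left out. The helper name `percent_change` is hypothetical.

```python
def percent_change(old, new):
    """Signed change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100.0


# Values taken from the table (conventional DWT vs. proposed design).
slices_change = percent_change(31105, 7235)   # slice count
clock_change = percent_change(237.0, 268.0)   # clock speed in MHz
power_change = percent_change(1.37, 0.9)      # power dissipation in W

print(f"Slices: {slices_change:+.1f}%")
print(f"Clock:  {clock_change:+.1f}%")
print(f"Power:  {power_change:+.1f}%")
```

A negative value means the proposed design uses less of that resource; a positive clock value means it runs faster.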


© 2014 Global Journals Inc. (US)

- [Zhang et al. ()] 'A Pipeline VLSI Architecture for Fast Computation of the 2-D Discrete Wavelet Transform'.
   Chengjun Zhang , Chunyan Wang , M Ahmad . *IEEE Trans. on Circuits and Systems* 2012. 59 p. .
- [Liying and Khashayar ()] 'Adaptive Constructive Neural Networks Using Hermite Polynomials for Image
   Compression'. M Liying , K Khashayar . Lecture Notes in Computer Science 2005. Springer-Verlag. 3497
   p. .
- [Lai and Chang ()] 'Adaptive Data Hiding for Images Based on Haar Discrete Wavelet Transform'. Bo-Luen Lai
   , Long-Wen Chang . Lecture Notes in Computer Science 2006. Springer-Verlag. 4319 p. .
- [Lekutai ()] Adaptive Self-tuning Neuro Wavelet Network Controllers, G Lekutai . 1997. Blacksburg, Virginia,
   March. (Doctoral thesis)
- [Vilovic ()] 'An Experience in Image Compression Using Neural Networks'. I Vilovic . 48th International Symposium ELMAR-2006 focused on Multimedia Signal Processing and Communications, 2006. IEEE. p. .
- [Minasyan et al. ()] 'An Image Compression Scheme Based on Parametric Haar-like Transform'. S Minasyan, J
   Astola, D Guevorkian. ISCAS 2005. IEEE International Symposium on Circuits and Systems, 2005. p. .
- [D'souza Winston and Spracklen ()] 'Application of Artificial Neural Networks for real time Data Compression'.
   A D'souza Winston , Tim Spracklen . 8th International Conference On November, 2001.
- [Bernard et al. (1998)] Ch Bernard , S Mallat , J-J Slotine . Wavelet Interpolation Networks, International Workshop
   on CAGD and Wavelet Methods for Reconstructing Functions, (Montecatini) June 1998. p. .
- [Foucher and Vaucher ()] Compression d'images et réseaux de neurones, revue Valgo n°01-02, C Foucher , G
   Vaucher . 17-19 October 2001. Ardèche.
- [Baron ()] Contribution à l'étude des réseaux d'ondelettes, R Baron . 1997. February. Ecole Normale Supérieure
   de Lyon (Doctoral thesis)
- [Grossmann and Torrésani ()] A Grossmann , B Torrésani . Les ondelettes, Encyclopedia Universalis, 1998.
- [Talukder and Harada ()] 'Haar Wavelet Based Approach for Image Compression and Quality Assessment of
   Compressed Image'. K H Talukder , K Harada . *IAENG International Journal of Applied Mathematics* 2007.
- [Park and Meher] 'Low-Power, High-Throughput, and Low-Area Adaptive FIR Filter Based on Distributed Arithmetic'.
   Sang Yoon Park , Pramod Kumar Meher . *IEEE Transactions on Circuits and Systems-II: Express Briefs* p. .
- [Jiang ()] 'Image compression with neural networks: A survey'. J Jiang . Signal Processing: Image Communication,
   1999. Elsevier. 14 p. .
- [Cierniak ()] 'Image Compression Algorithm Based on Soft Computing Techniques'. R Cierniak . Lecture Notes
   in Computer Science 2004. Springer-Verlag. 3019 p. .
- [Kulkarni et al. ()] 'Image Compression Using a Direct Solution Method Based Neural Network'. S Kulkarni , B
   Verma , M Blumenstein . The Tenth Australian Joint Conference on Artificial Intelligence, (Perth, Australia)
   1997. p. .
- [Osowski et al. ()] 'Image compression using feed forward neural networks - Hierarchical approach'. S Osowski ,
   R Waszczuk , P Bojarczak . Lecture Notes in Computer Science 2006. Springer-Verlag. 3497 p. .
- [Northan and Dony ()] 'Image Compression with a multiresolution neural network'. B Northan , R D Dony .
   *Canadian Journal of Electrical and Computer Engineering* 2006. 31 (1) p. .
- [Veisi and Jamzad ()] 'Image Compression with Neural Networks Using Complexity Level of Images'. S Veisi ,
   M Jamzad . Proceedings of the 5th International Symposium on Image and Signal Processing and Analysis,
   2007. IEEE. p. .
- [Ye et al. ()] 'Information Measures for Biometric Identification via 2D Discrete Wavelet Transform'. Z Ye , H
   Mohamadian , Y Ye . Proceedings of the 3rd Annual IEEE Conference on Automation Science and Engineering,
   2007. p. .
- [Ratakonda and Ahuja ()] 'Lossless Image Compression with Multiscale Segmentation'. K Ratakonda , N Ahuja .
   *IEEE Transactions on Image Processing*, 2002. 11 p. .
- [Dony and Haykin ()] 'Neural network approaches to image compression'. R D Dony , S Haykin . Proceedings of
   the IEEE, Vol. 83, No. 2, February 1995. p. .
- [Neural Processing] Neural Processing, (Shanghai, China) p. .
- [Charalampidis] 'Novel Adaptive Image Compression'. D Charalampidis . *Workshop on Information and Systems
   Technology*, 101. TRAC Building, University of New Orleans
- [Cyril Prasanna ()] 'Review of 2D VLSI architectures for image Compression'. Raj P Cyril Prasanna . SASTech
   Journal 2006. 2 (4) p. .

- [Nadenau et al. ()] 'Wavelet Based Color Image Compression: Exploiting the Contrast Sensitivity Function'. M
   J Nadenau , J Reichel , M Kunt . *IEEE Transactions on Image Processing*, 2003. 12 p. .
- [Zhang ()] 'Wavelet Network in Nonparametric Estimation'. Q Zhang . IEEE Trans. Neural Networks 1997. 8 (2) p. .
- [Zhang and Benveniste ()] 'Wavelet networks'. Q Zhang , A Benveniste . IEEE Trans. Neural Networks 1992. 3 p. .