# Design and Implementation of a Relaxed Haar Discrete Wavelet Transform Hardware Module for Multimedia Compression

James Ntaganda

Electronic Engineering Department, Konkuk University, South Korea

**Abstract:** Still images and video clips are common formats for storing and sharing information in many fields. The information content of such multimedia formats is huge and requires compression. Compression reduces redundant information while maintaining good perceptual image or video quality for the Human Visual System (HVS), and involves signal transformation together with other operations. While MPEG-X and H.26X deploy the DCT (Discrete Cosine Transform), Discrete Wavelet Transforms (DWTs) are common in multimedia CODECs such as MJPEG2000, MJPEG-XR and MJPEG-LS. The DWT, like other transformations, requires intensive computation and therefore consumes more power. To balance computation time, subsystem size and power consumption, a software-hardware co-design approach is used. In this approach, hardware modules called hardware accelerators are reserved for intensive functions and are called by the main program as subroutines. In this paper, flexibility and simplicity guide the design of a relaxed hardware module that processes a 16x16 spatial micro block from an image or a video frame without multiplication, because multiplications take longer and hence consume more power. The module is implemented on an FPGA.

Keywords: Discrete Wavelet Transform, Multimedia Compression, Hardware Realization, FPGA

### I. Introduction

The demand for exchanging and storing information as images and video is expected to grow in proportion to teledensity and internet connectivity. However, the binary content of video is too large for overloaded data networks to handle, and the computation involved is too intensive, especially for portable devices. To reduce the information content carried within images, standardization groups recommend specific video compression methods and techniques that remain compatible across different platforms. Compression is done while keeping good perceptual video quality for the Human Visual System (HVS). High Efficiency Video Coding (HEVC) is a new standard for video compression developed by ISO and ITU-T [12]. Despite the success of the MPEG-X and H.26X CODECs, Jason Spielfogel [4] points out that they continue to show drawbacks, caused by spatial-temporal compression, in some specific scenes and applications. These drawbacks include [4]:

- (a) In scenes with high movement, temporal compression can create "artifacts" or remnants of previous Reference Frames, until the next Index frame (I-Frame) refreshes the scene.
- (b) If the entire scene is moving as with a PTZ camera, or if there is substantial movement within the scene of a fixed camera, then the Reference Frames can be at or near the same size as the I-Frame.
- (c) Because reference frames only measure the changes that have occurred since the Index frame, temporally coded video bitrates are not easily predictable in case of rapidly changing scenes.

In contrast, Motion JPEG (MJPEG), also known as images-per-second coding or frame-by-frame compression, delivers a predictable file size, which makes bandwidth easier to predict. MJPEG2000 and later MJPEG CODECs adopted the DWT instead of the Discrete Cosine Transform (DCT), so blocking effects can be avoided [4]. Furthermore, the latest MJPEG features new functions such as progressive image transmission by quality or resolution, lossy and lossless compression, region of interest (ROI) encoding, and good error resilience [9]. Because of these advantages, applications such as PTZ (Pan-Tilt-Zoom) cameras and IP-based surveillance cameras still deploy MJPEG CODECs [4]. Dragomir El Mezeni et al. [5] describe in detail the JPEG-XR architecture presented in figure 1. Similarly, Cristian Perra [6] suggested two architectures, for MJPEG2000 and MJPEG-XR, both of which deploy the DWT at the transformation stage. In this paper we focus only on developing a flexible hardware module for the Haar DWT (HDWT) that can be integrated in hardware-software designs. The rest of the paper is organised as follows: Part I gives the background and necessity for this work. In Parts II and III, we use the Haar function to develop an orthogonal Haar basis. In Part IV, we develop the orthogonal basis that forms the coefficients of a 16-element array after transformation. In Part V, we show the image decomposition capability of the HDWT. Part VI presents the design methodology. In Part VII, using VHDL and synthesis tools, we simulate our design and present different views of the designed module; placement and routing are done on a Cyclone IV FPGA module. In Part VIII, we present our concluding remarks.



Figure 1: JPEG XR coding flow, encoder side [6]

### II. Discrete Wavelet Transforms

Wavelets are mathematical tools for hierarchically decomposing functions [2]. They allow a function to be described in terms of a coarse overall shape, plus details that range from broad to narrow [2]. Regardless of whether the function of interest is an image, a curve, or a surface, wavelets offer an elegant technique for representing the levels of detail present [2]. For any image of NxN dimensions, each of its rows and columns can be represented as a linear combination of averaged and detail coefficients (matrix entries). A. Jensen and A. la Cour-Harbo [3] give a thorough treatment of the Discrete Wavelet Transform and its hierarchical decompositions in Ripples in Mathematics. The whole process is carried out by prediction and update steps that produce the new coefficients [3].
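The idea of averaged and detail coefficients can be illustrated with a single decomposition level on one row. The following is a minimal sketch, not taken from [3]; the function names `haar_step` and `haar_inverse_step` are hypothetical, and the averaging convention (divide by 2) is an assumption chosen for readability.

```python
def haar_step(row):
    """One Haar level: pairwise averages (coarse shape) and differences (detail)."""
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    averages = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    details = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return averages, details


def haar_inverse_step(averages, details):
    """Rebuild the original row from one level of averages and details."""
    row = []
    for a, d in zip(averages, details):
        row.extend([a + d, a - d])
    return row


if __name__ == "__main__":
    avg, det = haar_step([9, 7, 3, 5])
    print(avg, det)
    print(haar_inverse_step(avg, det))
```

Applying `haar_step` again to the averages gives the next, coarser level, which is the hierarchical decomposition described above.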

### III. Haar Function, Its Scaling And Shifting Nature

The Haar function has existed for decades, and John E. Shore [11] gives a clear presentation of it in his concise paper. In this paper we exploit the flexibility and simplicity of the scaling functions in equations (1) and (2) to generate an orthogonal basis for Haar wavelets in a multi-resolution analysis (MRA) window.
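For orientation, the Haar scaling function $\varphi$ and mother wavelet $\psi$ are commonly defined as follows, together with their unnormalized scaled and shifted versions indexed by scale $j$ and shift $k$. That these coincide with the exact form of equations (1) and (2) is an assumption.

$$\varphi(x) = \begin{cases} 1, & 0 \le x < 1 \\ 0, & \text{otherwise} \end{cases} \qquad \psi(x) = \begin{cases} 1, & 0 \le x < 1/2 \\ -1, & 1/2 \le x < 1 \\ 0, & \text{otherwise} \end{cases}$$

$$\varphi_{j,k}(x) = \varphi(2^{j}x - k), \qquad \psi_{j,k}(x) = \psi(2^{j}x - k)$$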



Scaling and shifting equations (1) and (2) in MATLAB<sup>™</sup> simulations generates the graphical representation of the Haar wavelet MRA (Multi-resolution Analysis) shown in figure 2. The resulting functions form an orthogonal basis.



Figure 2: Simulation of scaling and shifting of the Haar function in an MRA window

### IV. Haar Wavelet Transform Function And Basis

Images can be represented as arrays of vectors that store the rows and columns of an NxM resolution. When N=M, the whole image is stored as a square matrix. Let a row or a column of the image be the vector $I = [S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8 S_9 S_{10} S_{11} S_{12} S_{13} S_{14} S_{15} S_{16}] \in \mathbb{R}^{16}$. We can then represent $I(x)$ in terms of Haar wavelet [13] coefficients by equation (3)

$$I(x) = \sum_{i,j \in \mathbb{Z}} \langle I, \psi_{i,j} \rangle \psi_{i,j} \tag{3}$$

It then follows that, using equations (1) and (2),  $I \in \mathbb{R}^{16}$  can be represented by equation (4)

$$I(x) = I_1 \varphi_{0,0} + I_2 \psi_{0,0} + I_3 \psi_{1,0} + I_4 \psi_{1,1} + I_5 \psi_{2,0} + I_6 \psi_{2,1} + I_7 \psi_{3,0} + I_8 \psi_{3,1} + I_9 \psi_{4,0} + I_{10} \psi_{4,1} + I_{11} \psi_{5,0} + I_{12} \psi_{5,1} + I_{13} \psi_{6,0} + I_{14} \psi_{6,1} + I_{15} \psi_{7,0} + I_{16} \psi_{7,1} \tag{4}$$

Using the scaling functions and shifting operations, given in figure 2, we can obtain the wavelet components of  $I(x) \in \mathbb{R}^{16}$ . These coefficients,  $I_1$  to  $I_{16}$  are calculated and presented in equations (5) to (20), extended from [13].

$$I_{1} = \sum_{k=1}^{16} \int_{(k-1)/16}^{k/16} S_{k}\, \varphi_{0,0}(x)\, dx \tag{5}$$

$$I_{2} = \sum_{k=1}^{16} \int_{(k-1)/16}^{k/16} S_{k}\, \psi_{0,0}(x)\, dx \tag{6}$$

The remaining coefficients $I_3$ to $I_{16}$, equations (7) to (20), have the same form, with $\psi_{0,0}$ replaced by the basis function that multiplies the corresponding coefficient in equation (4).
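As a worked step, assume the usual Haar definitions: $\varphi_{0,0}$ equals 1 on $[0,1)$, and $\psi_{0,0}$ equals $+1$ on $[0,1/2)$ and $-1$ on $[1/2,1)$. Under that assumption, each integral over an interval of width $1/16$ reduces to a sample scaled by $1/16$, so the first two coefficients become

$$I_1 = \frac{1}{16}\sum_{k=1}^{16} S_k, \qquad I_2 = \frac{1}{16}\left(\sum_{k=1}^{8} S_k - \sum_{k=9}^{16} S_k\right)$$

The common factor $1/16 = 2^{-4}$ is what allows the division to be carried out as a right shift by four bit positions, leaving only additions and subtractions.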

### V. Image Decomposition by Haar DWT

One of the powerful features of the wavelet transform is that an image can be decomposed and its energy compacted into a few levels. Using MATLAB<sup>TM</sup> software simulation, we present image decomposition as a vertical hierarchy in figure 3 and as square energy compaction levels in figure 4. Horizontal, vertical and diagonal details can be seen at each decomposition level. The detail coefficients are used in image synthesis. This energy compaction simplifies the quantization process that follows transformation in most multimedia CODECs.
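Energy compaction can be seen on a small example. The sketch below applies one Haar level to the rows and then to the columns of a 4x4 block; it is an illustration in plain Python rather than the MATLAB simulation used for the figures, and the helper names (`haar_rows`, `transpose`, `haar2d_level`) are hypothetical.

```python
def haar_rows(matrix):
    """Apply one Haar level to every row: averages first, then details."""
    out = []
    for row in matrix:
        avg = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
        det = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
        out.append(avg + det)
    return out


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


def haar2d_level(matrix):
    """One 2-D Haar level: transform the rows, then the columns."""
    return transpose(haar_rows(transpose(haar_rows(matrix))))


if __name__ == "__main__":
    block = [
        [10, 10, 20, 20],
        [10, 10, 20, 20],
        [30, 30, 40, 40],
        [30, 30, 40, 40],
    ]
    for row in haar2d_level(block):
        print(row)
```

For this piecewise-constant block, all of the signal ends up in the top-left quadrant (the averages), and the three detail quadrants are zero, which is the property that makes the subsequent quantization step easier.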



Figure 3: Detail coefficients in vertical hierarchy at four levels



Figure 4: Spine decomposition at four levels: (a) original image, (b) decomposition, (c) image synthesis

Figure 5: Football decomposition at four levels

Figure 6: Autumn decomposition at four levels: (a) original image, (b) decomposition, (c) image synthesis

Figure 7: Office decomposition at four levels: (a) original image, (b) decomposition, (c) image synthesis

### VI. Hardware Design For Relaxed Haar Discrete Wavelet Module

Several approaches to DWT hardware implementation have been published. The main approach is prediction and updating [1], but it requires a large amount of memory for the intermediate stages. Other papers describe lifting and filter-based techniques, in which the coefficients involved cause precision problems. In this paper we revisit the basic two-input, two-output processing element of conventional DCT, FFT and DST designs [10], shown in figure 8. The upper output involves addition only, while the lower output involves subtraction only. This simple structure is used throughout the design. Using equations (5) to (20) from section IV and re-arranging the inputs of a 16-element array so that the even elements $(S_0, S_2, S_4...)$ come first, followed by the odd elements $(S_1, S_3, S_5...)$, we obtain the simple and flexible structure for hardware implementation presented in figure 9. The dotted paths in figure 9 indicate signals that are routed to the output with no operation applied. Instead of matrix multiplications, we use bit-shifting operations in the hardware description language (HDL).
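The add/subtract-and-shift idea can be modelled in software before committing to an HDL. The following is a behavioural sketch only, under stated assumptions: it pairs adjacent samples rather than reproducing the even/odd input ordering of figure 9, it uses the standard unnormalized Haar decomposition, and the function name `haar16_shift_add` is hypothetical.

```python
def haar16_shift_add(samples):
    """Four add/subtract butterfly stages on 16 integers, then a 4-bit right shift.

    Sums feed the next stage; differences pass through unchanged to the output.
    """
    if len(samples) != 16:
        raise ValueError("expected exactly 16 samples")
    approx = list(samples)
    details = []  # detail coefficients, coarsest level first
    while len(approx) > 1:
        sums = [approx[i] + approx[i + 1] for i in range(0, len(approx), 2)]
        diffs = [approx[i] - approx[i + 1] for i in range(0, len(approx), 2)]
        details = diffs + details
        approx = sums
    # Division by 16 done as an arithmetic right shift; no multiplier is needed.
    return [value >> 4 for value in approx + details]


if __name__ == "__main__":
    print(haar16_shift_add([100] * 16))
    print(haar16_shift_add([16] * 8 + [0] * 8))
```

Each pass of the loop corresponds to one stage of two-input, two-output elements of the kind shown in figure 8. Note that the right shift discards the low four bits, so small differences are truncated in this model.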



Figure 8 : Basic processing unit [10]



Figure 9: Butterfly structure for HDWT designed from equations (5) to (20)

### VII. Design Synthesis, Placement And Routing

Using a hardware description language (HDL) and the butterfly structure presented in the previous section, we used ModelSim® and Quartus® software to realize the design. Figure 10 shows the ModelSim® wire connectivity and binary simulation using different test vectors. Similarly, figure 11 shows the RTL view produced by Quartus® after synthesis, placement and routing. Note that the inputs and outputs have equal bit length (8 bits). This is because the division by 16 is implemented as a right shift by 4 bit positions, so the output is 8 bits wide instead of the 12 bits reached after four successive stages of addition and subtraction.
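The bit-width argument can be checked with a few lines of arithmetic. This is a small sketch that follows only the worst-case sum path for unsigned inputs; handling of the sign bit on the difference paths is outside its scope, and the function names are hypothetical.

```python
def sum_path_widths(input_bits=8, stages=4):
    """Bits needed after each add stage when both operands are at their maximum."""
    widths = []
    max_value = (1 << input_bits) - 1
    for _ in range(stages):
        max_value *= 2  # adding two worst-case operands doubles the maximum
        widths.append(max_value.bit_length())
    return widths


def output_width(input_bits=8, stages=4, shift=4):
    """Width that remains after shifting the final stage right by `shift` bits."""
    return sum_path_widths(input_bits, stages)[-1] - shift


if __name__ == "__main__":
    print(sum_path_widths())
    print(output_width())
```

With 8-bit inputs, the four stages need 9, 10, 11 and 12 bits, and the 4-bit right shift brings the result back to 8 bits.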



Figure 10: Dataflow tracing using ModelSim®

Figure 11: RTL view of the module. Labels recoverable from the diagram: four blocks stage_one:U1, stage_two:U2, stage_three:U3 and stage_four:U4, each with clk and nrst inputs; sixteen inputs in0 to in15 [7..0]; intermediate buses xr0 to xr15 [8..0], yr0 to yr15 [9..0] and zr0 to zr15 [10..0]; sixteen outputs Out0 to Out15.









Figure 13: Cyclone IV E EP4CE115F29C7 wire bond view after placement and routing on the FPGA

### VIII. Conclusion

In this paper, we explored the basic principles of the Haar function and exploited the simplicity of the Haar wavelet transform to construct a butterfly circuit structure with no multiplication operations. Using VHDL, we simulated, placed and routed our design and implemented it on a Cyclone IV E EP4CE115F29C7 FPGA. With this approach, CODECs that use the DWT at their signal transformation stage can easily integrate such a module into their hardware-software system.

### References

- [1]. Chih-Hsien Hsia, et al., Memory-efficient architecture of 2-D lifting-based discrete wavelet transform, Journal of the Chinese Institute of Engineers, published online 28 Jun 2011.
- [2]. Eric J. Stollnitz, Tony D. DeRose, and David H. Salesin, Wavelets for computer graphics: A primer, part 1, *IEEE Computer Graphics and Applications*, 15(3):76–84, May 1995.
- [3]. A. Jensen and A. la Cour-Harbo, Ripples in Mathematics: The Discrete Wavelet Transform, Springer-Verlag, 2001.
- [4]. Jason Spielfogel, Why we like MJPEG compression, http://www.securityinfowatch.com/article/10561410/why-we-like-mjpeg-compression [Accessed 4 May 2015].
- [5]. Dragomir El Mezeni et al., JPEG-XR encoder implementation on a heterogeneous multiprocessor system, 5th European Conference on Circuits and Systems for Communications (ECCSC'10), November 23–25, 2010, Belgrade, Serbia.
- [6]. Cristian Perra, Re-encoding JPEG images for smart phone applications, 21st Telecommunications Forum TELFOR 2013, IEEE, 2013.
- [7]. Koichi Hattori, Hiroshi Tsutsui et al., A High-Throughput Pipelined Architecture for JPEG XR Encoding, IEEE Conference Publications, 2009. DOI: 10.1109/ESTMED.2009.5336818
- [8]. Harish Yagain, Srinivas Donapati, Addressing the Interoperability Issues While Using JPEG-XR, 2011 International Symposium on Electronic System Design.
- [9]. C. A. Christopoulos, T. Ebrahimi and A. N. Skodras, JPEG2000: The New Still Picture Compression Standard, Media Lab, Ericsson Research, Ericsson Radio Systems AB, S-16480 Stockholm, Sweden.
- [10]. Saini, S.; Mahajan, A.; Mandalika, Implementation of Low Power FFT Structure using a Method Based on Conditionally Coded Blocks, IEEE Conference Publications, 2010.
- [11]. John E. Shore, On the Application of Haar Functions, Concise Papers, IEEE Transactions on Communications, March 1973.
- [12]. Iain Richardson, The new standard for video coding released by ISO MPEG and ITU-T VCEG, http://www.vcodex.com/h265.html [Accessed 6 May 2015].
- [13]. Zunera Idrees, Eliza Hashemiaghjekandi, Image Compression by Using Haar Wavelet Transform and Singular Value Decomposition, School of Computer Science, Physics and Mathematics, Linnaeus University.