‎"Behind every stack of books there is a flood of knowledge."

Design an image compression and decompression system



The amount of information of picture on digital form is quite large compared with commonly available bandwidth to transmit them over the internet or wireless communication. Another application like video requires thousand of pictures, which need a lot of space for storing them. Therefore, we need a way for reducing the storage requirement for image data. One of the common method is the image compression.

The image compression system is able to reduce image data size without significantly reducing image quality. It removes the least information on the image which considered as high frequency. The human eye is more difficult to see the difference in high frequency than low frequency. Moreover, most of digital image is low frequency. By eliminating the higher frequency we can significantly reduce the image data size. This elimination is considered as lossy compression technique. It means that the compressed image is not exactly same as the original image.

The objective of this contest is to design the image compression and decompression system using HDL (VHDL or Verilog HDL) and to synthesize digital circuits using Synopsys design analyzer or any other EDA tools. Making FPGA is also optional but our judges love to see your FPGA designs.

Figure 1 shows the block diagram of proposed image compression system. An object is captured by a camera in YCrCb format (Y: luminance, Cb: Chrominance-Blue, and Cr: Chrominance-Red). The Compression Module will compress the image into bitstream data. The compressed bitstream data will be ready to be transmitted or stored.

At the receiver side, the birstream data will be decompressed into original image by the Decompression Module. The decompressed image is in YCrCb format, and ready to be displayed in Monitor.


In this section, we will explain detail block diagram of the compression and decompression module.


The Compression Module consists of the DCT Transform and the Quantizer block as can be seen in Figure 2. In this module, the input image data will be processed by 8×8 pixels. Therefore 8 points 2-D DCT transform is required. The size of this 8×8 pixels was determined by JPEG and MPEG standard as a balance between image quality and computation complexity.

The DCT transformation sub-block decorrelates the image data by reducing or in some case eliminating inter pixel redundancy. The DCT transform takes some points from spatial domain and transform them into an equivalent representation in the frequency domain. Transformation is a lossless operation, therefore, the inverse of transformation will produce the original image.

At this point, we haven’t accomplished any compression. It just replaces 8×8 pixels by 8×8 DCT coefficients. In fact, output of DCT is larger than before. But the important thing is all data has been organized in term of importance. The low frequencies which are the most importance located on the upper-left of DCT coefficients. While the higher frequencies, the one which will be eliminated, are located on the lower-right of DCT coefficients.

Before entering the DCT Transform process, the 8×8 pixels data has to be converted from unsigned integers with range[0,2P-1] to signed integers with range[-2P-1,2P-1-1], where P is the number of bits per pixel value (Y, Cr or Cb). This process can be done by subtracting 2P-1 [0,2P-1]from the pixel value. If we have 8-bit per pixel value, we convert pixel value [0-255] to [-128,127] by subtracting 128.

After all DCT coefficients are organized based on its frequencies, we are ready to the next step of image compression. The next processing is quantization. This is the lossy process of image compression, where we eliminate the least information from image data. The quantization process divides DCT coefficients value by the Quantizer coefficients. Therefore, the Quantizer modules requires pre-defined coefficient that is stored in the Quantization Coefficients Table.

The Quantization block uses the fact that human eye is unable to perceive high frequency data in an image. Some of these data can be discarded without introducing noticeable visual flaw. Thus the goal of quantization is to reduce most of high frequency coefficient to zero. The more zeros we generate the better compression ratio we achieve. But as the consequence, the quality of compressed image will be reduced.

The detail processing of each block in this module will be described in the next section.

Figure2Figure2  The Image Compression Module


In order to decode and reconstruct the original image, the compressed image bitstream data is passed through the decompression module as shown in Figure 3. At the output of decompression module, the output data is added by 128 to convert back to unsigned integer [0-255]. The decompression module consists of blocks that are dual of the compression module.

Figure3Figure3   The Decompression Module


In this section we will describe the detail algorithm of DCT/IDCT transform and Quantizer/Dequantizer. The explanation starts from 1D-DCT, and then continues to represent 2D-DCT from 1D-DCT.


The equation for one-dimension N points DCT is as follows:



for u=0,1,2 … N. where α(u) is defined as



It is clear that for u=0, we will have



Thus, the first DCT coefficient is the average value of the sample sequence. This value is called DC coefficient and all others values are called AC coefficients.

The above equation can be treated as a matrix multiplication.



where X is input matrix, C is transformation matrix, and Z is transformed matrix. For 8×8 points 1D-DCT, we will have transformation matrix as Equation 5.






Similarly, the inverse transformation is defined as



for x=0,1,2 …. N. α(u) is defined as



The above equation can be treated as matrix multiplication



where X is input matrix, C is transformation matrix, and Z is transformed matrix.


The 2D-DCT is an extension of 1D-DCT and can be defined as



for u,v=0,1,2 …. N-1. α(u) and α(v) are defined in Equation 2.
The inverse transformation is defined as



for x, y = 0,1,2 …. N-1. The 2D-DCT calculation above can be modified using separatility property of DCT equation. Therefore it can be expressed as



It means that 2D-DCT calculation can be achieved through 1D-DCT horizontal transform (row-wised) and a 1D-DCT vertical transform (column-wised). This idea can be illustrated graphically as shown in Figure 4.

Figrue4Figure 4.  The Computation of 2D-DCT using 1D-DCT

In the matrix operation, 2D-DCT transformation can be calculated using Equation 12



The same idea can be used for inverse DCT, it can expressed as



The importance thing to remember between two stages of 2D-DCT/IDCT Transform, the resulted matrix from previous 1D-DCT (or IDCT) must be transposed before entering to the second stage of 1D-DCT.


In the quantization process, every element of DCT coefficient is divided by the quantization matrix Q as defined in Equation 14.



The quantization matrix Q usually has lower number in the upper left and increase as they get closer to the lower right. There is no fixed value of quantization matrix, this is the prerogative of the user to select a quantization matrix. However, JPEG committee has recommended some quantization matrix Q, such the example matrix Q50_Luminanceand Q50_Chrominance below.




We can customize the compression level at runtime using quantization table which consist of several quantization matrix. If users want to get better compression ratio, they can use higher quantization matrix. But if they want to get better image quality, they can use lower quantization matrix. We can get another quantization matrix by scaling basic quantization matrixes above using Equation 16.



For example, we can scale quantization table for Q80.




At the input of The Decompression Module, we have to Dequantize the DCT coefficients before entering into the IDCT module as defined in Equation 18.。




Here we introduce 1D-DCT Architecture as an example. Referring to Equation 5, we can draw the architecture as shown in Figure 5. The rectangle means multiplication and circle means adder. The label inside the square means those data has to be multiplied by coefficient defined in Equation 5.

If you have another method or architecture, it would be very good. You must be able to optimize the hardware implementation by reducing the number of multiplier or adder. The lower number of adder or multiplier is better, since we can save a lot of area and power. You can also find the best multiplier to optimize the circuits.

Figrue5Figure 5  The 1D-DCT Architecture

The first 1D-DCT receives matrix input in row-wise order. Transposition Buffer receives row-wise results and give column-wise input to the second 1D-DCT as depicted in Figure 6.

Figrue6Figure 6.  The 2D-DCT Architecture



In order to get lossless compression using DCT, we need to process DCT using floating point number. Using floating point number in hardware is more complicated than integer number. In order to reduce this problem without losing too much precision, we can use fixed point number for hardware implementation. In this representation, we must determine how many bits will be used for representing decimal number and how many bits will be used for fractional number. Floating point number is approximated using a pair of integers , where n is mantissa and q is exponent. Exponent q represents how many bits for representing fractional number in binary.

Table 1  The Fixed Point Example
Mantissa (n) Exponent (q) Binary Decimal
01100100 -1 011001000 200
01100100 0 01100100 100
01100100 1 0110010.0 50
01100100 2 011001.00 25
01100100 3 01100.100 12.5
01100100 7 0.1100100 0.78125

There is trade-off between how many bits we use for binary number and design area/power. Thus, choosing appropriate bits for binary number is important. If we use two’s complement we need a sign bit on MSB position.

Sign bit Integer Fraction
Figure 7   The Fixed Point Two’s Complement Format



Accuracy of image compression influences significantly the quality of multimedia content. An often used global objective quality measure is the mean square error (MSE) defined as

Equation 19


where n and m are the total pixel. F(i,j) and f(i,j) are the pixel values in the compressed and original image. Another quality measurement is peak to peak signal to noise ratio (PSNR in dB) that is defined as

Equation 20


PSNR computes the peak signal-to-noise ratio between the original and compressed image whereas the MSE represents the cumulative squared error between the original image and compressed image. The higher PSNR, the better quality of compressed image.


Here we provide image compression code using MATLAB®:compress.m

In order to fully understand the computation, reading the MATLAB® code is one of good method. Please pay attention that in MATLAB index starts from 1 not 0.

As shown in Figure 8, we can see that the 8×8 pixels data (a) is compressed into quantized coefficients (c) which have a lot of zero.




Figure8  The example of input and computation result of the compression module.

(a) 8×8 pixles data input  (b) The result of DCT processing  (c) The result of Quantization.


At the decompression module, the quantized coefficient from previous processing can be used to reconstruct the original image. The computation result example can be seen in Figure 9. We can see that the pixels value in Figure 8 (a) is similar to Figure 9(b).



Figure9  The example of input and computation result of the decompression module.

(a) The result of Quantization processing  (b) The result of IDCT processing.


After processing is done for all pixels, we can see the reconstructed image in Figure 10.

Original Image (Size : 64 KB) Compressed Image (Size : 36 KB)

Figure10  The comparison between original and reconstructed image

8.Level 1 Basic Task

In this level, designer needs to design compression and decompression system as shown in Figure 11. The input of the system is QCIF size image (176 x 144 pixels). The input image is divided into 8×8 pixels which are called a block data of Y, Cr and Cb. The processing is done in raster scan manner that is started from upper left, and end in lower right corner of the image. The compression module consists of blocks as defined in Figure 12. The 1D-DCT block uses architecture as defined in Figure 5.

For each block of data processing, complete timing diagram is illustrated in Figure 13. The input data enters the compression module in column wise (C0, C1…C7). Each column consists of 8 pixels data with different row (R0, R1…R7). Therefore 8 clock cycles are required to input all block data. The input signal coming serially indicated by valid_input signal. One clock cycle after the first column data C0, the signed output is come out of Unsigned to Signed processing block. Therefore this block processing latency (Lsigned) is one clock cycle.

The output of Unsigned to Signed processing block is feed into 1D-DCT processing block. The processing latency (L1D-DCT) depends on how 1D-DCT in Figure 5 is implemented. The designer may divide it into several pipeline stages. The output of 1D-DCT processing block is stored in Transposition Buffer.

After all one block of pixels are processed by 1D-DCT processing block, the result is processed again by 1D-DCT in different direction. The processing delay is still the same with L1D-DCT. As soon as the first column data is finished, the result is computed by the Quantizer processing block. The latency of Quantizer processing (LQUANT)depends on how to implement it and its bit width. As a result, total time required to process a block of pixels data depends on Lsigned, L1D-DCT, and LQUANT.

The signed, transformation and quantization process are held when enable signal is asserted. Enable signal indicates that the core is ready to receive new input data. After some latency, output signal is validated when valid_output signal is asserted.

In order to optimize the design parameter (size, speed) while maintaining the image quality, the designer has to decide the optimum bit width of each processing block. This can be done by simulating the processing using fix point representation.

Figrue11fig11   System Organization

Figrue11fig12  The Compression Module Block Diagram

Figrue11fig13  One Block Compression Timing Diagram

No Name Port Description Active I/O Width
1 clock Clock signal Input 1
2 reset Reset signal Low Input 1
3 enable Indicates that the core is ready to receive new data High Input 1
4 cmode Table quantization selector
Input 1
5 datain Column input signal Input 64
6 valid_input Sample the valid data at datain port High Input 1
7 valid_output Validata the output data at dataout port High Output 1
8 dataout Compressed image signal Output 64

Table2   Port Definition


The DCT and quantization circuit are power and area consuming. You can optimize the hardware implementation of this algorithm by reducing adder, multiplication or use other methods. Different methods are welcomed and please design a unique circuit. You can claim it as your appealing point and originality.



Since it is impossible to use the same synthesis library for various participants,

  • Use 1 exor gate delay as a 1 UNIT_DELAY for speed comparison
  • Use 1 exor gate area as a 1 UNIT_AREA for area comparison
  • How to measure 1 exor gate delay
    1. Synthesize the 50 inputs exor gates
    2. Measure the total delay time
    3. UNIT_DELAY is obtained by total delay divided by the number of stages
    4. UNIT_AREA is obtained by the total area divide by number of exor gates
  • VHDL code for 50 inputs exor :parity.vhd
  • Example of synthesized circuit :PDF,PS
  • Example of critical path delay measurement :report timing
  • Example of circuit area measurement :report area

In the previous example, total delay = 7.17 ns and 6 circuit stages, then the 7.17/6= 1.195 ns is the UNIT_DELAY of the speed. Please normalize your circuit speed by this UNIT_DELAY.

In the example, total cell area = 147.0 and 49 EXOR gates. Then 147.0/49=3.0 is the UNIT_AREA. Please normalize your circuit area by this UNIT_AREA.


In order to make easy comparison among many design entries, please use the following guide :

  • Do not use H/W macro RAM. Use synthesizable RAM description and synthesize your RAM using Flip-flops.
  • If you cannot use synthesizable RAM for your design, please state the reason clearly in your report.
  • Do not use H/W macro ROM. Realize the ROM by synthesized combinational logic.




Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

Virtual Fashion Technology

Virtual Fashion Education


"chúng tôi chỉ là tôi tớ của anh em, vì Đức Kitô" (2Cr 4,5b)


News About Tech, Money and Innovation


Modern art using the GPU

Theme Showcase

Find the perfect theme for your blog.


Learn to Learn

Gocomay's Blog

Con tằm đến thác vẫn còn vương tơ

Toán cho Vật lý

Khoa Vật lý, Đại học Sư phạm Tp.HCM - ĐT :(08)-38352020 - 109

Maths 4 Physics & more...

Blog Toán Cao Cấp (M4Ps)

Bucket List Publications

Indulge- Travel, Adventure, & New Experiences


‎"Behind every stack of books there is a flood of knowledge."

The Blog

The latest news on and the WordPress community.

%d bloggers like this: