"Behind every stack of books there is a flood of knowledge."
The amount of data in a digital image is quite large compared with the bandwidth commonly available to transmit it over the internet or a wireless link. Other applications, such as video, require thousands of pictures and therefore a great deal of storage space. We thus need a way to reduce the storage requirement for image data. One common method is image compression.
An image compression system reduces the size of image data without significantly reducing image quality. It removes the least important information in the image, which is concentrated in the high frequencies. The human eye finds it harder to perceive differences at high frequencies than at low frequencies, and most of the energy in a digital image lies in the low frequencies. By eliminating the higher frequencies we can significantly reduce the image data size. This elimination makes it a lossy compression technique: the compressed image is not exactly the same as the original.
The objective of this contest is to design an image compression and decompression system using an HDL (VHDL or Verilog HDL) and to synthesize the digital circuits using the Synopsys design analyzer or other EDA tools. Implementing the design on an FPGA is optional, but our judges love to see FPGA designs.
Figure 1 shows the block diagram of the proposed image compression system. An object is captured by a camera in YCrCb format (Y: luminance, Cb: chrominance-blue, Cr: chrominance-red). The Compression Module compresses the image into bitstream data, which is then ready to be transmitted or stored.
At the receiver side, the bitstream data is decompressed back into the original image by the Decompression Module. The decompressed image is in YCrCb format and ready to be displayed on a monitor.
In this section, we explain the block diagrams of the compression and decompression modules in detail.
The Compression Module consists of the DCT Transform and Quantizer blocks, as shown in Figure 2. In this module, the input image data is processed in blocks of 8×8 pixels, so an 8-point 2-D DCT transform is required. This 8×8 block size was chosen by the JPEG and MPEG standards as a balance between image quality and computational complexity.
The DCT Transform sub-block decorrelates the image data by reducing, or in some cases eliminating, inter-pixel redundancy. The DCT takes a set of points from the spatial domain and transforms them into an equivalent representation in the frequency domain. The transformation itself is a lossless operation, so the inverse transform reproduces the original image.
At this point we have not accomplished any compression: we have merely replaced 8×8 pixels with 8×8 DCT coefficients. In fact, the output of the DCT is larger than the input. The important thing is that the data is now organized in order of importance. The low frequencies, which matter most, are located in the upper left of the DCT coefficient block, while the higher frequencies, the ones that will be eliminated, are located in the lower right.
Before entering the DCT Transform, the 8×8 pixel data has to be converted from unsigned integers in the range [0, 2^P−1] to signed integers in the range [−2^(P−1), 2^(P−1)−1], where P is the number of bits per pixel value (Y, Cr, or Cb). This is done by subtracting 2^(P−1) from each pixel value. With 8 bits per pixel value, we convert pixel values in [0, 255] to [−128, 127] by subtracting 128.
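The level shift above is a one-line operation; here is a minimal Python sketch (function names are mine, not from the contest code) of the shift and its inverse used at the decompression output:

```python
def level_shift(block, p=8):
    """Convert unsigned pixel values [0, 2^p - 1] to signed [-2^(p-1), 2^(p-1) - 1]."""
    offset = 1 << (p - 1)          # 128 for 8-bit pixels
    return [v - offset for v in block]

def level_unshift(block, p=8):
    """Inverse step used at the decompression output: add 2^(p-1) back."""
    offset = 1 << (p - 1)
    return [v + offset for v in block]
```
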
Once the DCT coefficients are organized by frequency, we are ready for the next step of image compression: quantization. This is the lossy step, where the least important information is discarded. The quantization process divides each DCT coefficient by a quantizer coefficient, so the Quantizer module requires pre-defined coefficients stored in the Quantization Coefficients Table.
The Quantizer block exploits the fact that the human eye is largely unable to perceive high-frequency detail in an image. Some of this data can be discarded without introducing noticeable visual flaws. The goal of quantization is therefore to reduce most of the high-frequency coefficients to zero: the more zeros we generate, the better the compression ratio we achieve. As a consequence, however, the quality of the compressed image is reduced.
The processing performed by each block in this module is described in detail in the next section.
To decode and reconstruct the original image, the compressed bitstream is passed through the Decompression Module, as shown in Figure 3. At the output of the Decompression Module, 128 is added to the data to convert it back to unsigned integers in [0, 255]. The Decompression Module consists of blocks that are the duals of those in the Compression Module.
In this section we describe the algorithms of the DCT/IDCT transform and the Quantizer/Dequantizer in detail. The explanation starts with the 1D-DCT and then shows how the 2D-DCT is built from it.
The equation for the one-dimensional N-point DCT is as follows:
for u = 0, 1, 2, …, N−1, where α(u) is defined as
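A reconstruction of the standard forward 1D-DCT and of α(u) (Equations 1 and 2), consistent with the matrix form used later:

```latex
Z(u) = \alpha(u) \sum_{x=0}^{N-1} f(x)\,
       \cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right],
\qquad
\alpha(u) =
\begin{cases}
\sqrt{1/N}, & u = 0 \\[4pt]
\sqrt{2/N}, & u \neq 0
\end{cases}
```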
It is clear that for u=0, we will have
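Setting u = 0 makes the cosine factor equal to 1, so the standard definition reduces to:

```latex
Z(0) = \sqrt{\tfrac{1}{N}} \sum_{x=0}^{N-1} f(x)
```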
Thus, the first DCT coefficient is proportional to the average value of the sample sequence. It is called the DC coefficient, and all the other values are called AC coefficients.
The above equation can be treated as a matrix multiplication.
where X is the input matrix, C is the transformation matrix, and Z is the transformed matrix. For the 8-point 1D-DCT, the transformation matrix is given in Equation 5.
Similarly, the inverse transformation is defined as
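A reconstruction of the standard inverse 1D-DCT, with the same symbols as the forward transform above:

```latex
f(x) = \sum_{u=0}^{N-1} \alpha(u)\, Z(u)\,
       \cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right]
```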
for x = 0, 1, 2, …, N−1, where α(u) is as defined in Equation 2.
The above equation can also be treated as a matrix multiplication
where X is the reconstructed matrix, C is the transformation matrix, and Z is the transformed (input) matrix.
The 2D-DCT is an extension of the 1D-DCT and can be defined as
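A reconstruction of the standard 2D-DCT (Equation 8):

```latex
Z(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1}
         f(x,y)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{2N}\right]
```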
for u, v = 0, 1, 2, …, N−1, where α(u) and α(v) are defined in Equation 2.
The inverse transformation is defined as
for x, y = 0, 1, 2, …, N−1. The 2D-DCT calculation above can be simplified using the separability property of the DCT, so it can be expressed as
This means the 2D-DCT can be computed by a 1D-DCT horizontal transform (row-wise) followed by a 1D-DCT vertical transform (column-wise). The idea is illustrated graphically in Figure 4.
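As a cross-check of the row-column idea, here is a minimal Python sketch (plain standard library; function names are mine, not from the contest code) that computes the 2D-DCT as two 1D-DCT passes with a transpose in between:

```python
import math

def dct_1d(x):
    """Direct form of the standard N-point 1D-DCT."""
    N = len(x)
    out = []
    for u in range(N):
        a = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
        s = sum(x[n] * math.cos((2 * n + 1) * u * math.pi / (2 * N))
                for n in range(N))
        out.append(a * s)
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct_2d(block):
    # Row-wise 1D-DCT, transpose, row-wise 1D-DCT again (i.e. column-wise
    # on the original data), then transpose back.
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in transpose(rows)]
    return transpose(cols)
```

For a constant 8×8 block, only the DC coefficient Z[0][0] is non-zero, as expected from the frequency-ordering discussion above.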
In matrix form, the 2D-DCT transformation can be calculated using Equation 12.
The same idea applies to the inverse DCT, which can be expressed as
The important thing to remember is that between the two stages of the 2D-DCT/IDCT transform, the matrix resulting from the first 1D-DCT (or 1D-IDCT) must be transposed before entering the second 1D-DCT stage.
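In matrix notation, a reconstruction consistent with Equations 12 and 13 (with C orthonormal, so that C⁻¹ = Cᵀ):

```latex
Z = C\,X\,C^{T}, \qquad X = C^{T}\,Z\,C
```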
In the quantization process, every DCT coefficient is divided by the corresponding element of the quantization matrix Q, as defined in Equation 14.
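Element-wise, with rounding to the nearest integer, a reconstruction of Equation 14 reads:

```latex
Z_q(u,v) = \operatorname{round}\!\left(\frac{Z(u,v)}{Q(u,v)}\right)
```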
The quantization matrix Q usually has smaller numbers in the upper left, increasing toward the lower right. There is no fixed quantization matrix; choosing one is the prerogative of the user. However, the JPEG committee has recommended some quantization matrices, such as the example matrices Q50_Luminance and Q50_Chrominance below.
We can adjust the compression level at runtime using a quantization table that contains several quantization matrices. Users who want a better compression ratio can use a quantization matrix with larger values; users who want better image quality can use one with smaller values. Additional quantization matrices can be obtained by scaling the basic matrices above using Equation 16.
For example, we can scale the quantization matrix to obtain Q80.
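The scaling can be sketched in Python. Note the assumption: Equation 16 is not reproduced here, so this sketch uses the common IJG (libjpeg) scaling convention, which may differ from the contest's exact formula:

```python
def scale_quant(q50_row, quality):
    """Scale a Q50 quantization row to another quality level (IJG convention).

    quality < 50  -> scale = 5000 / quality
    quality >= 50 -> scale = 200 - 2 * quality
    Each entry is clamped to [1, 255].
    """
    scale = 5000 / quality if quality < 50 else 200 - 2 * quality
    return [max(1, min(255, int((q * scale + 50) // 100))) for q in q50_row]

# First row of the JPEG-recommended Q50 luminance matrix.
q50 = [16, 11, 10, 16, 24, 40, 51, 61]
q80 = scale_quant(q50, 80)   # smaller values -> better quality, less compression
```
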
At the input of the Decompression Module, we have to dequantize the DCT coefficients before they enter the IDCT module, as defined in Equation 18.
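Dequantization multiplies each quantized coefficient back by the same matrix entry; a reconstruction of Equation 18:

```latex
Z'(u,v) = Z_q(u,v) \cdot Q(u,v)
```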
Here we introduce a 1D-DCT architecture as an example. Referring to Equation 5, we can draw the architecture shown in Figure 5. Rectangles denote multipliers and circles denote adders; the label inside each rectangle is the coefficient from Equation 5 by which the data is multiplied.
If you have another method or architecture, even better. You should optimize the hardware implementation by reducing the number of multipliers and adders: the fewer of them, the more area and power you save. You can also search for the best multiplier design to optimize the circuit.
The first 1D-DCT receives the input matrix in row-wise order. The Transposition Buffer receives the row-wise results and provides column-wise input to the second 1D-DCT, as depicted in Figure 6.
To make the DCT stage nearly lossless, we would need to compute it with floating-point numbers, but floating point in hardware is much more complicated than integer arithmetic. To avoid this complexity without losing too much precision, we can use fixed-point numbers in the hardware implementation. In this representation we must decide how many bits represent the integer part and how many bits represent the fractional part. A real number is approximated by a pair of integers (n, q), where n is the mantissa and q is the exponent; q gives the number of bits used for the fractional part in binary.
| Mantissa (n) | Exponent (q) | Binary | Decimal |
There is a trade-off between the number of bits used and the design area and power, so choosing an appropriate bit width is important. If we use two's complement, we need a sign bit in the MSB position.
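The (n, q) encoding can be sketched in Python (the coefficient value below is an illustrative assumption, not taken from the contest tables):

```python
def to_fixed(value, q):
    """Encode a real number as an integer mantissa n with q fractional bits."""
    return round(value * (1 << q))

def from_fixed(n, q):
    """Decode the mantissa/exponent pair (n, q) back to a real number."""
    return n / (1 << q)

# Example: encode an illustrative DCT cosine coefficient with 10 fractional bits.
n = to_fixed(0.490393, 10)     # mantissa stored in hardware
approx = from_fixed(n, 10)     # close to, but not exactly, the original value
```

The decoded value differs from the original by at most half an LSB, i.e. 2^(−q−1), which is the precision/area trade-off discussed above.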
The accuracy of image compression significantly influences the quality of multimedia content. A commonly used global objective quality measure is the mean square error (MSE), defined as
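A reconstruction of the usual definition, with the symbols introduced just below:

```latex
\mathrm{MSE} = \frac{1}{n\,m} \sum_{i=1}^{n} \sum_{j=1}^{m}
               \left[ F(i,j) - f(i,j) \right]^{2}
```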
where n and m are the image dimensions in pixels, and F(i,j) and f(i,j) are the pixel values in the compressed and original images. Another quality measure is the peak signal-to-noise ratio (PSNR, in dB), defined as
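A reconstruction of the usual definition for 8-bit images:

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left( \frac{255^{2}}{\mathrm{MSE}} \right)
```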
PSNR expresses the peak signal-to-noise ratio between the original and compressed images, whereas the MSE represents the cumulative squared error between them. The higher the PSNR, the better the quality of the compressed image.
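Both measures are easy to sketch in Python (plain standard library; function names are mine):

```python
import math

def mse(orig, comp):
    """Mean squared error between two equal-size images (lists of rows)."""
    n, m = len(orig), len(orig[0])
    return sum((orig[i][j] - comp[i][j]) ** 2
               for i in range(n) for j in range(m)) / (n * m)

def psnr(orig, comp, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    e = mse(orig, comp)
    return float('inf') if e == 0 else 10 * math.log10(peak ** 2 / e)
```

For example, an image off by exactly one grey level at every pixel has MSE = 1 and a PSNR of about 48.13 dB.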
Here we provide image compression code in MATLAB®: compress.m
To fully understand the computation, reading the MATLAB® code is a good method. Note that in MATLAB, indices start from 1, not 0.
As shown in Figure 8, the 8×8 pixel data (a) is compressed into quantized coefficients (c), which contain many zeros.
Figure 8. Example input and computation results of the compression module: (a) 8×8 pixel data input; (b) result of DCT processing; (c) result of quantization.
At the decompression module, the quantized coefficients from the previous processing are used to reconstruct the original image. An example computation result is shown in Figure 9. The pixel values in Figure 8(a) are similar to those in Figure 9(b).
Figure 9. Example input and computation results of the decompression module: (a) result of quantization processing; (b) result of IDCT processing.
After processing is done for all pixels, we can see the reconstructed image in Figure 10.
| Original Image (Size: 64 KB) | Compressed Image (Size: 36 KB) |
Figure 10. Comparison between the original and reconstructed images.
At this level, the designer needs to build the compression and decompression system shown in Figure 11. The input is a QCIF-size image (176×144 pixels), divided into 8×8-pixel blocks of Y, Cr, and Cb data. Processing proceeds in raster-scan order, starting at the upper-left corner of the image and ending at the lower-right. The compression module consists of the blocks defined in Figure 12, and the 1D-DCT block uses the architecture of Figure 5.
The complete timing diagram for processing one block of data is shown in Figure 13. The input data enters the compression module column-wise (C0, C1, …, C7); each column consists of 8 pixels from different rows (R0, R1, …, R7), so 8 clock cycles are required to input a whole block. The serially arriving input is indicated by the valid_input signal. One clock cycle after the first column C0, the signed output emerges from the Unsigned-to-Signed processing block, so that block's latency (Lsigned) is one clock cycle.
The output of the Unsigned-to-Signed block is fed into the 1D-DCT processing block. Its latency (L1D-DCT) depends on how the 1D-DCT of Figure 5 is implemented; the designer may divide it into several pipeline stages. The output of the 1D-DCT block is stored in the Transposition Buffer.
After the entire block of pixels has been processed by the 1D-DCT block, the result is processed again by a 1D-DCT in the other direction, with the same latency L1D-DCT. As soon as the first column of that result is finished, it is processed by the Quantizer block, whose latency (LQUANT) depends on the implementation and its bit width. The total time required to process a block of pixel data therefore depends on Lsigned, L1D-DCT, and LQUANT.
The sign-conversion, transform, and quantization processes proceed while the enable signal is asserted; enable indicates that the core is ready to receive new input data. After some latency, the output data is valid when the valid_output signal is asserted.
To optimize the design parameters (size, speed) while maintaining image quality, the designer has to determine the optimum bit width of each processing block. This can be done by simulating the processing with a fixed-point representation.
| No. | Port | Description | Active | Direction | Width |
|-----|------|-------------|--------|-----------|-------|
| 3 | enable | Indicates that the core is ready to receive new data | High | Input | 1 |
| 4 | cmode | Quantization table selector | | | |
| 5 | datain | Column input signal | | Input | 64 |
| 6 | valid_input | Samples the valid data at the datain port | High | Input | 1 |
| 7 | valid_output | Validates the output data at the dataout port | High | Output | 1 |
| 8 | dataout | Compressed image signal | | Output | 64 |

Table 2. Port definition
The DCT and quantization circuits consume significant power and area. You can optimize the hardware implementation by reducing the number of adders and multipliers, or by other methods. Different methods are welcome; a unique circuit design can be claimed as your appealing point and mark of originality.
Since it is impossible to use the same synthesis library for all participants, circuit speed and area are normalized to a unit delay and a unit area, as follows.
In the previous example, the total delay is 7.17 ns over 6 circuit stages, so 7.17/6 = 1.195 ns is the UNIT_DELAY. Please normalize your circuit speed by this UNIT_DELAY.
In the example, the total cell area is 147.0 for 49 EXOR gates, so 147.0/49 = 3.0 is the UNIT_AREA. Please normalize your circuit area by this UNIT_AREA.
To make comparison among the many design entries easy, please use the following guide: