网站公告列表

  没有公告

加入收藏
设为首页
联系站长
您现在的位置: 61IC中国电子在线 >> DSP >> ADI DSP >> Blackfin >> 文章正文
  Baseline JPEG compression juggles image quality and size         ★★★ 【字体:
Baseline JPEG compression juggles image quality and size
作者:61IC    文章来源:本站原创    点击数:    更新时间:2006-11-5    
Image compression, once the domain of the PC, is becoming pervasive in embedded environments. This trend is largely a result of the increased processing power and multimedia capabilities of new DSPs. Consequently, it now becomes advantageous for embedded designers to gain a better grasp of image algorithms in an effort to implement them efficiently. This article examines the Baseline JPEG compression standard, one of the most popular image compression algorithms in use today. While not covered explicitly here, the JPEG decode process is a straightforward inversion of the encode process.

Although there are many specified versions of JPEG, Baseline JPEG compression (referred to herein simply as "JPEG") embodies a minimum set of requirements. It is lossy, such that the original image cannot be exactly reconstructed (although a "Lossless JPEG" specification exists as well). JPEG exploits the characteristics of human vision, eliminating or reducing data to which the eye is less sensitive. JPEG works well on grayscale and color images, especially on photographs, but it is not intended for two-tone images. Figure 1 shows the basic encode process, which will be examined below in some detail.

Figure 1: Sample baseline JPEG encode data flow.

Preprocessing

Color Space

The incoming uncompressed image might be stored in one of several formats. One popular format is 24 bit/pixel RGB -- that is, 8 bits each of Red, Green and Blue subpixels. However, in viewing these separate R, G and B subchannels for a given image, there is usually a clear visual correlation between the 3 pictures. Therefore, in order to achieve better compression ratios, it is common to decorrelate the RGB fields into separate luminance (Y) and chrominance (Cb, Cr) components. The equations to do this are:

Y=0.299(R-G) + G + 0.114(B-G) Cb=0.564(B-Y) Cr=0.713(R-Y)

Spatial Filtering

Because the human eye is more sensitive to luminance than chrominance, image bandwidth can be reduced by subsampling the (Cb,Cr) fields. An unfiltered image with subpixels arranged as {Y,Cb,Cr,Y,Cb,Cr,Y...) is called 4:4:4 format, since there are 4 Y's for every 4 Cb's and 4 Cr's. Every pixel here consists of one {Y,Cb,Cr} entity. Put another way, for a 720x480 pixel image, 4:4:4 format implies that each of the 3 component fields is 720x480 bytes.

If we cut the horizontal bandwidth in half by subsampling the chrominance values, we arrive at the 4:2:2 format {Cb,Y,Cr,Y,Cb,Y,Cr,Y,...). Here, each pixel consists of a Y byte and either a Cb or Cr byte. Therefore, for a 720x480 image, 4:2:2 implies a Y matrix of 720x480, and Cb,Cr fields of 360x480 each.

To further reduce image bandwidth, we can also subsample chrominance values in the vertical direction. This results in the 4:2:0 format, implying that for a 720x480 image, the Y field would be 720x480 bytes, and Cb,Cr would each be 360x240 in size.

Whichever format is chosen, the image needs to be separated into Y, Cb and Cr buffers, because the JPEG algorithm acts on each component individually and in identical fashion. If the chrominance is subsampled, this is simply akin to running the JPEG algorithm on an image of smaller size. JPEG operates on 8x8-byte blocks of data, as shown in Figure 1. Therefore, in each image buffer, the data is partitioned into these 8x8 blocks, from left to right and top to bottom. These blocks do not overlap, and if the image dimensions are not multiples of 8, the last row and/or column of the image is duplicated as needed.

DCT

The DCT stage of JPEG exploits the fact that the human eye favors low-frequency image information over high-frequency details. The 8x8 DCT transforms the image from the spatial domain into the frequency domain. Although other frequency transforms can be effective as well, the DCT was chosen because of its decorrelation features, image independence, efficiency of compacting image energy, and orthogonality (which makes the inverse DCT very straightforward). Also, the separable nature of the 2D DCT allows the computation of a 1D DCT on the eight column vectors, followed by a 1D DCT on the 8 row vectors of the resulting 8x8 matrix. The 8x8 DCT can be written as follows:

For uniform handling of different image components, the DCT coder usually requires that the expected average value for all pixels is zero. Therefore, before the DCT is performed, a value of 128 may be subtracted from each pixel (normally ranging from 0 to 255) to shift it to a range of --127 to 127. This offset has no effect on the AC characteristics of the image block.

It is instructive to take a visual view of the DCT transform. Refer to the DCT basis functions shown in Figure 2. In performing a DCT on an 8x8 image block, what we are essentially doing is correlating the input image with each of the 64 DCT basis functions and recording the relative strength of correlation as coefficients in the output DCT matrix.

Figure 2: Each 8x8 pixel block is correlated with 64 DCT basis functions. The result from each correlation produces 1 output coefficient.

For example, the coefficient in the output DCT matrix at (2,1) corresponds to the strength of the correlation between the basis function at (2,1) and the entire 8x8 input image block. The coefficients corresponding to high-frequency details are located to the right and bottom of the DCT block, and it is precisely these weights which we try to nullify -- the more zeroes in the 8x8 DCT block, the higher the compression that is achieved. In the Quantization step below, we'll discuss how to maximize the number of zeroes in the matrix.

Quantization

After the DCT has been performed on the 8x8 image block, the results are quantized in order to achieve large gains in compression ratio. Quantization refers to the process of representing the actual coefficient values as one of a set of predetermined allowable values, so that the overall data can be encoded in fewer bits (because the allowable values are a small fraction of all possible values).

Remember that the human eye is much more attuned to low frequency information than high-frequency details. Therefore, small errors in high-frequency representation are not easily noticed, and eliminating some high-frequency components entirely is often visually acceptable. The JPEG quantization process takes advantage of this to reduce the amount of DCT information that needs to be coded for a given 8x8 block.

Quantization is the key irreversible step in the JPEG process. Although performing an inverse DCT will not exactly reproduce the original input 8x8 input matrix because of rounding error, the result is usually quite visually acceptable. However, after quantization, the precision of the original unquantized coefficients is lost forever. Thus, quantization only occurs in lossy compression algorithms, such as the Baseline JPEG implementation we're discussing here.

The quantization process is straightforward once the quantization table is assembled, but the table itself can be quite complex, often with a separate quantization coefficient for each element of the 8x8 DCT output block. Because this output matrix corresponds to how strongly the input image block exhibits each of the 64 DCT basis functions, it is possible to experimentally determine how important each DCT frequency is to the human eye and quantize accordingly. In other words, frequencies towards the bottom and right of the basis functions in Figure 2 will tend to have large values in their quantization table entries, because this will tend to zero out the higher-frequency elements of the DCT output block. The actual process of quantization is a simple element-wise division between the DCT output coefficient and the quantization coefficient for a given row and column.

A "Quality Scaling Factor" can be applied to the quantization matrix to balance between image quality and compressed image size. For the sample tables mentioned in the JPEG standard, typical quality values range from 0.5 (high recovered image quality) to 5 (high compression ratio).

Zig-Zag Sorting

Next, we prepare the quantized data in an efficient format for encoding. As we have seen from the DCT output, the quantized coefficients have a greater chance of being zero as the horizontal and vertical frequency values increase. To exploit this behavior, we can rearrange the coefficients into a one-dimensional array sorted from the DC value to the highest-order spatial frequency coefficient, as shown in Figure 3. This is accomplished using zig-zag sorting, a process which traverses the 8x8 block in a back-and-forth direction of increasing spatial frequency. On the left side of the diagram, we see the index numbers of the quantized output DCT matrix. Using the scan pattern shown in the middle of the figure, we can produce a new matrix as shown in the right side of the figure.

Figure 3: Illustration of zig-zag scan for efficient coefficient encoding.

Each quantized DCT output matrix is run through the zig-zag scan process. The first element in each 64x1 array represents the DC coefficient from the DCT matrix, and the remaining 63 coefficients represent the AC components. These two types of information are different enough to warrant separating them and applying different methods of coding to achieve optimal compression efficiency.

All of the DC coefficients (one from each DCT output block) must be grouped together in a separate list. At this point, the DC coefficients will be encoded as a group and each set of AC values will be encoded separately.

Coding the DC Coefficients

The DC components represent the intensity of each 8x8 pixel block. Because of this, significant correlation exists between adjacent blocks. So, while the DC coefficient of any given input array is fairly unpredictable by itself, real images usually do not vary widely in a localized area. As such, the previous DC value can be used to predict the current DC coefficient value. By using a differential prediction model (DPCM), we can increase the probability that the value we encode will be small, and thus reduce the number of bits in the compressed image. To obtain the coding value we simply subtract the DC coefficient of the previously processed 8x8 pixel block from the DC coefficient of the current block. This value is called the "DPCM difference". Once this value is calculated, it is compared to a table to identify the symbol group to which it belongs (based on its magnitude), and it is then encoded appropriately using an entropy encoding scheme such as Huffman coding.

Coding the AC Coefficients (Run-Length Coding)

Because the values of the AC coefficients tend towards zero after the quantization step, these coefficients are run-length encoded. The concept of run-length encoding is a straightforward principle. In real image sequences, pixels of the same value can always be represented as individual bytes, but it doesn't make sense to send the same value over and over again. For example, we have seen that the quantized output of the DCT blocks produces many zero-value bytes. The zig-zag ordering helps produce these zeros in groups at the end of each sequence. Instead of coding each zero individually, we simply encode the number of zeros in a given 'run.' This run-length coded information is then variable-length coded (VLC), usually using Huffman codes.

Entropy Encoding

The next processing block in the JPEG encode sequence is known as the entropy encoder. In this stage, a final lossless compression is performed on the quantized DCT coefficients to increase the overall compression ratio achieved. Entropy encoding is a compression technique that uses a series of bit codes to represent a set of possible symbols. Figure 4 shows an obvious 2-bit encode sequence with 4 possible output symbols (A, B, C and D)

Table 1: Example of entropy encoding with equal symbol probabilities.

In this example, we can uniquely describe each symbol with two bits of information. Because each symbol is represented as a 2-bit quantity, we refer to the code as "fixed length". Fixed length codes are most often applied in systems where each of the symbols occurs with equal probability. In reality, most symbols do not occur with equal probability. In these cases, we can take advantage of this fact and reduce the average number of bits used to compress the sequence. The length of the code that is used for each symbol can be varied based on the probability of the symbol's occurrence. By encoding the most common symbols with shorter bit sequences and the less frequently used symbols with longer bit sequences, we can easily improve on the average number of bits used to encode a sequence.

Huffman Coding

Huffman coding is a variable-length encoding technique that is used to compress a stream of symbols with a known probability distribution. Huffman code sequences are built with a code tree structure by pairing symbols with the two least probabilities of occurrence in a sequence. Each time a pair is combined, the new symbol takes on the sum of the two separate probabilities. This process continues with each node having its two probabilities combined and then being paired with the next smallest probable symbol until all the symbols are represented in the coded tree. Once the code tree is constructed, code sequences can be assigned within the tree structure by assigning a 0 or a 1 bit to each branch. The code for each symbol is then read by concatenating each bit from the branches, starting at the center of the structure and extending to the branch for which the relevant symbol is defined. This procedure produces a unique code for each symbol data size that is guaranteed to be optimal. Even though the codes can have different lengths, each code can be unambiguously decoded if the symbols are read beginning with the most significant bit. The JPEG standard provides a number of Huffman code tables to describe the different ways of encoding the input quantized data -- for instance, depending on whether luminance or chrominance information is being processed. Figure 5 uses the same example as in Figure 4, but with differently weighted symbol probabilities. The Bit Code column shows the modified encode sequence. As we can see, the most likely symbol is encoded with a single bit, while the least likely symbols require 3 bits each. With this table, we can determine the average bit length for a code per symbol. (0.5*1 bit) + (0.3*2 bits) + (0.1*3 bits) + (0.1*3 bits) = 1.7 bits/symbol In this example, the variable length encode scheme produces an average of 1.7 bits/symbol, which is more efficient than the 2 bits/symbol that the fixed-length code of Figure 4 produced.

Table 2: Example of entropy encoding with weighted symbol probabilities.

Another entropy encoding scheme used in JPEG is Arithmetic Coding. While this algorithm provides for better compression than Huffman coding because it uses adaptive techniques that make it easier to achieve the entropy rate, the additional processing required may not justify the fairly small increase in compression. In addition, there are patent restrictions on Arithmetic Coding.

JPEG File Interchange Format (JFIF)

The encoded data is written into the JPEG File Interchange Format (JFIF), which, as the name suggests, is a simplified format allowing JPEG-compressed images to be shared across multiple platforms and applications. JFIF includes embedded image and coding parameters, framed by appropriate header information. Specifically, aside from the encoded data, a JFIF file must store all coding and quantization tables that are necessary for the JPEG decoder to do its job properly.

Sources/Further Information

1. Symes, Peter. Video Compression Demystified. McGraw-Hill: New York, 2001. 2. Bhaskaran, Vasudev and Konstantinides, Konstantinos. Image and Video Compression Standards. 2nd edition. Kluwer Academic Publishers: Boston, 1997.
               欢迎点击进入:TI德州中文网   (国内唯一针对TI应用的中文技术网站)    文章录入:admin    责任编辑:admin 
  • 上一篇文章:

  • 下一篇文章:
  • 发表评论】【加入收藏】【告诉好友】【打印此文】【关闭窗口
    最新热点 最新推荐 相关文章
    Blackfin处理器及嵌入式mCli…
    为Blackfin ADSP-BF537评估板…
    基于Blackfin嵌入式系统的U-…
    Blackfin DSP并行外围接口在…
    Floating-Point Emulation o…
    Blackfin 处理器架构概述
    Unified DSP/MCU combines t…
    Blackfin 处理器内核基础知识
    示波器波形DSP滤波的优点和缺…
    移动MPEG-4视频编码的DSP有效…
      网友评论:(只显示最新10条。评论内容只代表网友观点,与本站立场无关!)
    站长:61IC 湘ICP备05002478号