5/03/2009
4:HDTV basics
HDTV basics: H.264 encoding technologies

First, the formulation and applications of H.264

After the H.263 standard was completed, the ITU-T Video Coding Experts Group (VCEG) divided its further work into two parts. The "short-term" program was designed to add new features to H.263 (this program produced H.263+ and H.263++). The "long-term" program had the initial goal of producing a new standard with twice the coding efficiency of the other video coding standards of the time. The long-term program started in 1997, and its result, H.26L (initially called H.263L), is the predecessor of H.264. Near the end of 2001, in view of the superior performance of H.26L, the ISO/IEC MPEG expert group joined VCEG to form the Joint Video Team (JVT), which took over the development of H.26L. The group's stated objective was to study new video coding algorithms that perform much better than the best standards developed in the past. The standard was formally adopted as an international standard at the 7th JVT meeting, held in Pattaya, Thailand, in March 2003. Because it was developed jointly by two organizations, it has two names: in ITU-T it is called H.264, and in ISO/IEC it is known as MPEG-4 Part 10, Advanced Video Coding (AVC).

H.264 has a wide range of applications, including video telephony (fixed or mobile), real-time video conferencing, video surveillance, Internet video transmission, and multimedia information storage. Internationally, Canada's UB Video has developed a real-time H.26L video communication system based on the TMS320C64x series; at 160 kbit/s it achieves the same image quality as H.263+ at 320 kbit/s.
Another Canadian company, VideoLocus, has implemented a real-time H.264 codec on a P4 platform by plugging an FPGA-based hardware expansion card into the system.

Second, H.264 characteristics

The H.264 coding framework is still the MC-DCT structure of earlier standards, that is, a hybrid structure that combines motion compensation with transform coding. It therefore retains some features of the previous standards, such as unrestricted motion vectors and median prediction of motion vectors. However, newly introduced techniques give H.264 much better coding performance than the previous standards. It should be noted that this improvement does not come from any single technique; it is the combined result of many small gains from different techniques.

1. Intra prediction

I-frames are coded by exploiting spatial correlation rather than temporal correlation. Older standards exploited only the correlation within a macroblock and ignored the correlation between macroblocks, so the amount of coded data was generally large. To exploit spatial correlation further, H.264 introduces intra prediction to improve compression efficiency. In short, intra-prediction coding predicts the current pixel values from the values of neighboring pixels and then encodes the prediction error. The prediction is block-based. For the luminance component (luma), the block size can be either 16 × 16 or 4 × 4: 16 × 16 blocks have four prediction modes and 4 × 4 blocks have nine. For the chrominance component (chroma), the prediction is made on the whole 8 × 8 block, with four prediction modes. Apart from DC prediction, each prediction mode corresponds to prediction in a different direction.

2. Inter-frame prediction

Like the previous standards, H.264 uses motion estimation and motion compensation to remove temporal redundancy, but with the following characteristics:

(1) Prediction with variable block sizes

A block-based motion model assumes that all pixels in a block undergo the same translation. Where motion is intense, or at the edges of moving objects, this assumption departs significantly from reality and leads to large prediction errors. Using smaller blocks lets the assumption hold within each small block. The blocking artifacts caused by small blocks are also relatively small, so small blocks generally improve the prediction. H.264 therefore offers seven ways to partition a macroblock, each with a different block size and shape, so that the encoder can choose the prediction mode that best fits the image content. Compared with prediction using only 16 × 16 blocks, using blocks of different sizes and shapes can save more than 15% of the bit rate.

(2) Finer prediction accuracy

In H.264, the motion vector (MV) of the luma component has 1/4-pixel accuracy. The chroma MV is derived from the luma MV. Because the chroma resolution is half that of luma (for 4:2:0), the chroma MV has 1/8-pixel accuracy: one unit of chroma MV represents a displacement of 1/8 of the distance between chroma sampling points. Compared with integer-pixel accuracy, this finer prediction accuracy can save more than 20% of the bit rate.

(3) Multiple reference frames

H.264 supports prediction from multiple reference frames: more than one previously decoded frame (up to 5 in the configuration considered here; the standard permits up to 16, depending on level) can serve as a reference for motion-compensated prediction of the current frame. This is useful for video sequences that contain periodic motion.
Using multiple reference frames improves motion estimation (ME) performance and the error resilience of the H.264 codec, but it also increases the required buffer capacity and the complexity of the codec. However, H.264 was designed on the premise that semiconductor technology is developing rapidly, so both burdens should become insignificant in the near future. Compared with a single reference frame, using five reference frames can save 5 to 10 percent of the bit rate.

(4) Deblocking filter

The deblocking filter removes the blocking artifacts, that is, the jumps in pixel values at block edges, that appear in the reconstructed image after inverse quantization and inverse transform of the prediction error. It serves two purposes: first, to improve the subjective quality of the image; second, to reduce the prediction error. The deblocking filter in H.264 also makes decisions based on image content: it smooths only the pixel jumps caused by blocking, while preserving the pixel discontinuities at real object edges so that those edges are not blurred. Unlike the deblocking filters of earlier standards, the filtered image is stored in the buffer for inter-frame prediction as needed, rather than being used only to improve the subjective quality of the output. In other words, the filter sits inside the decoding loop rather than at its output, which is why it is also called the loop filter. Note that intra prediction uses the unfiltered reconstructed image.

3. Integer transform

H.264 applies DCT transform coding to the residual of intra or inter prediction.
To overcome the hardware design complexity brought by floating-point operations and, more importantly, the encoder/decoder mismatch caused by rounding errors, the new standard modifies the definition of the DCT so that the transform can be implemented with only integer additions, subtractions and shifts. If quantization is left aside, the decoder output then reproduces the encoder input exactly. The cost is a slight drop in compression performance. In addition, the transform operates on 4 × 4 blocks, which also helps to reduce blocking artifacts.

To exploit spatial correlation further, after the integer DCT described above has been applied to the chroma prediction residual and to the 16 × 16 intra-prediction residual, the standard gathers the DC coefficients of the individual 4 × 4 transform blocks into a 2 × 2 or 4 × 4 block and applies a further Hadamard transform to it.

4. Entropy coding

For the prediction residual at the slice layer, H.264 has two entropy coding methods: Context-based Adaptive Variable Length Coding (CAVLC) and Context-based Adaptive Binary Arithmetic Coding (CABAC). For data other than the prediction residual, H.264 uses Exp-Golomb codes or CABAC, depending on the encoder settings.

(1) CAVLC

The basic idea of VLC is to assign shorter codewords to symbols that occur frequently and longer codewords to symbols that occur rarely, which minimizes the average code length. In CAVLC, H.264 uses several VLC code tables, each corresponding to a different probability model.
The encoder automatically selects among these tables according to the context, such as the number of non-zero coefficients in neighboring blocks or the absolute magnitude of the coefficients, so as to match the probability model of the current data as closely as possible. This is how context adaptivity is achieved.

(2) CABAC

Arithmetic coding is a highly efficient entropy coding scheme in which the code length assigned to each symbol can be thought of as fractional. Because the coding of each symbol depends on the results of coding the previous symbols, arithmetic coding takes into account the probability characteristics of the whole source symbol sequence rather than those of individual symbols, so it can approach the entropy limit of the source more closely and reduce the bit rate.

To avoid infinite-precision fractional arithmetic and to estimate the source symbol probabilities, most modern arithmetic coders are implemented as finite-state machines. CABAC in H.264 is one example; JPEG2000 is another. In CABAC, each time a binary symbol is encoded, the encoder automatically updates its estimate of the source probability model (represented by a "state"), and the following binary symbols are then coded with the updated model. Such a coder needs no prior knowledge of the source statistics; they are estimated adaptively during encoding. Compared with CAVLC, which chooses among a number of preset probability models, CABAC is more flexible and achieves better coding performance, with a bit rate about 10% lower.

The features described above are used to improve the coding performance of H.264. H.264 also has very good error resilience and network adaptability; some of the features behind these are described below.

5. SP slices

The main purpose of SP slices is switching between different bitstreams. They can also be used for random access, fast forward and rewind, and error recovery.
The different bitstreams referred to here are streams produced by encoding the same source under different bit-rate constraints. Suppose the last frame transmitted before the switch is A1, and the first frame of the target stream after the switch is B2 (assumed to be a P-frame). Because the reference frame of B2 is not available, switching directly would obviously cause severe distortion, and this distortion would propagate to the following frames. A simple solution is to transmit an intra-coded B2, but I-frames generally contain a large amount of data, so this method causes a sharp increase in the transmitted bit rate. Under the assumption above, the two streams encode the same source, so even though their bit rates differ, the frames before and after the switch must be highly correlated. The encoder can therefore use A1 as the reference frame for inter-frame prediction of B2; the prediction error is the SP slice, and the switch is completed by transmitting the SP slice. Unlike a conventional P-frame, the prediction that generates the SP slice is carried out in the transform domain of A1 and B2. The SP slice requires that the image of B2 obtained after switching be identical to the one obtained by decoding the target stream directly. Obviously, SP slices do not apply if the target of the switch is an unrelated stream.

6. Flexible macroblock ordering

Flexible macroblock ordering (FMO) means that the macroblocks of a picture are divided into several groups that are coded independently. The macroblocks in a group need not follow the usual raster-scan order and may be scattered over different locations in the picture. If a transmission error prevents the macroblocks of one group from being decoded correctly, the decoder can still recover them from the correctly decoded surrounding pixels by exploiting the spatial correlation of the image.
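The recovery idea behind FMO can be illustrated with a small sketch. It assumes two macroblock groups laid out as a checkerboard and represents each macroblock by a single number; the function names and the neighbor-averaging concealment are illustrative assumptions, not the procedure defined by the standard.

```python
# Hypothetical sketch of FMO with two macroblock groups in a checkerboard
# layout. One number stands in for a whole macroblock.

def checkerboard_map(mb_rows, mb_cols):
    """Assign each macroblock to group 0 or 1 in a checkerboard pattern."""
    return [[(r + c) % 2 for c in range(mb_cols)] for r in range(mb_rows)]

def conceal_lost_group(values, group_map, lost_group):
    """Replace each macroblock of the lost group by the mean of its
    correctly received 4-neighbours (assumes at least a 1 x 2 grid)."""
    rows, cols = len(values), len(values[0])
    out = [row[:] for row in values]
    for r in range(rows):
        for c in range(cols):
            if group_map[r][c] != lost_group:
                continue
            neighbours = [
                values[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
                and group_map[nr][nc] != lost_group
            ]
            out[r][c] = sum(neighbours) / len(neighbours)
    return out

# A smooth 4 x 4 "picture" in macroblock units: value = row + column.
picture = [[float(r + c) for c in range(4)] for r in range(4)]
groups = checkerboard_map(4, 4)
recovered = conceal_lost_group(picture, groups, lost_group=1)
```

Because the lost macroblocks are scattered, each one still has correctly decoded neighbors. In this smooth example the interior macroblocks are recovered exactly, while those on the picture border are only approximated.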
Third, H.264 in detail

From the introduction above, there is no doubt that H.264 compresses better than the other standards, including MPEG-4 Part 2. As is well known, the most distinctive feature of MPEG-4 Part 2 is object-based coding. The object is an advanced concept: once the objects have been extracted, very high compression ratios can be obtained. But how to extract the objects has become a major problem. A true object extraction algorithm would need human-like intelligence, able to think and learn as people do, and current technology falls short of this. Although much of the literature describes methods for extracting objects, in my view these are stopgap measures at best, only a small step in the right direction. For this reason, the object-based coding idea of MPEG-4 Part 2 is too far ahead of its time. ITU-T's VCEG gave up the unrealistic object concept and developed the H.264 (MPEG-4 Part 10, formerly H.26L) video coding standard to fit the current level of technology. This is valuable and, more importantly, it achieves one of the same goals as the object-based coding of MPEG-4 Part 2: a high compression ratio.

A video signal contains a huge amount of data. To compress it efficiently, every kind of redundancy must be exploited. In general, the redundancy in video sequences falls into two categories. The first is statistical redundancy, which includes: (1) spectral redundancy, meaning the correlation between color components; (2) spatial redundancy; (3) temporal redundancy. Temporal redundancy is the fundamental difference between video compression and still-image compression: video compression relies mainly on temporal redundancy to achieve high compression ratios.
The second category is visual redundancy, which arises from the properties of the human visual system (HVS). For example, the human eye is less sensitive to high frequencies in the color components than to high frequencies in the luminance component, and it is insensitive to noise in the high-frequency (detailed) parts of an image.

Video compression algorithms use different methods to address these kinds of redundancy, but the main focus is on spatial and temporal redundancy. Like the previous standards, H.264 adopts a so-called hybrid structure that handles spatial and temporal redundancy separately. Spatial redundancy is removed by transform and quantization; frames coded this way are called I-frames. Temporal redundancy is removed by inter-frame prediction, that is, motion estimation and compensation; frames coded this way are called P-frames or B-frames. Unlike the previous standards, H.264 uses intra prediction when encoding I-frames and then encodes the prediction error. This takes full advantage of spatial correlation to improve coding efficiency.

The block diagram of H.264 intra coding is given in issue 7 of "China Multimedia Video". H.264 intra prediction takes the 16 × 16 macroblock as its basic unit. First, the encoder uses neighboring pixels in the same frame as the current macroblock as the reference to produce a prediction of the current macroblock. It then transforms and quantizes the prediction residual, and entropy-codes the transformed and quantized result. The output of entropy coding forms the bitstream.
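The prediction step of this pipeline can be sketched in a few lines. The example below is a minimal illustration that implements only three of the nine 4 × 4 luma modes (vertical, horizontal and DC) and picks the mode by sum of absolute differences; the function names and the mode-selection rule are assumptions made for illustration, and real encoders use more elaborate decisions.

```python
# Sketch of 4x4 intra prediction with three modes. `top` holds the four
# reconstructed pixels above the block, `left` the four pixels to its left.

MODES = ("vertical", "horizontal", "dc")

def predict_4x4(top, left, mode):
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the eight neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unknown mode: {mode}")

def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p)
               for brow, prow in zip(block, pred)
               for b, p in zip(brow, prow))

def choose_mode(block, top, left):
    """Return the cheapest mode and the residual that would be coded."""
    best = min(MODES, key=lambda m: sad(block, predict_4x4(top, left, m)))
    pred = predict_4x4(top, left, best)
    residual = [[b - p for b, p in zip(brow, prow)]
                for brow, prow in zip(block, pred)]
    return best, residual

top = [10, 20, 30, 40]
left = [12, 12, 12, 12]
block = [[10, 20, 30, 40],
         [10, 20, 30, 41],
         [11, 20, 30, 40],
         [10, 20, 30, 40]]
mode, residual = choose_mode(block, top, left)
```

For this block of vertical stripes the vertical mode wins, and the residual that goes on to transform and quantization is almost entirely zero.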
Because the decoder reconstructs the image from the received data by inverse quantization and inverse transform, the encoder and decoder must stay consistent: the reference that the encoder uses for prediction must be the same as the one at the decoder, that is, an image reconstructed through inverse quantization and inverse transform. One thing to note is that the data used for intra prediction are not filtered by the deblocking filter, unlike the reference pictures used for inter-frame coding.

1. Intra prediction

The figures for the intra-prediction modes are given in issue 7 of "China Multimedia Video":

Luma Intra 16 × 16 intra-prediction modes
The 4 intra-prediction modes of the 8 × 8 chroma component
Figure 5: the 8 directional intra-prediction modes of the 4 × 4 luma component

2. Transform and quantization

Subtracting the prediction from the current pixel values gives the prediction residual. The residual still contains spatial redundancy. To remove it, transform coding is normally used, that is, the sequence of transform, quantization and entropy coding. The transform itself does not compress the data; it only decorrelates the data, or rather converts the redundancy (correlation) into a form that the subsequent entropy coding can exploit. The actual compression takes place in the quantization and entropy coding steps. To reduce the amount of data further, the encoder quantizes the transform coefficients; in essence, quantization narrows the range of values and so lowers the entropy of each symbol. It causes a loss of information, which makes it the key step of lossy coding, and it is one of the main means of controlling the rate-distortion (RD) behavior of the image. In H.264, transform and quantization are two closely linked steps.
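The claim that quantization lowers the entropy of each symbol at the price of information loss can be checked numerically. The sketch below uses a generic uniform quantizer with a made-up step size and made-up coefficient values; it is not the quantizer defined by H.264, only an illustration of the principle.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def quantize(coeffs, step):
    """Uniform quantizer with rounding to the nearest level (integer maths)."""
    levels = []
    for c in coeffs:
        magnitude = (abs(c) + step // 2) // step
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels

def dequantize(levels, step):
    return [level * step for level in levels]

# Illustrative transform coefficients: a few large values, many small ones.
coeffs = [52, -31, 14, -9, 6, -4, 3, -2, 1, 1, 0, -1, 0, 0, 1, 0]
step = 8

levels = quantize(coeffs, step)
restored = dequantize(levels, step)
max_error = max(abs(c - r) for c, r in zip(coeffs, restored))

bits_before = entropy_bits(coeffs)
bits_after = entropy_bits(levels)
```

After quantization most levels are zero, so `bits_after` is smaller than `bits_before`, while `max_error` shows the information that cannot be recovered. A larger step lowers the rate further and raises the distortion, which is the rate-distortion trade-off mentioned above.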
(Formula: integer DCT transform)
(Formula: H.264 inverse DCT transform)

The transform most commonly used in image coding is the DCT, because under certain conditions it approximates the theoretically optimal KL transform. However, using the DCT directly as defined raises two problems. First, it requires floating-point operations, which complicates system design. Second, the entries of the transform kernel are irrational numbers, which finite-precision floating-point numbers cannot represent exactly, and floating-point arithmetic may introduce further rounding errors. In a practical implementation this leads to codec mismatch, that is, the output of the inverse transform differs from the input of the forward transform. To overcome these problems, H.264 uses an integer DCT that needs only additions, subtractions and shifts. This reduces design complexity and avoids codec mismatch, at the cost of a minimal loss in coding performance. Note that this transform is no longer a true DCT; it is still called a DCT only because it is derived from the DCT, and to distinguish it from the other transform used (the Hadamard transform).

The transform and quantization process of the H.264 encoder is shown in a diagram in issue 7 of the magazine. The input of that diagram is the prediction residual, and its output is the data prepared for entropy coding, five kinds in total. To exploit spatial redundancy more fully, in Intra 16 × 16 prediction mode H.264 takes the 16 4 × 4 blocks of the 16 × 16 luma component after the DCT, extracts the DC coefficient of each 4 × 4 block (not yet quantized), forms them into a 4 × 4 luma DC block, and applies a further 4 × 4 Hadamard transform to it.
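The two steps just described can be sketched in code. The formula images of the original article are not available, so the matrix `CF` below is an assumption: it is the integer matrix commonly given for the H.264 4 × 4 core transform. The scaling that the standard folds into quantization (including the scaling of the Hadamard output) is left out, and `exact_inverse` is a plain mathematical inverse used only to show that no rounding is involved; it is not the inverse transform specified by the standard.

```python
from fractions import Fraction

# Assumed 4x4 integer core transform matrix and 4x4 Hadamard matrix.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def core_transform(x):
    """W = CF * X * CF^T, computed with integer arithmetic only."""
    return matmul(matmul(CF, x), transpose(CF))

def exact_inverse(w):
    """Undo core_transform exactly; the rows of CF are orthogonal with
    squared norms 4, 10, 4, 10, so rational arithmetic is sufficient."""
    norms = [4, 10, 4, 10]
    scaled = [[Fraction(w[i][j], norms[i] * norms[j]) for j in range(4)]
              for i in range(4)]
    return matmul(matmul(transpose(CF), scaled), CF)

def luma_dc_block(mb):
    """Collect the DC coefficient of each 4x4 block of a 16x16 residual."""
    dc = [[0] * 4 for _ in range(4)]
    for by in range(4):
        for bx in range(4):
            block = [row[4 * bx:4 * bx + 4] for row in mb[4 * by:4 * by + 4]]
            dc[by][bx] = core_transform(block)[0][0]
    return dc

def hadamard_4x4(dc):
    """Unscaled 4x4 Hadamard transform (H4 is symmetric)."""
    return matmul(matmul(H4, dc), H4)

residual = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
coeffs = core_transform(residual)

flat_mb = [[1] * 16 for _ in range(16)]   # constant 16x16 residual
dc_block = luma_dc_block(flat_mb)
dc_coeffs = hadamard_4x4(dc_block)
```

Every intermediate value stays an integer, and `exact_inverse(coeffs)` returns the original residual, so a transform of this kind cannot produce encoder/decoder mismatch. For the constant macroblock, all sixteen DC coefficients are equal and the Hadamard transform concentrates them into a single non-zero value, which is the extra redundancy the DC transform is meant to capture.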
The chroma component is handled in the same way: after the four 4 × 4 blocks of each 8 × 8 chroma component have been DCT-transformed, the DC coefficient of each 4 × 4 block is extracted to form a 2 × 2 chroma DC block, to which a 2 × 2 Hadamard transform is applied, as shown in Figure 7. The numbers in the figure give the order of the blocks in the bitstream.

Additional 4th-order (4 × 4) Hadamard transform of the luma DC coefficients
Additional 2nd-order (2 × 2) Hadamard transform of the chroma DC coefficients

The figure showing the processing of the DC coefficients is given in issue 7 of "China Multimedia Video".

The input of the decoder diagram is the result of entropy decoding (CAVLC or CABAC). Its output, added to the prediction, gives the reconstructed image, which is either used for intra prediction or, after the deblocking filter, displayed and stored in the buffer as needed for inter-frame prediction. Note that for the DC coefficients (both Intra 16 × 16 luma DC and chroma DC), the decoder performs the inverse transform first and the inverse quantization afterwards; the reason for this is explained later. MUX refers to assembling the DC coefficients with the AC coefficients, according to Figure 8, into complete 4 × 4 blocks for the subsequent inverse DCT.

At present, the main shortcoming of H.264 is its complexity. But with continuing advances in technology, especially semiconductor technology, chip processing power and memory capacity will improve greatly, so H.264 is bound to be full of vitality in the future and to gradually become a leading player in the market.