This invention relates generally to video encoding, and more particularly to adaptively controlling the complexity of motion estimation during video encoding.
The H.264/AVC video compression standard provides an increased compression efficiency compared to prior standards. H.264, also known as MPEG-4 Part 10, is a standard for a digital video codec. The H.264 standard and the MPEG-4 Part 10 standard, ISO/IEC 14496-10, are technically identical. The technology described by the standard is also known as advanced video coding (AVC).
H.264/AVC achieves encoding gains through a set of advanced encoding tools, including variable block size motion compensation, quarter-pel motion compensation, and long-term memory motion compensation. However, it is difficult to select a set of optimal encoding parameters, including motion vectors and prediction modes, such that an optimal compression efficiency is achieved.
In particular, long term memory motion compensated prediction (LTMCP) with variable block sizes is the major computational complexity bottleneck in the H.264/AVC encoder. Without loss of generality, scalable complexity is preferred for real time video encoding when limited computational resources are available.
In the prior art, various attempts have been made to reduce the complexity of mode decision and motion estimation for the H.264/AVC encoder. One method determines an initial search center based on a correlation between motion vectors of different block sizes, Z. Zhou, M.-T. Sun, Y.-F. Hsu, “Fast variable block-size motion estimation algorithm based on merge and split procedures for H.264/MPEG-4 AVC,” Proceedings of the 2004 International Symposium on Circuits and Systems, Vol. 3, May 2004.
Other methods use fast motion estimation processes, such as efficient predictive zonal algorithms (EPZs), A. M. Tourapis, O. C. Au, M. L. Liou, “Highly efficient predictive zonal algorithms for fast block-matching motion estimation,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, Issue 10, October 2002; UMHexagonS, Jianfeng Xu, Zhibo Chen, Yun He, “Efficient fast ME predictions and early-termination strategy based on H.264 statistical characters,” Proceedings of the Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia, Vol. 1, December 2003; and SEA, M. Yang, H. Cui, K. Tang, “Efficient tree structured motion estimation using successive elimination,” IEE Proceedings-Vision, Image and Signal Processing, Vol. 151, Issue 5, October 2004. Those methods reduce the number of searching points during motion estimation.
Other methods use a recent-biased search, Chi-Wang Ting, Hong Lam, Lai-Man Po, “Fast block-matching motion estimation via recent-biased search for multiple reference frames,” International Conference on Image Processing, Vol. 4, October 2004, and forward motion trace, M-J. Chen, Y-Y. Chiang, H-J Li, M-C. Chi, “Efficient multi-frame motion estimation algorithms for MPEG-4 AVC/JVT/H.264,” Proceedings of the 2004 International Symposium on Circuits and Systems, Vol. 3, May 2004, to reduce the complexity of long term memory motion compensation.
A mode decision method, based on a coarse-to-fine approach, assumes a monotonic rate-distortion (RD) relation across block sizes, P. Yin, H. C. Tourapis, A. M. Tourapis, J. Boyce, “Fast mode decision and motion estimation for JVT/H.264,” IEEE International Conference on Image Processing, Vol. 3, September 2003. However, a further reduction in complexity is still desirable.
Several methods directly control the complexity of the video encoder. Zhong et al. disclose controlling the encoding complexity by using buffer monitoring during the encoding process, U.S. Patent Application Publication No. 2003/0123540. Song et al. adjust early termination thresholds for motion estimation and the DCT using an average number of searching points, U.S. Patent Application Publication No. 2003/0156644. El-Maleh et al. select a predictive and non-predictive coding section based on a configurable threshold in order to balance complexity in memory and processing time, U.S. Patent Application Publication No. 2005/0105615.
Most methods for controlling the encoding complexity heuristically approximate the computational complexity of the encoding process, and adjust thresholds to achieve an objective, such as a decision of the encoding mode, or an early termination of processes, such as DCT or motion estimation.
The major complexity in the encoding process is due to motion estimation and mode decision. However, up to now, an accurate control of the complexity of those processes has not been available.
One embodiment of the invention provides an adaptive complexity control framework to efficiently control the complexity of a video encoding process that uses long term memory motion compensated prediction (LTMCP) and predictive coding mode decision.
An objective of the invention is to provide a complexity scalable encoder. One embodiment of the invention provides an adaptive complexity control framework that uses variable block sizes to reduce the encoding complexity to a desired level with a minimum decrease in rate-distortion performance.
A method for controlling the computational complexity of motion estimation in motion compensated hybrid video encoding is disclosed. The computational complexity of motion estimation is defined as a weighted number of searching points. A complexity control process allocates complexity hierarchically for groups of pictures, frames, macroblocks, block partitions, and blocks, in a descending order.
FIG. 1 is a diagram of a video encoder according to an embodiment of the invention;
FIG. 2 is a diagram of hierarchical levels of a video for complexity budget allocation;
FIG. 3 is a diagram of the steps of complexity budget allocation according to an embodiment of the invention;
FIG. 4 is a flow chart of adaptive complexity control according to an embodiment of the invention; and
FIG. 5 is an example J-C curve according to an embodiment of the invention.
As shown in FIG. 1, one embodiment of the invention provides a system and method 100 for controlling a computational complexity of motion estimation while encoding 120 a video 101. The complete method is shown in FIG. 4. The complexity control method and system adaptively allocate a complexity budget 103 to the hierarchical levels of a video, as shown in FIG. 2.
In FIG. 1, the solid lines indicate data flow, and the dashed lines indicate control. The system takes as input 101 a video, and produces as output 102 an encoded bitstream according to the complexity budget 103 provided by a particular application. The complexity budget 103 is allocated 110 to an encoding unit 120 according to expected rate-distortion (RD) cost gains to achieve an optimal complexity allocation.
As shown in FIG. 2, the hierarchical levels can include the video 101, groups of pictures (GOP) 205, frames 210, macroblocks 215, block partitions 220, and blocks 225. Each block is a 4×4 array of pixels 230. The first frame 211 of a GOP is called an instantaneous decoding refresh picture (IDR).
Referring back to FIG. 1, the input video 101 is provided to the analysis and allocation unit 110 and the encoding unit 120. The encoding unit generates an output bitstream 102 subject to an apportioned complexity budget 111 for each hierarchical level. While encoding, the encoding unit 120 reports the actually consumed complexity budget 121 back to the analysis and allocation unit 110. The allocation unit dynamically adjusts the apportioned complexity budgets 111 accordingly.
The adaptive complexity control (ACC) method allocates available complexity budget 111 to the encoding unit 120 for each level such that the rate-distortion loss is minimized, subject to the given encoding time limit such as,
where N,
The parameter lambda (λ) is the Lagrangian multiplier, which controls the rate-distortion tradeoff during macroblock encoding, in our case, during motion estimation, i.e., the cost is J=D+λR, where D is the distortion and R is the rate. For instance, when λ=0, the optimization minimizes the distortion, and when λ=∞, the optimization minimizes the rate. Generally, at a relatively low bit-rate, λ increases such that the rate term becomes a more significant part of the optimization.
The goal of the ACC method is to provide scalability in motion estimation complexity at each level of the encoding hierarchy. Therefore, the method enables the encoder to reduce complexity to a desired level for a particular motion estimation process. At the same time, the ACC method minimizes the expected RD performance degradation by allocating complexity based on a curve of the Lagrangian RD cost versus the motion estimation complexity, referred to as the J-C curve.
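By way of a non-limiting illustration, the effect of λ on the choice among motion candidates can be sketched as follows. The function names and the candidate values are hypothetical and are not part of the described encoder; only the relation J=D+λR is taken from the text.

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian cost J = D + lambda * R for one candidate."""
    return distortion + lam * rate_bits


def best_candidate(candidates, lam):
    """Return the (distortion, rate_bits) pair with the lowest cost J."""
    return min(candidates, key=lambda c: rd_cost(c[0], c[1], lam))


# Hypothetical (distortion, rate) pairs for three motion vector candidates.
candidates = [(100.0, 40.0), (120.0, 10.0), (90.0, 80.0)]

# With lambda = 0 only the distortion matters.
low_lambda_choice = best_candidate(candidates, lam=0.0)
# With a large lambda the rate term dominates the decision.
high_lambda_choice = best_candidate(candidates, lam=10.0)
```

With these illustrative values, a small λ favors the candidate with the lowest distortion, and a large λ favors the candidate with the lowest rate.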
Definition of the Complexity of Motion Estimation
The computational complexity of motion estimation can be adjusted using a machine independent measure, proportional to the motion estimation time, such as a weighted number of searching points. In motion estimation, a block of an image is compared with a reference image to determine which block of the reference image best matches the block. To determine the best matching block in the reference image, a difference measure is used to measure an amount of difference between the macroblock and each possible block in the reference image.
A searching point is defined as a block in the reference image that is compared to the current block in the motion estimation process. The searching points are typically limited to a searching window. In exhaustive motion estimation, all possible searching points in the searching window are evaluated. A fast motion estimation method evaluates a subset of all possible searching points in the searching window. The complexity in motion estimation is defined by a linear combination of a weighted number of searching points in each block partition as
where ω_{l} represents a weight for a block partition with a block size l, and C_{m}^{k} is the number of searching points for the partition m in the k^{th} macroblock. The variables K, L, M, and B_{i}(j) are the total number of macroblocks in a frame, the number of block partitions for a given macroblock, the number of blocks in the block partition, and the assigned frame budget for the j^{th} frame in the i^{th} group of pictures (GOP), respectively. The weight ω_{l} depends on the block partition size because larger partitions consume more time to calculate the motion cost as,
where N, M, and n_{4×4} are the horizontal block partition length in pixels, the vertical block partition length in pixels, and the number of pixels in a minimum block, respectively. The number of pixels in a minimum block is, for example, sixteen. By replacing the constraint
with Equation (2), Equation (1) becomes
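By way of a non-limiting illustration, the weighted number of searching points defined in this section can be sketched as follows. The sketch assumes that the weight of a partition is its area in pixels divided by the sixteen pixels of the minimum 4×4 block, which is one reading of the definitions of N, M, and n_{4×4} above and is not confirmed by the text. The function names and the sample counts are hypothetical.

```python
MIN_BLOCK_PIXELS = 16  # n_4x4: number of pixels in the minimum 4x4 block

# The seven H.264/AVC partition sizes as (width, height), largest first.
PARTITION_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def partition_weight(width: int, height: int) -> float:
    """Assumed weight: partition area relative to the minimum block."""
    return (width * height) / MIN_BLOCK_PIXELS


def weighted_search_points(points_per_partition: dict) -> float:
    """Sum of weight * searching points over the partitions of one macroblock.

    points_per_partition maps a (width, height) partition size to the
    number of searching points evaluated for that partition size.
    """
    return sum(
        partition_weight(w, h) * count
        for (w, h), count in points_per_partition.items()
    )


# Hypothetical counts: 10 points at 16x16, 20 points at 8x8, 5 points at 4x4.
macroblock_complexity = weighted_search_points(
    {(16, 16): 10, (8, 8): 20, (4, 4): 5}
)
```

The complexity of a frame under this sketch is the sum of the macroblock values, which would then be compared with the frame budget B_{i}(j).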
Complexity Control for Frame Level
Unconstrained motion estimation is performed on a first predictive frame. From the second predictive frame in the GOP, the frame level complexity budget B_{i}(j), j=2, . . . , N_{i}, is determined by subtracting the complexity budget b_{i}(j−1) that was allocated for motion estimation for the previous frame as described in Equation (4) from the previous target budget B_{i}(j−1). For the remaining frames in the GOP, the complexity budget is increased when the processing time of unit complexity is decreased, and decreased otherwise.
The complexity budget for the current frame is allocated and updated for each frame successively as
where B_{i}(j) is the complexity budget after the (j−1)^{th} frame in the i^{th} GOP, and R_{i}(j) is the normalized complexity at the motion estimation time T_{i}(j) for the first predicted frame in the i^{th} GOP. The variables N_{i}, Fr, and b_{i}(j−1) are the total number of predicted frames, a predefined frame rate, and the actual complexity budget used in the (j−1)^{th} frame, respectively.
The initial normalized complexity, R_{i}(j=1), is obtained by performing unconstrained motion estimation. The reason for a full motion search for the first frame is to obtain a ratio of weighted searching points and the motion estimation time. This prevents a decrease in RD performance due to inaccurate budget allocation for the first frame.
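By way of a non-limiting illustration, the frame level bookkeeping can be sketched as follows. The sketch covers only two relations stated in the text: the remaining GOP budget is the previous target less the complexity actually used, and the share of the current frame is B_{i}(j)/(N_{i}−j+1), as used in the macroblock level description. The adjustment for a change in processing time per unit complexity is omitted, and the function names and numbers are hypothetical.

```python
def remaining_gop_budget(prev_target: float, prev_used: float) -> float:
    """B_i(j) = B_i(j-1) - b_i(j-1): budget left after the previous frame."""
    return prev_target - prev_used


def per_frame_budget(gop_budget: float, frame_index: int,
                     num_predicted_frames: int) -> float:
    """Share for frame j (1-based): B_i(j) / (N_i - j + 1)."""
    frames_left = num_predicted_frames - frame_index + 1
    if frames_left <= 0:
        raise ValueError("frame index is beyond the end of the GOP")
    return gop_budget / frames_left


# Hypothetical GOP with N_i = 5 predicted frames and a target of 1000 units
# available when the second frame is encoded.
budget_frame_2 = per_frame_budget(1000.0, frame_index=2, num_predicted_frames=5)

# Suppose the second frame actually consumed 300 units.
target_frame_3 = remaining_gop_budget(1000.0, 300.0)
budget_frame_3 = per_frame_budget(target_frame_3, frame_index=3,
                                  num_predicted_frames=5)
```

In this example the second frame overspends its share, so the share of the third frame is smaller than it would have been otherwise.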
Complexity Control for Macroblock Level
The complexity budget {dot over (B)}_{i}(j)=B_{i}(j)/(N_{i}−j+1), j=2, . . . , N_{i}, for the current frame is initially allocated to each of the macroblocks M_{1}, . . . , M_{K} depending on an expected RD performance and the associated required complexity. At the end of encoding each macroblock, the initial budgets for the remaining macroblocks are updated based on the complexity budget {dot over (C)}_{j}^{MB}(k) allocated to the previously encoded macroblocks. If the current frame is encoded with an initial budget plus a variable amount of additional budget {dot over (B)}_{i}(j)±ΔB, then ∓ΔB is added to the frame budget for the remaining frames.
The complexity control for the macroblock levels is designed to allocate the available complexity budget to each macroblock so that the expected RD performance degradation is minimized by employing a J-C curve based allocation.
To estimate the J-C curve of the current macroblock, the curve of a collocated macroblock in a previous frame is stored in a memory at each iteration. The iteration can be defined as any number of weighted searching points. Here, it is defined as the number of weighted searching points used in one reference frame 1≦n_{ref}≦N_{ref} in each block size 1≦l≦L=7, such that N_{max}^{itr}=L×N_{ref}. If only one reference frame is used, then there can be at most seven iterations, i.e., (J, C) pairs, in the J-C curve, where the block sizes are checked in descending order from 16×16 to 4×4.
From the experimental J-C curves, we observed that the majority of macroblocks with simple motion, such as background or smoothly moving objects, have linear rather than convex J-C curves. On the other hand, macroblocks with complex motion generally have convex J-C curves.
FIG. 5 shows an example J-C convex curve 500. The vertical axis J is the Lagrangian rate-distortion (RD) cost, and the horizontal axis C is the complexity relative to unconstrained motion estimation.
Intuitively, complex motion in detailed areas usually results in a large encoding cost for large blocks, and the cost converges to a lower level as the block size decreases. In contrast, areas with smooth motion quickly converge to a larger block size, in general. Also, we observed that there is a strong correlation between the J-C curves of the current macroblock and its temporally collocated macroblock. Therefore, the J-C curves of the current macroblock are estimated from the J-C curves of a collocated macroblock in the previous frame. The J-C slope mismatch problem from the estimation error is efficiently addressed through the partition-level budget adjustment and update.
The estimated J-C curve for each macroblock can have multiple slopes, at most N_{max}^{itr}. To efficiently allocate the complexity budget within the frame, a greedy search is applied to the J-C curves to determine the complexity budget to allocate.
First, a minimum budget is allocated to each macroblock in the frame in order to prevent a block partition with a zero-value budget. A single point for each macroblock is selected as the minimum budget. Each point is assigned to a predicted motion vector for each macroblock. The predicted motion vector is the median vector of motion vectors in a spatial neighborhood of the macroblock. Based on the piecewise linear J-C curve, the initial budget is allocated for each macroblock using a greedy search until the budget is exhausted according to the following constraints.
Maximum Budget: If {dot over (B)}_{i}(j)≧{dot over (B)}_{i}(j−1), then the budget of each macroblock is determined by the maximum budget in the J-C curves of collocated macroblocks, and the remaining {dot over (B)}_{i}(j)−{dot over (B)}_{i}(j−1) is redistributed based on Equation (5) and the adjusted minimum complexity budget.
J-C Curve Approximation: Each macroblock in the previous frame can have n_{k}≦N_{max}^{itr }iterations so it can have n_{k }slopes according to
where k=[1, . . . , K], x=[1, . . . , n_{k}] and k, x and K are the macroblock index, the iteration index, and the total number of macroblocks in the frame, respectively. Each slope indicates a potential improvement in the RD coding gain at the iteration.
Convex hull of the J-C Curve: A convex hull is constructed for each macroblock by using an O(n log n) process as described by K. L. Clarkson and P. W. Shor, “Applications of Random Sampling in Computational Geometry, II,” Discrete and Computational Geometry, Vol. 4, No. 1, pp. 387-421, 1989, incorporated herein by reference.
Complexity Allocation: Because the estimated piecewise linear J-C curve is convex, a greedy search can be used to allocate searching points based on slope S_{j-1}^{MB=k}(x).
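By way of a non-limiting illustration, the convex hull and slope steps can be sketched as follows. The sketch substitutes a monotone chain scan, which is also O(n log n) because of the sort, for the process of Clarkson and Shor cited above. It further assumes that a slope is the decrease in cost J per unit increase in complexity C between consecutive hull points, which is not confirmed by the text. The function names and the sample points are hypothetical.

```python
def _cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_convex_hull(points):
    """Lower convex hull of (C, J) points, ordered by increasing C."""
    hull = []
    for p in sorted(set(points)):
        # Drop the last hull point while it lies on or above the
        # segment from hull[-2] to the new point p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def gain_slopes(hull):
    """Assumed slope: cost reduction per unit of added complexity."""
    return [
        (j0 - j1) / (c1 - c0)
        for (c0, j0), (c1, j1) in zip(hull, hull[1:])
    ]


# Hypothetical (C, J) pairs measured for one collocated macroblock.
jc_points = [(1.0, 100.0), (2.0, 90.0), (3.0, 60.0), (4.0, 55.0)]
hull = lower_convex_hull(jc_points)
slopes = gain_slopes(hull)
```

On the hull, the slopes are non-increasing as complexity grows, which is the property that allows a greedy search to be used for the allocation.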
The remaining budget, which is the total budget less the minimum budget allocated to each macroblock, is allocated as follows, as shown in FIG. 3:
Next, the minimum complexity budgets are adjusted as follows. We perform motion estimation for each macroblock according to the initially assigned minimum complexity budget, where the J-C slope {tilde over (S)}_{j}^{MB=k}(x) is measured at each iteration in the current frame. Even though temporally adjacent J-C curves are correlated, it is important to note that the slopes of the J-C curve of the current macroblock can differ from those of the collocated macroblock that was used for the initial allocation. Also, the J-C slope {tilde over (S)}_{j}^{MB=k}(x) of the current macroblock is available only when motion estimation is completed for a current iteration. In order to dynamically compensate for the impact of the J-C curve mismatch, a small number of searching points, up to ε, are conditionally allowed for additional iterations:
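By way of a non-limiting illustration, one greedy allocation consistent with the description can be sketched as follows. It is not the procedure of FIG. 3, which is not reproduced in this text. The sketch assumes that each macroblock is described by the segments of its convex J-C curve, each given as a hypothetical pair of added complexity and gain slope, that every macroblock first receives the same minimum budget, and that a segment may be granted partially when the budget runs out.

```python
import heapq


def greedy_allocate(segments_per_mb, total_budget, min_budget=1.0):
    """Allocate a frame budget to macroblocks by steepest gain slope first.

    segments_per_mb[k] is a list of (added_complexity, gain_slope) pairs
    for macroblock k, ordered along its convex J-C curve.
    """
    budgets = [min_budget] * len(segments_per_mb)
    remaining = total_budget - min_budget * len(segments_per_mb)
    if remaining < 0:
        raise ValueError("total budget is below the sum of minimum budgets")

    # Max-heap on slope, implemented by negating the slope.
    heap = []
    for mb, segments in enumerate(segments_per_mb):
        if segments:
            heapq.heappush(heap, (-segments[0][1], mb, 0))

    while heap and remaining > 0:
        _, mb, idx = heapq.heappop(heap)
        cost = segments_per_mb[mb][idx][0]
        grant = min(cost, remaining)
        budgets[mb] += grant
        remaining -= grant
        # Expose the next segment only after the current one is fully paid.
        if grant == cost and idx + 1 < len(segments_per_mb[mb]):
            next_slope = segments_per_mb[mb][idx + 1][1]
            heapq.heappush(heap, (-next_slope, mb, idx + 1))
    return budgets


# Hypothetical frame with three macroblocks and a budget of 8 units.
allocation = greedy_allocate(
    [[(2, 20.0), (1, 5.0)], [(3, 12.0)], []],
    total_budget=8.0,
)
```

Because each curve is convex, the next segment of a macroblock never has a steeper slope than the segment before it, so taking the steepest available segment at each step does not skip a better option on the same curve.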
{dot over (C)}_{j}^{MB=k}=C_{j}^{MB=k}±ΔC_{j}^{MB=k}.
The amount of extra searching points ∓ΔC_{j}^{MB=k} is added to the macroblock budget for the remaining macroblocks. Intuitively, the extra searching points ΔC_{j}^{MB=k} are the sum of the increments and decrements made below the macroblock level. They can be assigned at the block size level as long as the new slope is not less than the previous slope, S_{j}^{MB=k}(x+1)≧S_{j}^{MB=k}(x), x≧n_{k}, and the pre-defined threshold |ΔC_{j}^{MB=k}|<ε is satisfied.
After the motion estimation of the current macroblock is completed, the complexity budget is updated. If all the macroblocks are encoded with the initial frame budget plus a variable amount of additional budget B_{i}(j)±ΔB, then ∓ΔB is added to the frame budget for the remaining frames.
Complexity Control for Block Partition Level
The macroblock complexity budget is allocated to the block partitions as follows. For the k^{th} macroblock, the macroblock level complexity budget at frame j, C_{j}^{MB=k}, is allocated from the most probable block partition to the least probable one. The block partitions are sorted according to a priority that is based on the estimated RD cost.
A fast inter mode decision process can be used for sorting the block partition, Qionghai Dai, Dongdong Zhu, Rong Ding, “Fast mode decision for inter prediction in H.264,” International Conference on Image Processing, Vol. 1, pp. 119-122, October 2004, incorporated herein by reference.
At the end of encoding each block partition, the initial budgets for the remaining block partitions C_{k}^{BP=l′}, l<l′≦L, are updated based on the complexity budget C_{k}^{BP=l} allocated to previously encoded block partitions. If all block partitions are encoded with the initial macroblock budget plus a variable amount of additional budget C_{j}^{MB=k}±ΔC_{j}^{MB=k}, then ∓ΔC_{j}^{MB=k} is added to the macroblock budget for the remaining macroblocks. In one embodiment of the invention, the encoder can allocate the macroblock budget to each block partition uniformly.
Complexity Control for Block Level
FIG. 4 shows the complexity control for the block level at the l^{th} block partition. The complexity budget for the block partition level of the macroblock k, C_{k}^{BP=l}, is allocated from the most probable block to the least probable one. The blocks are sorted according to a priority based on the estimated RD cost. The motion search is performed across multiple reference frames as long as the allocated block level budget C_{l}^{BL=m} permits.
At the end of encoding each block, the initial budgets for the remaining blocks C_{l}^{BL=m′}, m<m′≦M, are updated based on the complexity budget C_{l}^{BL=m} allocated to previously encoded blocks. If all the blocks are encoded with the initial block budget plus a variable amount of additional budget C_{k}^{BL=m}±ΔC_{k}^{BL=m}, then ∓ΔC_{k}^{BL=m} is added to the block partition budget for the remaining block partitions. For simplicity, the encoder can allocate the block partition budget to each block uniformly.
FIG. 4 shows, in greater detail, the iterations of the method for controlling a computational complexity of motion estimation while encoding a video for each of the hierarchical levels. For each GOP 410, frame 420, macroblock 430, block partition 440, and block 450, the method performs the steps of allocating the frame 411, macroblock 421, block partition 431, and block 441 budgets, and performing motion estimation 451, respectively. Then, at the end of the GOP 412, frame 422, macroblock 432, block partition 442, and block 452, the GOP 413, frame 423, macroblock 433, block partition 443, and block 453 budgets are updated, respectively, until the last GOP 414, frame 424, macroblock 434, block partition 444, and block 454, respectively, has been processed.
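By way of a non-limiting illustration, the uniform allocation and the update after each encoded unit can be sketched as follows. The text states only that the remaining budgets are updated based on the budget already consumed, so spreading the surplus or deficit evenly over the remaining units is an assumption of this sketch, as are the function names and the numbers. The same pattern would apply to blocks within a block partition and to block partitions within a macroblock.

```python
def uniform_allocation(parent_budget: float, num_units: int) -> list:
    """Split a parent budget evenly among its child units."""
    return [parent_budget / num_units] * num_units


def update_remaining(budgets: list, index: int, used: float) -> list:
    """Record the complexity used by unit `index` and adjust later units.

    The difference between the allocated and the used complexity is
    spread evenly over the units that are still to be encoded (assumed
    policy). Budgets are clamped at zero.
    """
    updated = list(budgets)
    difference = budgets[index] - used  # positive means unused surplus
    updated[index] = used
    units_left = len(budgets) - index - 1
    if units_left > 0:
        share = difference / units_left
        for i in range(index + 1, len(budgets)):
            updated[i] = max(0.0, updated[i] + share)
    return updated


def carry_to_parent(initial_total: float, used_per_unit: list) -> float:
    """Amount returned to the parent level once all units are encoded.

    If the units consumed the initial budget plus some delta, the
    negative of that delta is passed up to the parent level.
    """
    return initial_total - sum(used_per_unit)


# Hypothetical block partition budget of 40 units shared by four blocks.
block_budgets = uniform_allocation(40.0, 4)
# The first block needed only 7 of its 10 units.
block_budgets_after_first = update_remaining(block_budgets, 0, 7.0)
# After all four blocks, 41 units were used in total.
carry = carry_to_parent(40.0, [7.0, 11.0, 12.0, 11.0])
```

A negative carry in this sketch means the blocks overspent, so the budget available to the remaining block partitions is reduced by that amount.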
The block partition or set of block partitions for a given macroblock can be decided based on a given complexity budget for a macroblock.
It is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.