23 Mar Blocking distortion in the broadcasting and media industry: why does it matter?
Blocking artifacts are among the most serious defects that can affect the quality of images and video streaming services, especially when files are compressed to low bit rates using compression standards based on the block discrete cosine transform (DCT), such as JPEG, MPEG, and H.263.
The block-based discrete cosine transform (B-DCT) scheme is a fundamental component of all of these image and video standards. It is used in a wide range of applications because it exploits the local spatial correlation of images: the image is divided into blocks of pixels (initially 8×8 in JPEG, MPEG-1, and MPEG-2, and now in varying block sizes to improve compression efficiency), each block is transformed from the spatial domain to the frequency domain using the DCT, and the resulting DCT coefficients are quantized. Because each block is treated as a single entity and coded separately, the correlation between spatially adjacent blocks is not taken into account. As a result, block boundaries become visible when the decoded image is reconstructed, which degrades the quality of experience (QoE) of the audience. Some standards, such as H.264 and H.265, include a deblocking filter that smooths the blocking effect, although in many situations the blocking is still visible.
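The mechanism can be illustrated with a minimal sketch. It is a simplification, not the JPEG algorithm: it works on a single row of pixels with a 1-D DCT, and it uses one uniform quantization step instead of a quantization table. The names `dct`, `idct`, and `code_row` and the step value of 100 are assumptions made for the illustration.

```python
import math

BLOCK = 8  # block size used by JPEG, MPEG-1 and MPEG-2

def dct(block):
    """Orthonormal 1-D DCT-II of one block of samples."""
    n = len(block)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(block))
        coeffs.append(scale * total)
    return coeffs

def idct(coeffs):
    """Inverse of dct(): rebuild the samples from the coefficients."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        samples.append(total)
    return samples

def code_row(row, step):
    """Code each block independently: DCT, uniform quantization, inverse DCT."""
    decoded = []
    for start in range(0, len(row), BLOCK):
        coeffs = dct(row[start:start + BLOCK])
        quantized = [round(c / step) * step for c in coeffs]
        decoded.extend(idct(quantized))
    return decoded

# A smooth ramp: neighbouring pixels differ by 6 everywhere.
row = [50 + 6 * i for i in range(2 * BLOCK)]
decoded = code_row(row, step=100)  # deliberately coarse quantization

jump_before = abs(row[BLOCK] - row[BLOCK - 1])
jump_after = abs(decoded[BLOCK] - decoded[BLOCK - 1])
print(f"jump across the block boundary: {jump_before} before, {jump_after:.1f} after")
```

With such a coarse step only the average level of each block survives quantization, so each block is reconstructed as a flat patch and the gentle ramp turns into a visible step exactly at the block boundary.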
How to decrease blocking
Subjective picture quality can be significantly improved by reducing blocking artifacts. One option is to increase the bandwidth or bit rate to obtain better-quality images, but this is not a perfect solution: it is often not possible, or it comes at a high cost. Other approaches improve the subjective quality of the degraded images, such as the deblocking filters included in encoders, which have only a limited capacity to smooth the distortion.
Most of these approaches are based on models of the human visual system. However, the human visual system is extremely complex, and many of its properties are still not well understood.
On the other hand, metrics need not rely on general models of the human visual system; they can exploit a priori knowledge about the compression and transmission methods, as well as the pertinent types of artifacts, using ad hoc techniques or simple specialized vision models.
Among the alternatives, techniques that do not require changes to existing standards appear to be the most practical. Moreover, with the rapid increase in available computing power, more sophisticated methods can be implemented. The goal is to significantly reduce the blocking effect, which would also allow a higher compression ratio at the same perceived quality. However, implementing these models can carry a high computational cost because of their complexity.
As explained above, the major source of distortion in images and video is block DCT-based compression. The most popular and widely used formats (both on the Internet and in video streaming and IPTV) are JPEG for still images and H.264 for video. Because JPEG and H.264 use a block-based DCT for coding (in H.264, an integer approximation of it), the major artifact that compressed images suffer from is blockiness. In JPEG coding, non-overlapping 8×8 pixel blocks are coded independently using the DCT. The compression (bit rate) and image quality are mainly determined by the degree of quantization of the DCT coefficients. The effects of quantization appear as blockiness, ringing, and blurring artifacts in the JPEG-coded image. The subjective scores for these artifacts are highly correlated, so measuring blockiness in turn gives an indication of the overall image quality.
Algorithms that measure blockiness use a variety of methods. The general idea behind these metrics is to temper the block-edge gradient with the masking activity measured around it. These approaches exploit the fact that the gradient at a block edge can be masked by spatially active areas around it, or by very dark or very bright regions. Several of these approaches have proven quite effective, but they can be too computationally complex for real-time implementation. For H.264 video the approach is similar, applied to individual frames.
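As a rough illustration of tempering the block-edge gradient with the surrounding activity, the sketch below divides the mean luminance jump across vertical block boundaries by the mean jump everywhere else. It is a toy ratio written for this article, not any published metric and not the Video-MOS metric. The function name `blockiness`, the `eps` floor, and the sample images are assumptions; luminance masking in very dark or bright regions is not modelled.

```python
def blockiness(img, block=8, eps=1.0):
    """Mean jump across vertical block edges, divided by the mean jump elsewhere.

    img is a list of rows of grayscale values, wider than one block.
    eps avoids division by zero in perfectly flat images.
    """
    edge_sum = edge_count = inner_sum = inner_count = 0
    for row in img:
        for x in range(len(row) - 1):
            diff = abs(row[x + 1] - row[x])
            if (x + 1) % block == 0:   # pixel pair straddles a block boundary
                edge_sum += diff
                edge_count += 1
            else:                      # pixel pair inside a block
                inner_sum += diff
                inner_count += 1
    edge_gradient = edge_sum / edge_count
    activity = inner_sum / inner_count
    return edge_gradient / (activity + eps)

# Flat blocks with a step at the boundary: nothing masks the edge.
blocky = [[10] * 8 + [50] * 8 for _ in range(4)]
# Same jump size at the boundary, but buried in strong texture.
textured = [[0, 40] * 8 for _ in range(4)]
# Smooth ramp: the boundary looks like any other pixel pair.
ramp = [list(range(16)) for _ in range(4)]

for name, image in (("blocky", blocky), ("textured", textured), ("ramp", ramp)):
    print(name, round(blockiness(image), 2))
```

Horizontal block boundaries could be scored the same way by transposing the image. The textured image contains the same 40-level jump at the boundary as the blocky one, yet it scores far lower because the surrounding activity masks it.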
Algorithms that reduce blocking can generally be grouped into two major categories. One uses different encoding schemes, such as the interleaved block transform, the lapped transform, and the combined transform. The other postprocesses the reconstructed images.
Since postprocessing techniques do not require changes to the existing standards, they are the more practical solution. A popular de-blocking method is projection onto convex sets (POCS). The basic idea is to represent every known property of the original image by a closed convex set. Since the original image lies within the intersection of all these closed convex sets, the goal of this iterative algorithm is to find an image that also lies within this intersection by alternating projections onto each set. Other de-blocking algorithms are statistics-based. Some use the maximum a posteriori (MAP) estimate of the desired image given the decompressed image. Others convert the blocking artifacts into a penalty through a cost function, so that de-blocking becomes the task of minimizing that cost function. The major disadvantage of these methods is their computational complexity. One application area of de-blocking algorithms is real-time low-bit-rate video decoding, where noniterative de-blocking algorithms with low computational complexity are desirable.
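To give an idea of what a noniterative, low-complexity postprocessing step can look like, here is a minimal sketch. It is not POCS, not a MAP estimator, and not the filter defined in any standard: it is a single pass over one row of pixels that pulls the two pixels on either side of each block boundary toward each other when the jump is small enough to be a likely artifact. The function name `smooth_block_edges`, the threshold of 40, and the quarter-jump correction are assumptions chosen for the illustration.

```python
def smooth_block_edges(row, block=8, threshold=40):
    """One-pass smoothing of block boundaries in a single row of pixels."""
    out = list(row)
    for x in range(block, len(row), block):
        left, right = row[x - 1], row[x]
        jump = right - left
        # Large jumps are assumed to be real image edges and are left alone.
        if abs(jump) < threshold:
            out[x - 1] = left + jump / 4
            out[x] = right - jump / 4
    return out

artifact = [70] * 8 + [106] * 8    # small step at the block boundary
real_edge = [20] * 8 + [220] * 8   # strong edge that happens to sit on a boundary

print(smooth_block_edges(artifact)[6:10])
print(smooth_block_edges(real_edge)[6:10])
```

The threshold is the design choice that matters here: set too low, visible block steps survive; set too high, genuine edges that coincide with a block boundary get blurred.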
In addition, it is well known that common image quality metrics, such as the mean square error (MSE) and peak signal-to-noise ratio (PSNR), are poor predictors of the degree of blocking artifacts. That is why Video-MOS has established a new metric that is more robust and whose value correlates better with blocking artifacts.
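The limitation of MSE and PSNR can be seen in a small sketch using their standard definitions. The sample rows are invented for the illustration and do not come from any measurement: one spreads an error of 4 levels as alternating noise, the other concentrates the same error into a step at the block boundary.

```python
import math

def mse(reference, distorted):
    """Mean square error between two equally long pixel sequences."""
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)

def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB for 8-bit pixel values."""
    error = mse(reference, distorted)
    return float("inf") if error == 0 else 10 * math.log10(peak ** 2 / error)

reference = [100] * 16
# Error of +/-4 alternating from pixel to pixel.
noisy = [100 + (4 if i % 2 == 0 else -4) for i in range(16)]
# Same error magnitude, but arranged as two flat 8-pixel blocks.
blocky = [104] * 8 + [96] * 8

print(f"noisy:  MSE={mse(reference, noisy)}, PSNR={psnr(reference, noisy):.2f} dB")
print(f"blocky: MSE={mse(reference, blocky)}, PSNR={psnr(reference, blocky):.2f} dB")
```

Both rows get exactly the same MSE and PSNR, because these metrics only look at the size of the per-pixel error and not at where it sits relative to the block grid. Which of the two looks worse is a perceptual question that the number alone cannot answer.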