Multi-Prior Driven Resolution Rescaling Blocks for Intra Frame Coding

Bibliographic Details
Published in:	IEEE Transactions on Multimedia, pp. 1-16
Main Authors:	Wu, Peiying; Wang, Shiwei; Shen, Liquan; Wang, Feifeng; Tian, Zhaoyi; Hua, Xia
Format:	Journal Article
Language:	English
Published:	IEEE 30-08-2024
Subjects:	Complexity theory; Encoding; Image coding; Image reconstruction; intra frame coding; multi-prior driven; Rate-distortion; Resolution rescaling; Termination of employment; Video coding; video compression

Description
Summary:	Deep learning techniques are increasingly integrated into rescaling-based video compression frameworks and have shown great potential for improving compression efficiency. However, existing methods achieve limited performance for two reasons: 1) they treat the context priors generated by the codec as independent sources of information, ignoring potential interactions between multiple priors during rescaling, and so may not effectively facilitate compression; 2) they often apply a uniform sampling ratio across regions of varying content complexity, which loses important information. To address these two issues, this paper proposes a spatial multi-prior driven resolution rescaling framework for intra-frame coding, called MP-RRF, consisting of three sub-networks: a multi-prior driven network, a downscaling network, and an upscaling network. First, the multi-prior driven network employs complexity and similarity priors to smooth unnecessarily complicated information while leveraging similarity and quality priors to produce high-fidelity complementary information. This interaction of complexity, similarity, and quality priors ensures redundancy reduction and texture enhancement. Second, the downscaling network discriminatively processes components of different granularities to generate a compact, low-resolution image for encoding. Third, the upscaling network aggregates a complementary set of contextual multi-scale features to reconstruct realistic details while combining variable receptive fields to suppress multi-scale compression artifacts and resampling noise. Extensive experiments show that the network achieves a significant 23.84% Bjøntegaard Delta Rate (BD-Rate) reduction under the all-intra configuration compared to the codec anchor, offering state-of-the-art coding performance.
ISSN:	1520-9210, 1941-0077
DOI:	10.1109/TMM.2024.3453033
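
Note on the BD-Rate figure: the summary reports its result as a Bjøntegaard Delta Rate reduction relative to the codec anchor. As a rough illustration of how such a number is commonly obtained, the sketch below uses the classic cubic-fit variant of the metric: log-bitrate is fitted as a cubic polynomial of PSNR for each codec, both fits are averaged over the overlapping PSNR range, and the difference is converted to a percentage. This is not the authors' evaluation code. The function name `bd_rate`, its parameters, and the sample rate/PSNR points are hypothetical, and the record does not say which BD-Rate variant or quality metric the paper used.

```python
import numpy as np


def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Average bitrate change (%) of the test codec vs. the anchor at equal PSNR.

    Negative values mean the test codec needs fewer bits on average.
    Assumes at least four rate/PSNR points per curve (cubic fit).
    """
    log_ra = np.log(np.asarray(anchor_rates, dtype=float))
    log_rt = np.log(np.asarray(test_rates, dtype=float))
    pa = np.asarray(anchor_psnr, dtype=float)
    pt = np.asarray(test_psnr, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each curve.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Only the PSNR range covered by both curves is compared.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    if hi <= lo:
        raise ValueError("the two curves have no overlapping PSNR range")

    # Mean log-rate of each fit over [lo, hi], via the antiderivative.
    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Convert the mean log-rate difference back to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate (kbps) / PSNR (dB) points, not taken from the paper.
    anchor_r = [1000.0, 2000.0, 4000.0, 8000.0]
    psnr = [32.0, 35.0, 38.0, 41.0]
    test_r = [0.8 * r for r in anchor_r]  # 20% fewer bits at each PSNR
    print(f"BD-Rate: {bd_rate(anchor_r, psnr, test_r, psnr):.2f}%")
```

With these made-up points the test curve spends 20% fewer bits at every quality level, so the function returns a value close to -20. A reported reduction such as 23.84% corresponds to a value of about -23.84 from this kind of calculation.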