COVSR: Compression-Omniscient Video Super-Resolution

¹University of Queensland
²Auckland University of Technology
³Intel
⁴Beijing Jiaotong University
⁵AI^2 Robotics

The demonstration video compares our COVSR method with 4× bicubic interpolation and is compressed for fast loading.
You can select a constant rate factor and a scene from either of the two test sets.

Test sets: REDS4, VID4

Abstract

Current compressed video super-resolution methods have achieved promising performance, but they often assume that the input video is compressed under a low-delay configuration. Under random-access configurations, these methods may struggle to exploit the metadata effectively because the metadata varies widely across compression configurations.

Comparison of our method with the state-of-the-art compressed video super-resolution method (CAVSR). (a) At time step i, metadata is generated from the current frame x_i and a reference frame x_{i-α}, where the value of α is determined by the codec. (b) Since α is not always equal to 1, using mismatched metadata (the motion vector m_{i-α → i} in CAVSR) to align the previous adjacent feature f_{i-1} causes misalignment. (c) Visualizations of misalignment in CAVSR. (d) Our alignment method accurately models the moving car and retains better contours thanks to its effective use of metadata.
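
The mismatch described in panel (b) can be made concrete with a toy calculation. The sketch below is only an illustration, not part of COVSR or CAVSR: it assumes a 1-D object moving at constant velocity, and the function name `alignment_error` is hypothetical.

```python
# Toy 1-D illustration. Constant velocity is an assumption made for this
# example only; real codec motion vectors are per-block and noisier.

def alignment_error(velocity: float, alpha: int) -> float:
    """Residual offset when m_{i-alpha -> i} is used to align f_{i-1}.

    With constant velocity, the object moves `velocity` pixels per frame, so
    the motion vector from frame i-alpha to frame i spans alpha * velocity,
    while the displacement actually needed (frame i-1 to frame i) is just
    `velocity`.
    """
    motion_vector = alpha * velocity   # metadata: x_{i-alpha} -> x_i
    needed_displacement = velocity     # wanted:   x_{i-1}     -> x_i
    return motion_vector - needed_displacement


for alpha in (1, 2, 4):
    print(f"alpha={alpha}: residual offset = {alignment_error(3.0, alpha)} px")
```

Under these assumptions the residual is zero only when α = 1 and grows linearly with α, which is why reusing the motion vector for the adjacent feature breaks down outside low-delay configurations.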

In this work, we propose a general Compression-Omniscient Video Super-Resolution (COVSR) method that addresses video super-resolution for both low-delay and random-access configurations. Specifically, we first introduce an efficient compression-aware propagation (ECAP) module that dynamically adjusts propagation routes according to the compression configuration. Because existing methods reconstruct frames one by one, they are difficult to parallelize efficiently. We find that, by slightly sacrificing temporal dependencies, ECAP can significantly improve inference speed. Furthermore, since ECAP may make cross-frame alignment more challenging, we design a metadata-driven alignment (MDA) module that refines the motion vectors rather than calculating cross-frame offsets from scratch. MDA first transforms motion vectors into coarse optical flows and then iteratively refines them over several scales into dense feature-level optical flows. In this way, MDA significantly improves the quality of the motion vectors while achieving faster alignment by exploiting metadata. Extensive experimental results demonstrate that COVSR not only achieves efficient and superior super-resolution performance but also generalizes to various compression configurations. Our code will be available.
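
To picture why propagation routes depend on the compression configuration, the sketch below groups frames by the depth of their reference chain. It is not the ECAP algorithm, whose details are not given here; the helper `propagation_levels` and both reference structures are hypothetical examples (a sequential low-delay chain and a small hierarchical random-access group).

```python
from collections import defaultdict
from typing import Dict, List


def propagation_levels(references: Dict[int, List[int]]) -> List[List[int]]:
    """Group frames so that every frame appears after all of its references.

    A frame with no references is at level 0; otherwise its level is one more
    than the deepest level among its reference frames.
    """
    level: Dict[int, int] = {}

    def depth(frame: int) -> int:
        if frame not in level:
            refs = references[frame]
            level[frame] = 0 if not refs else 1 + max(depth(r) for r in refs)
        return level[frame]

    groups = defaultdict(list)
    for frame in sorted(references):
        groups[depth(frame)].append(frame)
    return [groups[k] for k in sorted(groups)]


# Hypothetical low-delay structure: each frame references its predecessor.
low_delay = {0: [], 1: [0], 2: [1], 3: [2], 4: [3]}
# Hypothetical random-access structure: a small hierarchical group of frames.
random_access = {0: [], 4: [0], 2: [0, 4], 1: [0, 2], 3: [2, 4]}

print(propagation_levels(low_delay))      # [[0], [1], [2], [3], [4]]
print(propagation_levels(random_access))  # [[0], [4], [2], [1, 3]]
```

In this toy setting the low-delay chain is strictly sequential, whereas the random-access structure places frames 1 and 3 on the same level, so a propagation scheme that follows the codec's reference structure has both a different route and some room for parallel processing.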

Method

The architecture of our Compression-Omniscient Video Super-Resolution (COVSR) method is shown in Fig. 2. COVSR aims to restore high-resolution (HR) frames from low-resolution (LR) frames with the assistance of metadata, such as frame types and block sizes. Specifically, COVSR consists of four modules: a feature extraction module, an Efficient Compression-Aware Propagation (ECAP) module, a Metadata-Driven Alignment (MDA) module, and an upsampling module.
Let x_i denote the current frame at time step i, and let x_{i-α} and x_{i+β} denote its two reference frames. To generate a high-resolution frame, x_i passes through several states that extract rich features; f_i^j denotes the feature at the j-th state (j = 1, 2, 3). The high-resolution frame of x_i is generated in the following steps:

1. The shallow feature f_i^1 of the frame x_i is obtained from the feature extraction module.
2. MDA first refines the motion vectors between the reference frames and the current frame to obtain accurate feature-level optical flows o_{i-α → i} and o_{i+β → i}. The aligned reference states f_{i-α → i}^j and f_{i+β → i}^j are then obtained by warping the reference states with these optical flows.
3. The aligned reference states are fed into the feature fusion blocks along with the previous state f_i^{j-1} to generate the aggregated feature f_i^j, which is then propagated as a state in ECAP.
4. The final feature f_i^3 is obtained after propagation and is passed to the upsampling module to generate the high-resolution frame.
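
The alignment and fusion steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the fixed block size, the flow convention (each pixel of frame i stores the offset to its source location in the reference), nearest-neighbour sampling, and the averaging stand-in for the feature fusion blocks are all hypothetical. The multi-scale refinement performed by MDA is omitted.

```python
import numpy as np


def motion_vectors_to_flow(mv_blocks: np.ndarray, block: int) -> np.ndarray:
    """Expand per-block motion vectors (Hb, Wb, 2) into a dense coarse flow
    of shape (Hb * block, Wb * block, 2) by repeating each block's vector."""
    return np.repeat(np.repeat(mv_blocks, block, axis=0), block, axis=1)


def warp(feature: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a feature map (H, W, C) with a flow (H, W, 2) of (dx, dy).

    Each output pixel samples the reference at (x + dx, y + dy), using
    nearest-neighbour rounding and clamping at the borders.
    """
    h, w = feature.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return feature[src_y, src_x]


def fuse(prev_state: np.ndarray, aligned_refs: list) -> np.ndarray:
    """Stand-in for the feature fusion blocks: a plain average (assumption)."""
    return np.mean([prev_state, *aligned_refs], axis=0)


# Reference feature with a single active location at (row 1, column 1).
ref_state = np.zeros((4, 4, 1))
ref_state[1, 1, 0] = 1.0

# Every 2x2 block says: the source lies one pixel to the left and one pixel up.
mv = np.full((2, 2, 2), -1.0)
flow = motion_vectors_to_flow(mv, block=2)

aligned = warp(ref_state, flow)                    # content moves to (2, 2)
state = fuse(np.zeros_like(ref_state), [aligned])  # averaged with an empty previous state
print(aligned[..., 0])
print(state[2, 2, 0])
```

In the full method, the coarse flow would be refined before warping, and a learned fusion block would replace the average; the sketch only shows how block-level metadata can drive a dense warp.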

Details of the efficient compression-aware propagation and the metadata-driven alignment module are described in Sec. 3 and Sec. 4, respectively.
