Video Super-Resolution Based on Spatial-Temporal Transformer


Bibliographic Details
Published in:2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS) pp. 403 - 407
Main Authors: Zheng, Minyan, Luo, Jianping, Cao, Wenming
Format: Conference Proceeding
Language:English
Published: IEEE 07-11-2021
Description
Summary:In this paper, we propose a Spatial-Temporal Transformer (STTF) algorithm for video super-resolution (SR) to address the blur and artifacts that arise when low-resolution (LR) video is super-resolved with traditional algorithms. First, the algorithm uses residual blocks to extract initial features from the video sequences. Second, the three-dimensional video features are decomposed into image patches, which are fed to the Spatial-Temporal Transformer network for self-attention among patches, aligning and fusing them. Finally, a sub-pixel convolution layer and residual layers up-sample and reconstruct the high-resolution (HR) video sequences. To improve visual quality, the network is trained with a mean squared error (MSE) loss. Experimental results show that the STTF network achieves a higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) than traditional super-resolution algorithms.
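The pipeline in the abstract (residual feature extraction, patch decomposition with spatial-temporal self-attention, sub-pixel up-sampling, MSE loss) can be sketched in PyTorch. This is a minimal illustration of the described architecture, not the authors' implementation; all layer sizes, patch dimensions, and module names are assumptions.

```python
# Hypothetical sketch of the STTF pipeline from the abstract. Channel counts,
# patch size, depth, and attention heads are illustrative assumptions.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Residual block for initial feature extraction."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class STTFSketch(nn.Module):
    def __init__(self, channels=16, patch=4, scale=2, heads=4):
        super().__init__()
        self.patch, self.scale = patch, scale
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.features = nn.Sequential(*[ResidualBlock(channels) for _ in range(2)])
        dim = channels * patch * patch  # one token per image patch
        self.attn = nn.MultiheadAttention(dim, num_heads=heads, batch_first=True)
        # Sub-pixel convolution: conv to scale^2 * 3 channels, then PixelShuffle.
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, video):  # video: (B, T, 3, H, W)
        b, t, _, h, w = video.shape
        p = self.patch
        feats = self.features(self.head(video.flatten(0, 1)))  # (B*T, C, H, W)
        c = feats.size(1)
        # Decompose spatial-temporal features into patch tokens: one token per
        # (frame, patch) position, so attention spans both space and time.
        tokens = (feats.view(b, t, c, h // p, p, w // p, p)
                       .permute(0, 1, 3, 5, 2, 4, 6)
                       .reshape(b, t * (h // p) * (w // p), c * p * p))
        # Self-attention across all patches of all frames (alignment + fusion).
        fused, _ = self.attn(tokens, tokens, tokens)
        feats = (fused.view(b, t, h // p, w // p, c, p, p)
                      .permute(0, 1, 4, 2, 5, 3, 6)
                      .reshape(b * t, c, h, w))
        # Sub-pixel convolution reconstructs the HR frames.
        hr = self.upsample(feats)
        return hr.view(b, t, 3, h * self.scale, w * self.scale)
```

Training would then minimize the MSE between the reconstructed and ground-truth HR frames, e.g. `torch.nn.functional.mse_loss(model(lr_clip), hr_clip)`.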
DOI:10.1109/CCIS53392.2021.9754604