Andrey Norkin

Speeding up VP9 encoder with fully convolutional neural network

A new application of neural networks is learned image compression, in which a network serves as the image/video compression engine or as a restoration filter (see the CLIC workshop at CVPR 2019). These approaches often have significantly longer decoding times than traditional image and video codecs.

However, neural networks can also be used to make video encoding faster. In collaboration with Somdyuti Paul and Prof. Alan Bovik from UT Austin, we have published an arXiv article on speeding up a VP9 encoder by using a Hierarchical Fully Convolutional Network (H-FCN).

Our approach speeds up VP9 intra-frame encoding by more than three times at the expense of a 1.71% higher bitrate at the same quality. The neural network was trained on a variety of Netflix movie and TV content. The article and the GitHub repository contain more details, including the CNN training code and the modified VP9 encoder.
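To make "higher bitrate at the same quality" concrete, here is a minimal sketch of how such a number can be computed from two rate-quality curves. It is not taken from the article or the repository: it assumes a simplified, piecewise-linear variant of the Bjøntegaard-style average rate difference, and the function names and all data points are hypothetical, chosen only for illustration.

```python
import math


def interp(x, xs, ys):
    """Piecewise-linear interpolation of y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the interpolation range")


def avg_bitrate_change(ref, test, steps=1000):
    """Average bitrate change (percent) of `test` relative to `ref` at equal quality.

    Each argument is a list of (bitrate, quality) points sorted by increasing quality.
    """
    ref_q = [q for _, q in ref]
    test_q = [q for _, q in test]
    ref_log_rate = [math.log(r) for r, _ in ref]
    test_log_rate = [math.log(r) for r, _ in test]

    # Compare only over the quality range covered by both curves.
    lo = max(ref_q[0], test_q[0])
    hi = min(ref_q[-1], test_q[-1])
    if lo >= hi:
        raise ValueError("the two curves do not overlap in quality")

    # Mean log-rate difference over the overlap (midpoint rule).
    total = 0.0
    for i in range(steps):
        q = lo + (i + 0.5) * (hi - lo) / steps
        total += interp(q, test_q, test_log_rate) - interp(q, ref_q, ref_log_rate)
    return (math.exp(total / steps) - 1.0) * 100.0


# Hypothetical (bitrate in kbps, quality in dB) points, not measured data.
baseline = [(1000.0, 34.0), (2000.0, 37.0), (4000.0, 40.0), (8000.0, 43.0)]
# A made-up faster encoder that needs 1.71% more bits at every quality level.
faster = [(rate * 1.0171, quality) for rate, quality in baseline]

print(round(avg_bitrate_change(baseline, faster), 2))
```

Averaging the difference in log-bitrate, rather than in bitrate, keeps the high-rate end of the curve from dominating the result. The speedup itself is simply the ratio of baseline encoding time to the faster encoder's time.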

The next steps are to extend the approach to inter-frame prediction and to include perceptual quality metrics to improve coded video quality.