Decoder-ROI based Versatile Video Coding for Multi-Object Tracking Vision Task

Huong Thanh Bui, Do Ngoc Minh, Xiem Van Hoang


The video coding standards High Efficiency Video Coding (HEVC) and, more recently, Versatile Video Coding (VVC) have brought significant advances to multimedia communication applications such as video conferencing, broadcasting, and, notably, e-learning. However, recent developments in artificial intelligence (AI) and big data have created an urgent need for a video coding model designed specifically for image and video analysis by machine vision. In this paper, we propose a novel video coding approach that combines a region-of-interest (ROI) coding algorithm with the VVC coding model. The proposed method identifies regions of interest within video frames using both fundamental and deep features. Based on these regions, we propose an adaptive compression method for each frame block that preserves the performance of machine learning applications while minimizing the amount of data to be encoded. To realize this coding scheme without adding bitrate, the new feature extraction approach uses only decoded information (Decoder-ROI). The results demonstrate that Decoder-ROI achieves a significant compression rate improvement compared to the standard codec and relevant video coding for machines (VCM) schemes. Furthermore, ROI exploitation contributes to a 3.25% reduction in encoding time compared to the baseline VVC encoding standard.
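The block-level adaptation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ROI is already available as pixel bounding boxes, and the function name `roi_qp_map`, the 64-pixel block size, and the QP offsets are hypothetical values chosen only for illustration. Blocks that overlap an ROI box receive a lower quantization parameter (finer quantization), while background blocks receive a higher one.

```python
def roi_qp_map(frame_h, frame_w, boxes, block=64,
               base_qp=32, roi_offset=-4, bg_offset=6):
    """Build a per-block QP map (hypothetical helper, illustrative values).

    boxes: iterable of (x0, y0, x1, y1) in pixels, with x1/y1 exclusive.
    Returns a list of rows, one QP value per block.
    """
    rows = -(-frame_h // block)  # ceiling division
    cols = -(-frame_w // block)

    def clamp_qp(qp):
        # VVC allows QP values in the range 0..63 for 8-bit video.
        return max(0, min(63, qp))

    # Start with every block treated as background (coarser quantization).
    qp_map = [[clamp_qp(base_qp + bg_offset)] * cols for _ in range(rows)]

    for x0, y0, x1, y1 in boxes:
        # Convert the pixel box to the range of blocks it overlaps.
        r0 = max(0, y0 // block)
        r1 = min(rows, -(-y1 // block))
        c0 = max(0, x0 // block)
        c1 = min(cols, -(-x1 // block))
        for r in range(r0, r1):
            for c in range(c0, c1):
                # ROI blocks get finer quantization.
                qp_map[r][c] = clamp_qp(base_qp + roi_offset)
    return qp_map


# Usage: a 192x128 frame with one ROI box inside the top-middle block.
example_map = roi_qp_map(128, 192, [(70, 10, 120, 60)])
```

In a decoder-side ROI scheme such as the one the abstract describes, the boxes would be derived from previously decoded frames, so the map would not need to be signalled as extra side information; how the paper derives and applies the regions is not specified here.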


Copyright (c) 2024 REV Journal on Electronics and Communications

ISSN: 1859-378X

Copyright © 2011-2024
Radio and Electronics Association of Vietnam
All rights reserved