Deep Reinforcement Learning based Bitrate Adaptations in Dynamic Adaptive Streaming over HTTP

Long Minh Luu, Nghia Trung Nguyen, Phuong Luu Vo, Tuan-Anh Le

Abstract


Dynamic adaptive streaming over HTTP (DASH) has become a leading video streaming technology in recent years. The bitrate adaptation function at the video player plays a vital role in guaranteeing a high quality of experience (QoE) for users. This work evaluates the performance of several advanced deep reinforcement learning algorithms, i.e., deep Q-learning, actor-critic, and proximal policy optimization, applied to bitrate adaptation, and compares them with other rate adaptation methods on real-trace datasets.
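The abstract does not spell out how bitrate adaptation is cast as a reinforcement learning problem. Below is a minimal sketch, assuming a Pensieve-style formulation: the action is an index into a bitrate ladder, each step downloads one segment over a throughput trace, and the reward is a linear QoE score (bitrate minus a rebuffering penalty minus a switching penalty). The bitrate ladder, segment length, penalty weights, and every function name are hypothetical and not taken from the paper. `rate_based_action` is a simple throughput-based baseline standing in for the "other rate adaptation methods", not any specific published algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical bitrate ladder (kbps) and segment length; not taken from the paper.
BITRATES_KBPS = [300, 750, 1200, 1850, 2850, 4300]
SEGMENT_S = 4.0
# Assumed QoE weights, in the spirit of Pensieve's linear QoE metric.
REBUFFER_PENALTY = 4.3
SMOOTHNESS_PENALTY = 1.0


@dataclass
class PlayerState:
    buffer_s: float = 0.0                      # seconds of video buffered
    last_bitrate_kbps: Optional[float] = None  # bitrate of previous segment


def download_segment(state: PlayerState, action: int,
                     throughput_kbps: float) -> Tuple[PlayerState, float, float]:
    """One environment step: fetch one segment at BITRATES_KBPS[action].

    Assumes throughput is constant while the segment downloads and that the
    buffer has no upper cap. Returns (next_state, reward, rebuffer_seconds).
    """
    if throughput_kbps <= 0:
        raise ValueError("throughput must be positive")
    bitrate = BITRATES_KBPS[action]
    download_s = bitrate * SEGMENT_S / throughput_kbps
    rebuffer_s = max(0.0, download_s - state.buffer_s)  # playback stalls
    next_buffer = max(0.0, state.buffer_s - download_s) + SEGMENT_S

    quality = bitrate / 1000.0  # Mbps
    switch = 0.0
    if state.last_bitrate_kbps is not None:
        switch = abs(quality - state.last_bitrate_kbps / 1000.0)
    reward = quality - REBUFFER_PENALTY * rebuffer_s - SMOOTHNESS_PENALTY * switch
    return PlayerState(next_buffer, bitrate), reward, rebuffer_s


def rate_based_action(past_throughputs_kbps: Sequence[float],
                      safety: float = 0.9) -> int:
    """Non-learning baseline: pick the highest bitrate that fits under a
    discounted harmonic mean of past throughput samples."""
    if not past_throughputs_kbps:
        return 0  # no measurements yet: start at the lowest bitrate
    harmonic = len(past_throughputs_kbps) / sum(1.0 / t for t in past_throughputs_kbps)
    budget = safety * harmonic
    best = 0
    for i, rate in enumerate(BITRATES_KBPS):
        if rate <= budget:
            best = i
    return best


def run_episode(trace_kbps: Sequence[float],
                policy: Callable[[List[float]], int]) -> float:
    """Stream one segment per trace sample and return the total QoE."""
    state, seen, total = PlayerState(), [], 0.0
    for throughput in trace_kbps:
        action = policy(seen)
        state, reward, _ = download_segment(state, action, throughput)
        total += reward
        seen.append(throughput)
    return total


if __name__ == "__main__":
    trace = [2000.0, 1500.0, 3000.0, 800.0, 2500.0]  # toy trace, not real data
    print(run_episode(trace, rate_based_action))
```

In this framing, a DRL agent (deep Q-learning, actor-critic, or PPO) would replace `rate_based_action` with a learned policy that maps an observation such as buffer level and recent throughput to an action index, and would be trained to maximize the summed reward over real throughput traces.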





DOI: http://dx.doi.org/10.21553/rev-jec.308

Copyright (c) 2022 REV Journal on Electronics and Communications


Copyright © 2011-2022
Radio and Electronics Association of Vietnam
All rights reserved