Combining content and social features in a deep learning approach to Vietnamese email prioritization

Ha Thanh Nguyen, Quan Dinh Dang, Anh Quang Tran


Email overload has been discussed in numerous email-related studies. One possible solution is email prioritization: automatically predicting the importance level of each received email and sorting the user’s inbox accordingly. Several learning-based methods have been proposed to address email prioritization using content features as well as social features. Although these methods have laid the foundations of this field, their reported performance is still far from practical. Meanwhile, recent work on deep neural networks has achieved good results in various tasks. In this paper, the authors propose a novel email prioritization model that incorporates several deep learning techniques and combines content features and social features extracted from email data. The method targets Vietnamese emails and is tested on a self-built Vietnamese email corpus. The experiments explore the effects of different model configurations and compare the effectiveness of the new method with that of a previous work.


Copyright (c) 2021 REV Journal on Electronics and Communications
