Combining content and social features in a deep learning approach to Vietnamese email prioritization

Ha Thanh Nguyen, Quan Dinh Dang, Anh Quang Tran

Abstract


The email overload problem has been discussed in numerous email-related studies. One possible solution is email prioritization: automatically predicting the importance level of each received email and sorting the user’s inbox accordingly. Several learning-based methods have been proposed to address email prioritization using content features as well as social features. Although these methods laid the foundation for this field of study, their reported performance is still far from practical. Recent work on deep neural networks has achieved good results in various tasks. In this paper, the authors propose a novel email prioritization model that incorporates several deep learning techniques and combines content features and social features extracted from email data. The method targets Vietnamese emails and is tested on a self-built Vietnamese email corpus. The experiments explore the effects of different model configurations and compare the effectiveness of the new method with that of a previous work.





DOI: http://dx.doi.org/10.21553/rev-jec.271

Copyright (c) 2021 REV Journal on Electronics and Communications

