Email Spam Classification Based on Logistics Regression

Iman Youssif Ibrahim; Omar Sedqi Kareem

doi:10.47191/etj/v10i05.14

Authors

Iman Youssif Ibrahim, Akre University for Applied Science, Technical College of Informatics, Akre, Department of Information Technology, Akre-Duhok, Kurdistan Region, Iraq
Omar Sedqi Kareem, Department of Public Health, College of Health and Medical Techniques – Shekhan, Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq

Vol. 10 No. 5 (2025): VOLUME 10 ISSUE 05


Published May 10, 2025


Abstract

Email is a common communication tool used by both individuals and organizations, supporting a variety of interactions, including file sharing. Alongside these advantages, it also carries unsolicited messages, referred to as "spam." Spam can contain malicious content such as viruses, phishing scams, and unsolicited ads. Because communication security is crucial, filtering email for malicious tools or software requires classifying messages according to a variety of criteria, and machine learning algorithms work well in such classification studies. The objective of this research is to address this problem by comparing the logistic regression, random forest, naive Bayes, decision tree, and support vector machine (SVM) algorithms. The effects of the various methods and approaches on the problem were examined thoroughly, and a comparison of their performance is provided. Logistic regression was the most accurate, with an accuracy of 98%, followed by random forest with 97%.
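As a rough illustration of the kind of classifier the abstract describes, the following minimal sketch trains a logistic regression model on bag-of-words counts using only the Python standard library. Everything in it is an assumption made for illustration: the toy corpus, the function names, the learning rate, and the number of epochs are hypothetical, and the sketch does not reproduce the study's dataset, preprocessing, or reported accuracy.

```python
import math
from collections import Counter

# Hypothetical toy corpus (label 1 = spam, 0 = not spam); not the study's data.
TRAIN = [
    ("win a free prize now", 1),
    ("claim your free cash prize", 1),
    ("cheap loans click now", 1),
    ("free offer click to win", 1),
    ("meeting agenda for monday", 0),
    ("please review the attached report", 0),
    ("lunch on monday with the team", 0),
    ("report for the project meeting", 0),
]
TEST = [
    ("free cash offer now", 1),
    ("project report attached", 0),
]


def build_vocab(texts):
    """Map each distinct word to a feature index."""
    words = sorted({w for t in texts for w in t.split()})
    return {w: i for i, w in enumerate(words)}


def vectorize(text, vocab):
    """Bag-of-words counts; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(text.split()).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one message: p(spam) minus the true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    """Label a message as spam (1) when the estimated probability is >= 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy(X, y, w, b):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(predict(xi, w, b) == yi for xi, yi in zip(X, y))
    return correct / len(y)


vocab = build_vocab([t for t, _ in TRAIN])
X_train = [vectorize(t, vocab) for t, _ in TRAIN]
y_train = [label for _, label in TRAIN]
w, b = train(X_train, y_train)

X_test = [vectorize(t, vocab) for t, _ in TEST]
y_test = [label for _, label in TEST]
print("train accuracy:", accuracy(X_train, y_train, w, b))
print("test accuracy:", accuracy(X_test, y_test, w, b))
```

Accuracy here is simply the share of correctly labeled messages, the same kind of figure the abstract reports for each algorithm. A comparison like the one described would repeat the same train-and-evaluate loop for each of the other classifiers on a shared train/test split.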

