Email Spam Classification Based on Logistic Regression
Email is a common communication tool for individuals and organizations, supporting many kinds of interaction, including file sharing. Alongside these benefits comes unsolicited email, referred to as "spam." Spam can carry malicious content such as viruses, phishing scams, and unwanted advertisements. Because communication security is crucial, emails must be classified against a variety of criteria so that messages carrying malicious tools or software can be filtered out. Machine learning algorithms work well for this kind of classification task. The objective of this research is to address the spam classification problem and to compare the logistic regression, random forest, naive Bayes, decision tree, and support vector machine (SVM) algorithms. The effects of the various methods and approaches on the problem were examined, and the performance results obtained with each are compared. Logistic regression was the most accurate at 98%, followed by random forest at 97%.
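The comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are available and uses a small synthetic word-count matrix in place of a real email corpus, so the accuracies it reports say nothing about the 98% and 97% figures above. The helper names make_toy_counts and compare_classifiers, the 75/25 split, and all hyperparameters are illustrative assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_toy_counts(n_samples=600, n_features=50, seed=0):
    """Build a synthetic word-count matrix (hypothetical stand-in for email data).

    Each row is one "email", each column one "word". Spam rows (label 1)
    use a higher Poisson rate for the first 10 words.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)
    ham_rates = rng.uniform(0.2, 2.0, size=n_features)
    spam_rates = ham_rates.copy()
    spam_rates[:10] += 1.5  # assumed "spammy" words
    rates = np.where(y[:, None] == 1, spam_rates, ham_rates)
    X = rng.poisson(rates)
    return X, y


def compare_classifiers(X, y, seed=0):
    """Fit the five classifiers named in the abstract; return test accuracy for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Hyperparameters are illustrative defaults, not tuned values from the paper.
    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "naive Bayes": MultinomialNB(),
        "decision tree": DecisionTreeClassifier(random_state=seed),
        "SVM": SVC(kernel="linear"),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    X, y = make_toy_counts()
    for name, acc in sorted(compare_classifiers(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} accuracy = {acc:.3f}")
```

To try this on real data, X could be replaced with a word-count matrix from a labelled email dataset and y with the spam/ham labels. Accuracy alone can mislead when the classes are imbalanced, so precision and recall would be worth reporting alongside it.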