Applying Random Forest Algorithm for Phishing URL Identification

Afthar Kautsar; Maghfira Aida; Anita Yulistia

doi:10.56427/jcbd.v4i3.782

Authors

Afthar Kautsar Universitas Islam Negeri Sumatera Utara
Maghfira Aida Universitas Islam Negeri Sumatera Utara
Anita Yulistia Universitas Islam Negeri Sumatera Utara

Keywords:

Phishing, URL Detection, Random Forest, Machine Learning, Cybersecurity

Abstract

Phishing attacks remain one of the most pervasive cybersecurity threats, often delivered through malicious URLs that mimic legitimate websites in order to steal sensitive user information. To address this challenge, this study applies the Random Forest algorithm to automated phishing URL detection using a publicly available Kaggle dataset. The dataset contains structural, technical, and popularity-based features that capture the lexical and behavioral characteristics of each URL. After data preprocessing and an 80/20 train–test split, the Random Forest classifier achieved an accuracy of 94.94%, a precision of 95.19%, a recall of 96.94%, and an F1-score of 96.06%. Its ROC AUC of 0.985 indicates excellent discrimination between phishing and legitimate URLs. Feature importance analysis shows that whether the URL is indexed by Google, page rank metrics, and specific structural patterns strongly influence the predictions. ROC and precision–recall curves further support the model’s reliability. Overall, the findings suggest that Random Forest is an effective and efficient solution for phishing URL detection, with promising potential for integration into real-world cybersecurity systems.
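The evaluation protocol summarized above (an 80/20 split, then accuracy, precision, recall, F1-score, and ROC AUC) can be sketched in plain Python. This is a hedged illustration, not the authors' code, and it does not implement the Random Forest itself. It assumes label 1 marks a phishing URL, that the classifier outputs a phishing probability, and a 0.5 decision threshold. The helper names (`split_80_20`, `confusion_counts`, `f1_from`, `roc_auc`, `evaluate`) and the toy data are hypothetical. The first check confirms that the reported precision and recall are consistent with the reported F1-score.

```python
"""Sketch of the evaluation protocol described in the abstract (stdlib only).

Assumptions, not taken from the paper: label 1 = phishing (positive class),
scores are predicted phishing probabilities, and the decision threshold is 0.5.
All helper names and the toy data below are hypothetical.
"""
import random


def split_80_20(rows, labels, seed=42):
    """Hypothetical helper: shuffle with a fixed seed, keep 80% for training.

    The paper reports an 80/20 split but not whether it was stratified;
    this version is a plain random split.
    """
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * 0.8)
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) with 1 as the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive is scored above a
    random negative (ties count half); this equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, y_score, threshold=0.5):
    """Threshold the scores and compute the metrics named in the abstract."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "roc_auc": roc_auc(y_true, y_score),
    }


if __name__ == "__main__":
    # Consistency check: the reported precision (95.19%) and recall (96.94%)
    # imply the reported F1-score of 96.06%.
    print(round(f1_from(0.9519, 0.9694), 4))  # 0.9606

    # Invented labels and scores, only to show the mechanics.
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.95]
    print(evaluate(y_true, y_score))
```

The pairwise AUC loop costs O(positives × negatives) and is written for clarity. On a full test set, a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.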
