DETECTING PHISHING ATTACKS VIA HYBRID MACHINE LEARNING MODELS BASED ON URL ANALYSIS
Keywords:
Phishing-detection models, PILU-90K dataset, Logistic Regression, TF-IDF feature extraction, Cybersecurity, URL Classification
Abstract
To show how the effectiveness of phishing detection models can decrease over time, we trained a baseline model on older datasets and tested it on new URLs. The results indicated a decline in accuracy, so we carried out an extensive analysis of current phishing domains to identify new trends and tactics used by attackers. To support this research, we created a new dataset, Phishing Index Login URL-90,000 (PILU-90K), which contains 60,000 legitimate URLs (index and login pages) and 30,000 phishing URLs. Using this dataset, we built a Logistic Regression model with TF-IDF feature extraction, which achieved an accuracy of 98.50% in recognizing login URLs.
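The abstract does not state how URLs are tokenized or how the Logistic Regression model is trained, so the following is only a minimal, dependency-free Python sketch of the general TF-IDF + Logistic Regression technique. It assumes URLs are lowercased and split into tokens on non-alphanumeric characters. The tokenizer, the class names `UrlTfidf` and `UrlLogisticRegression`, the hyperparameters, and the toy URLs are all hypothetical and are not taken from the paper or from PILU-90K.

```python
import math
import re
from collections import Counter


def tokenize(url: str) -> list[str]:
    """Hypothetical tokenizer: lowercase the URL and split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


class UrlTfidf:
    """Minimal TF-IDF vectorizer producing sparse {token: weight} dicts."""

    def fit(self, urls: list[str]) -> "UrlTfidf":
        n_docs = len(urls)
        doc_freq = Counter()
        for url in urls:
            doc_freq.update(set(tokenize(url)))
        # Smoothed IDF: log((1 + N) / (1 + df)) + 1
        self.idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}
        return self

    def transform(self, url: str) -> dict[str, float]:
        # Raw term counts for tokens seen during fit, weighted by IDF
        counts = Counter(tok for tok in tokenize(url) if tok in self.idf)
        vec = {tok: c * self.idf[tok] for tok, c in counts.items()}
        # L2-normalize so long URLs do not dominate the score
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {tok: v / norm for tok, v in vec.items()} if norm else vec


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class UrlLogisticRegression:
    """Binary logistic regression trained by full-batch gradient descent with an L2 penalty."""

    def __init__(self, lr: float = 0.5, epochs: int = 200, l2: float = 1e-3):
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.w: dict[str, float] = {}
        self.b = 0.0

    def predict_proba(self, x: dict[str, float]) -> float:
        z = self.b + sum(self.w.get(tok, 0.0) * v for tok, v in x.items())
        return sigmoid(z)

    def predict(self, x: dict[str, float]) -> int:
        return int(self.predict_proba(x) >= 0.5)

    def fit(self, X: list[dict[str, float]], y: list[int]) -> "UrlLogisticRegression":
        n = len(X)
        for _ in range(self.epochs):
            grad_w: dict[str, float] = {}
            grad_b = 0.0
            for x, label in zip(X, y):
                err = self.predict_proba(x) - label  # d(log-loss)/dz
                grad_b += err
                for tok, v in x.items():
                    grad_w[tok] = grad_w.get(tok, 0.0) + err * v
            self.b -= self.lr * grad_b / n
            for tok in set(grad_w) | set(self.w):
                w = self.w.get(tok, 0.0)
                self.w[tok] = w - self.lr * (grad_w.get(tok, 0.0) / n + self.l2 * w)
        return self


# Toy, hypothetical URLs (label 1 = phishing, 0 = legitimate); not PILU-90K samples
train_urls = [
    "https://www.example.com/login",
    "https://accounts.example.org/signin",
    "https://www.example.net/index.html",
    "http://example-secure-verify.xyz/account/login/update.php",
    "http://verify-account.top/bank/confirm/login.php",
    "http://secure-update.xyz/webscr/verify.php",
]
labels = [0, 0, 0, 1, 1, 1]

vec = UrlTfidf().fit(train_urls)
model = UrlLogisticRegression().fit([vec.transform(u) for u in train_urls], labels)
print(model.predict(vec.transform("http://login-verify.xyz/secure/update.php")))
```

The sketch separates the two stages the abstract names: TF-IDF turns each URL into a sparse weighted token vector, and Logistic Regression learns one weight per token. For real data, scikit-learn's `TfidfVectorizer` and `LogisticRegression` implement the same two stages. Character n-grams are a common alternative to the delimiter-based tokenizer assumed here.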
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.