ENHANCED MALICIOUS WEBSITE DETECTION THROUGH MULTI-MODAL FEATURES AND CONVOLUTIONAL NEURAL NETWORKS
DOI: https://doi.org/10.62643/ijerst.2025.v21.i2.pp420-431
Abstract
Web applications are now used across many business domains and are essential tools in the everyday lives of billions of people. Unfortunately, many of these applications are malicious and pose a serious risk to Internet users: they can spread spam, install malware, and steal private data. Detecting malicious websites through web content analysis is often ineffective because of the difficulty of extracting representative features, the large volume of data, the evolving nature of malicious patterns, the stealthiness of attacks, and the limitations of conventional classifiers. Static Uniform Resource Locator (URL) features, by contrast, can often provide immediate information about a website without loading its content.
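As an illustration of what static URL features can look like, the following is a minimal Python sketch that derives a few lexical properties from a URL string without fetching the page. The function name `lexical_url_features` and the particular features chosen (lengths, dot and digit counts, character entropy, and so on) are assumptions made for this example; the abstract does not list the features used in the study.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def lexical_url_features(url: str) -> dict:
    """Compute simple lexical features from a URL string.

    Hypothetical feature set for illustration only; it is not
    the feature set used in the paper.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Shannon entropy of the character distribution of the whole URL.
    counts = Counter(url)
    total = len(url)
    entropy = (
        -sum((n / total) * math.log2(n / total) for n in counts.values())
        if total
        else 0.0
    )

    return {
        "url_length": total,
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(c.isdigit() for c in url),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        # Number of non-empty path segments, e.g. "/a/b" has depth 2.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "char_entropy": entropy,
    }


if __name__ == "__main__":
    for name, value in lexical_url_features(
        "http://login.example.com/a/b?x=1"
    ).items():
        print(f"{name}: {value}")
```

Features of this kind are cheap to compute, which is what makes them attractive for a first-pass check; they carry no information about the page content itself.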
Content analysis is thus difficult for existing systems, but relying on lexical URL features alone is also inadequate and can result in misclassification. To improve the effectiveness of malicious website detection, this study proposes a multimodal representation approach that combines textual and image-based features. Textual features help the deep learning model learn and represent specific semantic information related to attack patterns, while image features are useful for recognizing more general malicious patterns. In this way, patterns that are obscured in the textual representation may be revealed in the image representation. Two Convolutional Neural Network (CNN) models were built to extract the hidden features from the textual and the image-represented information, respectively. The output layers of both models were merged and fed into an artificial neural network classifier that makes the final decision. The results demonstrate the effectiveness of the proposed model compared with other models: overall performance, measured by the Matthews Correlation Coefficient (MCC), improved by 4.3%, while the false positive rate decreased by 1.5%.
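For readers unfamiliar with the two reported metrics, the sketch below computes the Matthews Correlation Coefficient and the false positive rate from the four cells of a binary confusion matrix, using their standard definitions. The function names and the counts in the usage example are hypothetical and chosen only for illustration; they are not results from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention
    for degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN): share of benign sites flagged as malicious."""
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    tp, tn, fp, fn = 90, 85, 15, 10
    print(f"MCC: {matthews_corrcoef(tp, tn, fp, fn):.3f}")
    print(f"FPR: {false_positive_rate(fp, tn):.3f}")
```

MCC uses all four cells of the confusion matrix, so it remains informative when benign and malicious classes are imbalanced, which is one reason it is often reported alongside the false positive rate.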
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.