MACHINE LEARNING BASED APPROACH FOR IDENTIFYING FAKE ONLINE REVIEWS
DOI:
https://doi.org/10.62643/
Abstract
Online reviews have become one of the most influential factors in consumer purchase decisions, with studies indicating that over 90% of consumers read reviews before making purchases and 84% trust online reviews as much as personal recommendations. However, the economic incentives for positive reviews have led to a proliferation of fake reviews, with estimates suggesting that 15-30% of reviews on major e-commerce platforms are fraudulent. These deceptive reviews, posted for personal gain or competitive sabotage, mislead consumers and distort market dynamics. This paper presents a machine learning-based approach to identifying fake online reviews that uses both supervised and semi-supervised learning techniques, so that it can handle scenarios with limited labeled data. Reviews are preprocessed using tokenization, stopword removal, and lemmatization. Features including TF-IDF word frequency vectors, review length statistics, sentiment polarity scores, rating deviation from the product average, and reviewer behavioral patterns are extracted to create discriminative feature representations. Three classification models (Random Forest, Support Vector Machine, and Naive Bayes) are trained and compared on a labeled dataset of 5,000 reviews. Semi-supervised label propagation is applied to leverage unlabeled reviews when labeled data is scarce, improving accuracy by 4.2% when only 40% of the data is labeled. Experimental results show that Random Forest achieves the best performance, with 91.8% accuracy, 89.2% precision, and 93.1% recall for fake review identification, significantly outperforming baseline approaches and helping maintain trust in online review platforms.
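The pipeline summarized in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation. It assumes TF-IDF features only (lemmatization, sentiment, rating deviation, and reviewer behavior features are omitted), uses a handful of made-up reviews instead of the 5,000-review dataset, and the helper names `build_models`, `mask_labels`, and `propagate_labels` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.semi_supervised import LabelPropagation
from sklearn.svm import LinearSVC

# Made-up toy reviews for illustration only (1 = fake, 0 = genuine).
REVIEWS = [
    "Best product ever!!! Amazing amazing amazing, buy now!",
    "Absolutely perfect, five stars, best purchase of my life, buy it!",
    "Amazing deal, best seller, everyone must buy this now!",
    "Perfect perfect perfect, best brand ever, highly recommend to all!",
    "Worst product ever, total scam, never buy from this seller!",
    "Best ever, amazing quality, buy buy buy!",
    "The battery lasts about two days with normal use, charging takes an hour.",
    "Fits well but the stitching came loose after a month of wear.",
    "Delivery was late by three days, though the blender itself works fine.",
    "Screen is bright; speakers are a bit tinny at high volume.",
    "Decent value for the price, the manual was confusing to follow.",
    "Returned it because the size chart ran small, refund took a week.",
]
LABELS = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def build_models():
    """Return the three classifier families named in the abstract,
    each fed by TF-IDF vectors with English stopwords removed."""
    def tfidf():
        return TfidfVectorizer(stop_words="english")

    return {
        "random_forest": make_pipeline(
            tfidf(), RandomForestClassifier(n_estimators=100, random_state=0)
        ),
        "svm": make_pipeline(tfidf(), LinearSVC()),
        "naive_bayes": make_pipeline(tfidf(), MultinomialNB()),
    }


def mask_labels(labels, keep_fraction=0.4, seed=0):
    """Keep roughly `keep_fraction` of the labels per class and mark
    the rest as unlabeled (-1), the convention scikit-learn expects."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    masked = np.full_like(labels, -1)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        n_keep = max(1, int(round(keep_fraction * len(idx))))
        masked[rng.choice(idx, size=n_keep, replace=False)] = cls
    return masked


def propagate_labels(texts, partial_labels):
    """Spread the known labels to unlabeled reviews over a k-nearest-neighbor
    graph built on TF-IDF vectors; returns one label per review."""
    X = TfidfVectorizer(stop_words="english").fit_transform(texts).toarray()
    model = LabelPropagation(kernel="knn", n_neighbors=3)
    model.fit(X, partial_labels)
    return model.transduction_


if __name__ == "__main__":
    for name, model in build_models().items():
        model.fit(REVIEWS, LABELS)
        print(name, model.predict(["Amazing, best ever, buy now!"]))
    print(propagate_labels(REVIEWS, mask_labels(LABELS)))
```

On a real dataset, the models would be compared on a held-out split using accuracy, precision, and recall, and the propagated labels could be used to enlarge the training set before fitting the supervised classifiers.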
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.