HYBRID CNN-DRIVEN RANDOM FOREST AND FASTTEXT EMBEDDINGS FOR UNMASKING AI-GENERATED TWEETS

K. Manohar Rao; Manaswi Ramidi; Nikitha Manga; Mythili Koppula

doi:10.62643/ijerst.v21.n3(1).pp528-535

Keywords:

Deepfake Tweet Detection, AI-generated Tweets, Social Media Misinformation, Convolutional Neural Network, Random Forest Classifier.

Abstract

This study presents a framework for detecting fake tweets using a dedicated Twitter fake tweet
dataset, combining natural language processing (NLP) and machine learning to improve classification
accuracy. Manual detection does not scale, is subjective, and often misses subtle linguistic or
contextual signals in large volumes of social media data. To overcome these limitations, the proposed
approach uses a multi-stage pipeline. NLP preprocessing first cleans and normalizes the tweet text,
and FastText embeddings then convert it into numerical vectors. The data is split into training and
testing sets so that performance is measured on tweets the model has not seen during training. A deep
learning convolutional neural network (DLCNN) extracts features from the embedded text, and a random
forest classifier uses those features to label each tweet as real or fake. The model is evaluated
using key performance metrics to assess its accuracy and its applicability to real-world
misinformation detection.
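
The first two stages of the pipeline described above can be illustrated with a minimal sketch. The
abstract does not list the exact cleaning rules, so the steps in clean_tweet (lowercasing, removing
URLs, @mentions, and punctuation) are assumptions chosen for illustration, and both function names
are hypothetical. The char_ngrams helper shows the subword idea behind FastText, where a word is
wrapped in "<" and ">" boundary markers and broken into character n-grams; the 3 to 6 range follows
the original FastText formulation and is not confirmed as the setting used in this study.

```python
import re
from typing import List

# Assumed cleaning rules; the abstract does not specify them.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_ALNUM_RE.sub(" ", text)  # remaining punctuation becomes spaces
    return " ".join(text.split())       # collapse repeated whitespace


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Character n-grams of a word wrapped in FastText-style boundary markers."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]
```

For example, clean_tweet("BREAKING: @user says AI wrote this!! https://t.co/abc #DeepFake") returns
"breaking says ai wrote this deepfake", and char_ngrams("fake", 3, 3) returns
["<fa", "fak", "ake", "ke>"]. In FastText itself, each such n-gram has a learned vector and a word's
embedding is built from the vectors of its n-grams together with the word itself; the sketch stops
before that step.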