DEEP LEARNING APPROACH FOR AUTOMATED DEEPFAKE VIDEO FORGERY DETECTION
DOI:
https://doi.org/10.62643/ijerst.2026.v22.i1(2).pp112-115

Keywords:
Bidirectional LSTM; Deepfake Detection; Face Forgery; GoogLeNet; Grad-CAM; Temporal Modelling; Threshold Optimisation; Video Analysis.

Abstract
Deepfake videos, generated through face-swapping and manipulation techniques powered by Generative Adversarial Networks (GANs) and autoencoders, pose a significant threat to digital media integrity, online trust, and democratic discourse. This paper presents an end-to-end deepfake video detection system that combines GoogLeNet-based spatial feature extraction with a Bidirectional Long Short-Term Memory (BiLSTM) classifier for temporal sequence modelling. Each video is uniformly sampled into frames; faces are localised using the Viola-Jones Haar cascade detector; and per-frame 1024-dimensional feature vectors are extracted from the final pooling layer of a pretrained GoogLeNet. The resulting feature sequences are fed to a BiLSTM network trained with cross-entropy loss, Adam optimisation, and gradient clipping. A data-driven threshold-tuning procedure sweeps the fake-class probability threshold from 0.35 to 0.60 to maximise the F1-score, yielding an optimal threshold of 0.35, which achieves 88.5% accuracy, 98.7% recall, and an F1-score of 89.56%. Grad-CAM heatmaps computed on the inception5b layer of GoogLeNet provide spatial explainability by highlighting manipulated facial regions. A Streamlit web application integrates all pipeline stages, offering real-time prediction, confidence calibration, and frame-level Grad-CAM overlays. Experiments conducted on the Celeb-DF benchmark demonstrate the effectiveness of the proposed approach for high-recall deepfake detection.
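The threshold-tuning step described in the abstract can be sketched as a simple sweep over candidate fake-class probability thresholds, selecting the one that maximises the F1-score. The sketch below is a minimal illustration under assumed inputs (the probability and label lists in the usage note are synthetic placeholders, not the paper's data); the function names `f1_at_threshold` and `tune_threshold` are hypothetical, not from the paper:

```python
def f1_at_threshold(probs, labels, threshold):
    """F1-score for the fake class (label 1) at a given probability threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(probs, labels, lo=0.35, hi=0.60, step=0.05):
    """Sweep thresholds in [lo, hi] and return the one maximising F1."""
    candidates = []
    t = lo
    while t <= hi + 1e-9:  # tolerate float accumulation at the upper bound
        candidates.append(round(t, 2))
        t += step
    return max(candidates, key=lambda th: f1_at_threshold(probs, labels, th))
```

For example, `tune_threshold([0.4, 0.5, 0.7, 0.2, 0.45], [1, 1, 1, 0, 1])` returns 0.35 on this toy data, since the lowest threshold recovers all fake samples without false positives. Favouring a low threshold in this way trades a small amount of precision for the high recall the paper reports (98.7%), which is the desirable operating point for a forgery screening system.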
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.