A Unified Multimodal Deep Learning System for Robust Deep-fake Analysis

Authors

  • B. V. Poorna Sree
  • D. Yashwanth
  • G. Raja
  • A. Hemanth Kumar
  • Prof. (Dr.) Ravi Kiran

DOI:

https://doi.org/10.62643/ijerst.2026.v22.n1(2).pp219-223

Keywords:

Deepfake Detection; Convolutional Neural Networks; GoogLeNet; Log-Mel Spectrogram; LSTM; Grad-CAM; Multimodal Fusion; Streamlit

Abstract

Deepfake media generated by generative adversarial networks (GANs) and neural voice synthesis pose an escalating threat to information integrity. This paper presents a multimodal deepfake detection system that jointly analyses video frames, facial images, and audio waveforms using three specialized deep learning branches: a GoogLeNet-based visual feature extractor coupled with an LSTM for temporal video modelling, a GoogLeNet image classifier with Grad-CAM explainability, and a four-layer convolutional neural network (CNN) trained on log-mel spectrograms for audio deepfake detection. A weighted fusion layer combines per-modality softmax scores into a single fakeness probability with a configurable decision threshold. The audio CNN achieves 94.2% test accuracy on binary real/fake classification; multimodal fusion further improves detection to 96.3% accuracy with an F1-score of 0.963. An interactive Streamlit application delivers real-time predictions with Grad-CAM attention overlays for analyst interpretability.
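The weighted fusion step described above can be sketched as follows. This is a minimal illustration only: the abstract does not state the per-modality weights or the decision threshold, so the values below (`weights=(0.4, 0.3, 0.3)`, `threshold=0.5`) and the function name `fuse_scores` are hypothetical placeholders, not the paper's actual configuration.

```python
def fuse_scores(video_p, image_p, audio_p,
                weights=(0.4, 0.3, 0.3), threshold=0.5):
    """Combine per-modality softmax 'fake' probabilities into one score.

    Each input is the fake-class probability from one branch (video
    LSTM, image classifier, audio CNN). The weights and threshold are
    illustrative assumptions, not values reported in the paper.
    Returns (fakeness probability, boolean fake/real decision).
    """
    probs = (video_p, image_p, audio_p)
    # Normalized weighted mean of the three branch probabilities.
    fakeness = sum(w * p for w, p in zip(weights, probs)) / sum(weights)
    return fakeness, fakeness >= threshold

# Example: video branch says 0.9 fake, image 0.8, audio 0.7.
score, is_fake = fuse_scores(0.9, 0.8, 0.7)  # → (0.81, True)
```

A configurable threshold of this kind lets an analyst trade precision against recall without retraining any of the three branches.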


Published

28-03-2026

How to Cite

A Unified Multimodal Deep Learning System for Robust Deep-fake Analysis. (2026). International Journal of Engineering Research and Science & Technology, 22(1(2)), 219-223. https://doi.org/10.62643/ijerst.2026.v22.n1(2).pp219-223