LLM Prompt Injection Detection Firewall

Authors

  • YAJJIPURAPU LAVANYA Author
  • PILLA RESHMA Author
  • GEDDADA JYOTHSNA ADITYA Author
  • MOTURU ABHISHEK DINESH RAJA Author
  • BANDARU MAHA LAKSHMI Author

DOI:

https://doi.org/10.62643/ijerst.2026.v22.n2(1).pp66-72

Keywords:

LLM Security, Prompt Injection Detection, Jailbreak Prevention, RAG Security, Output Filtering, AI Firewall, Risk Scoring, NLP Security.

Abstract

The rapid adoption of Large Language Models (LLMs) in enterprise and consumer applications has introduced a new class of adversarial threats, including prompt injection attacks, jailbreak attempts, data exfiltration through crafted inputs, and malicious content smuggled via RetrievalAugmented Generation (RAG) pipelines. Existing approaches lack a unified, multi-layered defense mechanism capable of intercepting threats at both the input and output stages while remaining context-aware. This paper presents a novel LLM Firewall architecture that integrates three sequential detection layers—Keyword Filtering, Pattern-Based Detection, and AI-driven Semantic Analysis—with a probabilistic risk scoring engine that classifies each query as Low, Medium, or High risk. The system further incorporates a dedicated RAG Security Module that inspects uploaded documents using Optical Character Recognition (OCR) and semantic analysis before they enter the retrieval pipeline. An Output Filtering Firewall post-processes all LLM responses to suppress unsafe or policy-violating content. The system maintains session-based chat history and provides a Comparison Mode to benchmark secured versus unsecured model behavior. A web-based dashboard delivers real-time threat classification, scores, and human-readable explanations. Implemented using Flask, the Groq API, SQLite, and Tesseract OCR, the system achieves a detection accuracy of 96.4%, with an average latency overhead of 112 ms. This work establishes a comprehensive, deployable framework for securing LLM-integrated applications against a wide spectrum of adversarial inputs.

Downloads

Published

06-04-2026

How to Cite

LLM Prompt Injection Detection Firewall. (2026). International Journal of Engineering Research and Science & Technology, 22(2(1), 66-72. https://doi.org/10.62643/ijerst.2026.v22.n2(1).pp66-72