LLM Prompt Injection Detection Firewall

YAJJIPURAPU LAVANYA; PILLA RESHMA; GEDDADA JYOTHSNA ADITYA; MOTURU ABHISHEK DINESH RAJA; BANDARU MAHA LAKSHMI

doi:10.62643/ijerst.2026.v22.n2(1).pp66-72

Authors

YAJJIPURAPU LAVANYA Author
PILLA RESHMA Author
GEDDADA JYOTHSNA ADITYA Author
MOTURU ABHISHEK DINESH RAJA Author
BANDARU MAHA LAKSHMI Author

DOI:

https://doi.org/10.62643/ijerst.2026.v22.n2(1).pp66-72

Keywords:

LLM Security, Prompt Injection Detection, Jailbreak Prevention, RAG Security, Output Filtering, AI Firewall, Risk Scoring, NLP Security.

Abstract

The rapid adoption of Large Language Models (LLMs) in enterprise and consumer applications has introduced a new class of adversarial threats, including prompt injection attacks, jailbreak attempts, data exfiltration through crafted inputs, and malicious content smuggled via RetrievalAugmented Generation (RAG) pipelines. Existing approaches lack a unified, multi-layered defense mechanism capable of intercepting threats at both the input and output stages while remaining context-aware. This paper presents a novel LLM Firewall architecture that integrates three sequential detection layers—Keyword Filtering, Pattern-Based Detection, and AI-driven Semantic Analysis—with a probabilistic risk scoring engine that classifies each query as Low, Medium, or High risk. The system further incorporates a dedicated RAG Security Module that inspects uploaded documents using Optical Character Recognition (OCR) and semantic analysis before they enter the retrieval pipeline. An Output Filtering Firewall post-processes all LLM responses to suppress unsafe or policy-violating content. The system maintains session-based chat history and provides a Comparison Mode to benchmark secured versus unsecured model behavior. A web-based dashboard delivers real-time threat classification, scores, and human-readable explanations. Implemented using Flask, the Groq API, SQLite, and Tesseract OCR, the system achieves a detection accuracy of 96.4%, with an average latency overhead of 112 ms. This work establishes a comprehensive, deployable framework for securing LLM-integrated applications against a wide spectrum of adversarial inputs.

LLM Prompt Injection Detection Firewall

Authors

DOI:

Keywords:

Abstract

Downloads

Published

Issue

Section

License

How to Cite

Make a Submission

IF

Index

Latest publications

Browse

Language

Information