Article

Prompt Injection Detection and Mitigation with AI Multiagent NLP-based Agentic Frameworks

Diego Gosmar (Polytechnic University of Turin, Mu Nu Chapter of IEEE-HKN)
Deborah A. Dahl (Voiceinteroperability.ai, Linux Foundation AI and Data)
Dario Gosmar (Polytechnic University of Turin, Mu Nu Chapter of IEEE-HKN)

2025


Abstract

Prompt injection is a significant challenge for generative AI systems because it can lead to unintended outputs. We introduce an experimental multiagent NLP-based framework designed to address prompt injection vulnerabilities through layered detection and metadata mechanisms. The framework orchestrates specialized AI agents to generate responses, detect vulnerabilities, and mitigate injection effects. We evaluated it empirically on 500 engineered injection prompts covering ten prompt injection categories, generated and shuffled with 50 prompts per category. The experimental results show a significant reduction in the injection score and an increased detection of prompt injection markers, indicating potential applications for mitigation. Novel metrics, including Injection Success Rate (ISR), Policy Override Frequency (POF), Prompt Sanitization Rate (PSR), and Compliance Consistency Score (CCS), are proposed to derive a composite Total Injection Vulnerability Score (TIVS). The system uses the vendor-independent Open Floor Protocol (OFP) framework for agentic AI communication via structured JSON messages. It encapsulates APIs using natural language, and it compares against and extends a previously established multiagent experiment on hallucination mitigation to address the specific challenges of prompt injection.
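The abstract names four metrics that feed a composite TIVS but does not state how they are combined. The sketch below shows one plausible way to do it, purely as an illustration: it assumes every metric is normalized to [0, 1], that ISR and POF raise the score while PSR and CCS lower it, and that the weights are equal by default. The class name `InjectionMetrics`, the function name `tivs`, the sign convention, and the weights are all assumptions, not the authors' published definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InjectionMetrics:
    """The four metrics named in the abstract; each is ASSUMED to lie in [0, 1]."""
    isr: float  # Injection Success Rate (assumed: lower is better)
    pof: float  # Policy Override Frequency (assumed: lower is better)
    psr: float  # Prompt Sanitization Rate (assumed: higher is better)
    ccs: float  # Compliance Consistency Score (assumed: higher is better)


def tivs(m: InjectionMetrics, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Hypothetical composite score: weighted 'bad' metrics minus weighted 'good' ones.

    This combination rule is an assumption made for illustration; under it,
    a lower (more negative) value means a less vulnerable system.
    """
    for name in ("isr", "pof", "psr", "ccs"):
        value = getattr(m, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    w_isr, w_pof, w_psr, w_ccs = weights
    total = w_isr + w_pof + w_psr + w_ccs
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Vulnerability-increasing terms are added, mitigating terms subtracted,
    # then the result is normalized by the total weight.
    return (w_isr * m.isr + w_pof * m.pof - w_psr * m.psr - w_ccs * m.ccs) / total


if __name__ == "__main__":
    # Made-up values, used only to show the call shape.
    before = InjectionMetrics(isr=0.6, pof=0.4, psr=0.2, ccs=0.5)
    after = InjectionMetrics(isr=0.2, pof=0.1, psr=0.7, ccs=0.9)
    print(tivs(before), tivs(after))
```

Under these assumptions, comparing the score before and after the mitigation agents run gives a single number for the change in vulnerability; a different sign convention or weighting would change the scale but not the idea.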


Topics

Adversarial Robustness in Machine Learning; Security and Verification in Computing; Information and Cyber Security

Identifiers

DOI: 10.1109/fllm67465.2025.11391215

Citations and references

Citations: 0
References used: 1
