Prompt Injection Detection and Mitigation with AI Multiagent NLP-based Agentic Frameworks
Abstract
Prompt injection is a significant challenge for generative AI systems because it can cause them to produce unintended outputs. We introduce an experimental multiagent NLP-based framework designed to address prompt injection vulnerabilities through layered detection and metadata mechanisms. The framework orchestrates specialized AI agents that generate responses, detect vulnerabilities, and mitigate injection effects. We evaluated it empirically on 500 engineered injection prompts spanning ten prompt injection categories (50 prompts per category), generated and then shuffled. The experimental results show a significant reduction in the injection score and increased detection of prompt injection markers, indicating potential applications for mitigation. Novel metrics, including Injection Success Rate (ISR), Policy Override Frequency (POF), Prompt Sanitization Rate (PSR), and Compliance Consistency Score (CCS), are proposed and combined into a composite Total Injection Vulnerability Score (TIVS). The system uses the vendor-independent Open Floor Protocol (OFP) framework for agentic AI communication via structured JSON messages. It encapsulates APIs using natural language, and it compares against and extends a previously established multiagent experiment on hallucination mitigation to address the specific challenges of prompt injection.
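The abstract names the four component metrics but does not give the formula that combines them into TIVS. As an illustration only, the sketch below shows one plausible way such a composite could be formed. Everything specific in it is an assumption rather than the paper's definition: the `InjectionMetrics` class, the `tivs` function, the equal default weights, the [0, 1] scaling of each metric, and the sign convention (ISR and POF raise the score, PSR and CCS lower it, so lower means less vulnerable).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InjectionMetrics:
    """Hypothetical container; each metric is assumed to be scaled to [0, 1]."""
    isr: float  # Injection Success Rate
    pof: float  # Policy Override Frequency
    psr: float  # Prompt Sanitization Rate
    ccs: float  # Compliance Consistency Score


def tivs(m: InjectionMetrics, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Illustrative composite score (assumed form, not the paper's formula).

    ISR and POF are treated as increasing vulnerability; PSR and CCS are
    treated as decreasing it. The result is normalized by the weight sum.
    """
    w_isr, w_pof, w_psr, w_ccs = weights
    total = w_isr + w_pof + w_psr + w_ccs
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    for name, value in vars(m).items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    weighted = (
        w_isr * m.isr + w_pof * m.pof - w_psr * m.psr - w_ccs * m.ccs
    )
    return weighted / total


if __name__ == "__main__":
    # Made-up values, for demonstration only.
    before = InjectionMetrics(isr=0.8, pof=0.6, psr=0.2, ccs=0.4)
    after = InjectionMetrics(isr=0.2, pof=0.1, psr=0.9, ccs=0.9)
    print(f"before mitigation: {tivs(before):+.3f}")
    print(f"after mitigation:  {tivs(after):+.3f}")
```

Under these assumptions, a drop in the score between two runs would correspond to the "significant reduction in the injection score" the abstract reports, though the actual weighting and normalization would have to be taken from the paper's own definition.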