Explainable and Transparent AI Architectures
Abstract
Explainability and transparency have emerged as foundational pillars in the secure deployment of artificial intelligence (AI) systems, especially large language models (LLMs). This chapter examines the evolving landscape of explainable AI (XAI) architectures through the lens of cybersecurity, adversarial robustness, and regulatory compliance. The authors survey core XAI methodologies—including LIME, SHAP, mechanistic interpretability, attention attribution, and causal tracing—evaluating their effectiveness against adversarial threats such as jailbreaking, prompt injection, data poisoning, and hallucination exploitation. The dual nature of XAI is critically examined: while transparency mechanisms bolster defense and trust, they simultaneously introduce novel attack surfaces that adversaries can exploit to subvert explanation systems.