Статья

On Exploring Attention-based Explanation for Transformer Models in Text Classification

Shengzhong LiuDepartment of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USAFranck LeIBM Research, Yorktown Heights, USASupriyo ChakrabortyIBM Research, Yorktown Heights, USATarek AbdelzaherDepartment of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA

2021en

ABI

Аннотация

The Transformer models have achieved unprecedented breakthroughs in text classification, and have become the foundation of most state-of-the-art NLP systems. The core function that drives the success is the attention mechanism, which provides the ability to dynamically focus on different parts of the input sequence when producing the predictions. Several previous works have investigated the usage of attention weights to explain the model predictions, because intuitively, attention weights reflect the importance of the input positions in the output. Specifically, the objective for explanation is to compute a relevance score for each input token, such that the key input words that are most important to the prediction can be identified. However, previous efforts produced mixed results. We find that the key reason why attention weights cannot be directly used as effective relevance indications is because they do not contain the directional information for relevance (i.e., whether the input tokens contribute towards or against the prediction). We then propose two novel explanation techniques, namely AGrad and RePAGrad, that produce directional relevance scores based on attention weights. To evaluate the explanation performance, we propose three properties that an effective explanation method should satisfy (i.e., faithfulness, resilience, and consistency), and design the corresponding test to quantify each property. Through extensive evaluations with Transformer models and pre-trained BERT models on multiple public text classification datasets, we show that AGrad and RePAGrad significantly outperform existing state-of-the-art explanation methods in faithfulness and consistency, at the cost of nominal degradation on resilience compared to attention weights. In addition, we reveal that elements of a model architecture can play an important role towards explainability.

Перевод пока недоступен

Идентификаторы

DOI: 10.1109/bigdata52589.2021.9671639

Цитирования и источники

Цитирований: 2Использованных источников: 0

Показатели — AkademScholar