Article

Improving the Computing Accuracy of the AI Ascend Processor: Research and Results

Firnafas Yusupov, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Department of Software Engineering, Urgench, Uzbekistan
Mukhiddin F. Ibragimov, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Department of Software Engineering, Urgench, Uzbekistan
Saidbek P. Babayazov, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Department of Software Engineering, Urgench, Uzbekistan
Kumushoy E. Niyazmetova, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Department of Software Engineering, Urgench, Uzbekistan
Xaitbayeva Z. Durdona, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Department of Software Engineering, Urgench, Uzbekistan
Umidbek P. Babayazov, Urgench State University, Department of Socio-economic Sciences, Urgench, Uzbekistan
2024
ABI

Abstract

Recent US sanctions against Huawei have severely restricted its access to advanced semiconductor manufacturing technology, prompting the development of the Ascend 910B accelerator, fabricated on SMIC's 7-nm process. Designed specifically for artificial intelligence workloads, the Ascend 910B delivers performance that rivals NVIDIA's A100, but its computing accuracy is limited, particularly for FP16 and INT8 operations. In this study, we propose a novel algorithm that improves the precision of matrix multiplication on the Ascend AI processor, bringing its results close to FP32 accuracy. Our approach exploits the ability of the Da Vinci architecture to split an FP32 operation into two FP16 operations, which reduces the relative error of matrix multiplication by approximately 20%. To validate the algorithm, we compared it against standard FP32 operations on both the Ascend 910B and a host device. The results show that our technique significantly improves the Ascend processor's effectiveness for AI workloads that demand high precision. Beyond improving computational accuracy, this advancement strengthens the case for the Ascend 910B as a competitive alternative in the rapidly evolving landscape of AI accelerators.
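The abstract does not spell out the algorithm, so the following is only a minimal sketch of one well-known way to get near-FP32 matrix products out of FP16 hardware, not the authors' code. It assumes that each FP32 input x is split into a high FP16 part hi = fp16(x) and a scaled FP16 residual lo = fp16((x - hi) * 2^11), that the matrix unit multiplies FP16 operands and accumulates in FP32, and that the product is assembled as A·B ≈ A_hi·B_hi + (A_hi·B_lo + A_lo·B_hi) / 2^11, dropping the tiny lo·lo term. This variant uses three FP16 products, so the authors' exact decomposition may differ. The names `split`, `gemm_fp32_acc`, `gemm_split_fp16` and `SCALE`, and the 2^11 scaling, are hypothetical; FP16 and FP32 rounding are emulated in pure Python with the standard `struct` module rather than run on an Ascend device.

```python
import random
import struct

# Hypothetical scaling factor for the FP16 residual: 2**11 helps keep the
# low part out of FP16's subnormal range.
SCALE = 2.0 ** 11


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def split(m):
    """Split an FP32 matrix into FP16 parts so that m ~= hi + lo / SCALE."""
    hi = [[to_fp16(x) for x in row] for row in m]
    lo = [[to_fp16((x - h) * SCALE) for x, h in zip(row, hrow)]
          for row, hrow in zip(m, hi)]
    return hi, lo


def gemm_fp32_acc(a, b):
    """GEMM with every product and partial sum rounded to FP32.

    With FP16 inputs the products are exact in FP32, so this also models a
    matrix unit that takes FP16 operands and accumulates in FP32.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc = to_fp32(acc + to_fp32(a[i][p] * b[p][j]))
            c[i][j] = acc
    return c


def gemm_split_fp16(a, b):
    """Near-FP32 GEMM from three FP16 GEMMs; the small lo*lo term is dropped."""
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    hh = gemm_fp32_acc(a_hi, b_hi)
    hl = gemm_fp32_acc(a_hi, b_lo)
    lh = gemm_fp32_acc(a_lo, b_hi)
    # Recombine in FP32; dividing by a power of two only shifts the exponent.
    return [[to_fp32(x + to_fp32(y + z) / SCALE) for x, y, z in zip(r0, r1, r2)]
            for r0, r1, r2 in zip(hh, hl, lh)]


def gemm_f64(a, b):
    """Float64 reference product."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rel_error(c, ref):
    """Frobenius-norm relative error of c with respect to ref."""
    num = sum((x - r) ** 2 for row, rrow in zip(c, ref) for x, r in zip(row, rrow))
    den = sum(r ** 2 for rrow in ref for r in rrow)
    return (num / den) ** 0.5


random.seed(0)
N = 16
A = [[to_fp32(random.uniform(-1.0, 1.0)) for _ in range(N)] for _ in range(N)]
B = [[to_fp32(random.uniform(-1.0, 1.0)) for _ in range(N)] for _ in range(N)]
ref = gemm_f64(A, B)

A16 = [[to_fp16(x) for x in row] for row in A]
B16 = [[to_fp16(x) for x in row] for row in B]
err_fp16 = rel_error(gemm_fp32_acc(A16, B16), ref)   # naive: inputs cast to FP16
err_split = rel_error(gemm_split_fp16(A, B), ref)    # hi/lo split
err_fp32 = rel_error(gemm_fp32_acc(A, B), ref)       # plain FP32 baseline
print(f"FP16: {err_fp16:.2e}  split FP16: {err_split:.2e}  FP32: {err_fp32:.2e}")
```

Running it prints the three relative errors; on random inputs like these the split result should sit far below the naive FP16 error and much closer to the FP32 baseline. The emulation says nothing about the roughly 20% reduction reported in the abstract, which depends on the authors' baseline, hardware and test matrices.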
