Improving the Computing Accuracy of the Ascend AI Processor: Research and Results
Abstract
Recent US sanctions against Huawei have severely restricted its access to advanced semiconductor manufacturing technologies, prompting the development of the Ascend 910B accelerator, manufactured on SMIC's 7-nm process. Engineered specifically for artificial intelligence tasks, the Ascend 910B delivers performance that rivals NVIDIA's A100, but its computing accuracy is limited, particularly in FP16 and INT8 operations. In this study, we propose a novel algorithm that improves the computational precision of matrix multiplication on the Ascend AI processor, bringing its results close to FP32 accuracy. Our approach exploits the Da Vinci architecture's ability to split an FP32 operation into two FP16 operations, which reduces the relative error in matrix multiplication by approximately 20%. To validate the algorithm, we ran comparative experiments against standard FP32 operations on both the Ascend 910B and a host device. The results show that the technique makes the Ascend processor markedly better suited to AI workloads that demand high precision. This advance improves computational accuracy and strengthens the position of the Ascend 910B as a competitive alternative among AI accelerators.
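The idea of representing an FP32 operand as two FP16 values can be illustrated on a host CPU with NumPy. The sketch below is not the algorithm proposed in this study and does not use any Ascend API. It is a generic split-precision emulation, and the function names (`split_fp32`, `matmul_fp16`, `matmul_split`, `relative_error`) are hypothetical. It assumes a matrix unit that takes FP16 inputs and accumulates in FP32, and it omits refinements such as scaling the residual to avoid FP16 underflow.

```python
import numpy as np

def split_fp32(x):
    """Split an FP32 array into a high FP16 part and a low FP16 residual."""
    hi = x.astype(np.float16)
    # The residual holds the bits that did not fit into the high part.
    lo = (x - hi.astype(np.float32)).astype(np.float16)
    return hi, lo

def matmul_fp16(a16, b16):
    """Emulate a matrix unit with FP16 inputs and FP32 accumulation (assumption)."""
    return a16.astype(np.float32) @ b16.astype(np.float32)

def matmul_split(a, b):
    """Approximate an FP32 matmul using three FP16-input products."""
    a_hi, a_lo = split_fp32(a)
    b_hi, b_lo = split_fp32(b)
    # The lo*lo term is dropped because its contribution is negligible.
    return (matmul_fp16(a_hi, b_hi)
            + matmul_fp16(a_hi, b_lo)
            + matmul_fp16(a_lo, b_hi))

def relative_error(approx, ref):
    """Frobenius-norm relative error against a reference result."""
    return np.linalg.norm(approx - ref) / np.linalg.norm(ref)

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

# FP64 product of the same inputs serves as the reference.
ref = a.astype(np.float64) @ b.astype(np.float64)

err_fp16 = relative_error(matmul_fp16(a.astype(np.float16), b.astype(np.float16)), ref)
err_split = relative_error(matmul_split(a, b), ref)

print(f"plain FP16 inputs: {err_fp16:.3e}")
print(f"split FP16 inputs: {err_split:.3e}")
```

On random inputs the split variant should show a much smaller relative error than plain FP16, at the cost of three matrix products instead of one. The figures it prints describe this host emulation only and are not comparable to the measurements reported for the Ascend 910B.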