Article

Accelerating Matrix Multiplication with CPU Multithreading and CUDA Block-Based GPU Parallelization

Mekhriddin Rakhimov, Department of Computer Systems, Tashkent University of Information Technologies named after Muhammad Al-Khwarizmi, Tashkent, Uzbekistan

Mannon Ochilov, Department of Robotics and Intelligent Systems, Tashkent University of Information Technologies named after Muhammad Al-Khwarizmi, Tashkent, Uzbekistan

Rashid Nasimov, Department of Artificial Intelligence, Tashkent State University of Economics, Tashkent, Uzbekistan

Shakhzod Javliev, Department of Computer Systems, Tashkent University of Information Technologies named after Muhammad Al-Khwarizmi, Tashkent, Uzbekistan

2025

Abstract

As technology advances, the volume of data to be processed keeps growing. This article examines how the speed of computing devices limits arithmetic operations on large matrices. One effective approach to multiplying large matrices is the block-based method, which divides each matrix into blocks and computes the product block by block. The block-based parallel method is applied to matrices of different sizes on the graphics processor using CUDA (Compute Unified Device Architecture) and, for devices without a graphics processor, on the central processor using the OpenMP (Open Multi-Processing) parallel library. The study measures the time required to multiply matrices of sizes 64x64, 128x128, 512x512, 1024x1024 and 2048x2048 with these technologies, comparing the simple sequential naive method with the parallel block-based method. It concludes with a systematic analysis of performance metrics for several block sizes (8x8, 16x16, 32x32, etc.), an assessment of the comparative efficiency of the CPU and GPU matrix multiplication implementations, and the determination of optimal limits for real-world parallel processing, based on a comparison of block sizes under the OpenMP parallel programming model for CPUs and CUDA technology for NVIDIA GPUs.
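To make the contrast between the naive and block-based methods concrete, the following is a minimal C++ sketch of both on the CPU. It is not the authors' implementation: the flat row-major `Matrix` alias, the function names `multiply_naive`, `multiply_blocked` and `max_abs_diff`, and the choice to parallelize over output tiles are all assumptions made for illustration. The OpenMP pragma takes effect only when the code is compiled with OpenMP enabled (for example `-fopenmp` with GCC or Clang); otherwise the loop simply runs sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical storage choice: an n x n matrix in a flat, row-major vector.
using Matrix = std::vector<double>;

// Naive sequential baseline: one dot product per output element.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Block-based variant: the output is split into bs x bs tiles, and each tile
// accumulates the products of the matching tiles of A and B. Every thread
// writes a distinct tile of C, so no synchronization is needed.
Matrix multiply_blocked(const Matrix& a, const Matrix& b,
                        std::size_t n, std::size_t bs) {
    assert(bs > 0);
    Matrix c(n * n, 0.0);
    // Signed loop counters keep the loop valid for older OpenMP versions.
    const long long tiles = static_cast<long long>((n + bs - 1) / bs);

    #pragma omp parallel for collapse(2)
    for (long long bi = 0; bi < tiles; ++bi) {
        for (long long bj = 0; bj < tiles; ++bj) {
            const std::size_t i0 = static_cast<std::size_t>(bi) * bs;
            const std::size_t j0 = static_cast<std::size_t>(bj) * bs;
            const std::size_t i1 = std::min(i0 + bs, n);  // clamp edge tiles
            const std::size_t j1 = std::min(j0 + bs, n);
            for (std::size_t k0 = 0; k0 < n; k0 += bs) {
                const std::size_t k1 = std::min(k0 + bs, n);
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t k = k0; k < k1; ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = j0; j < j1; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}

// Largest element-wise difference, used to check the two methods agree.
double max_abs_diff(const Matrix& x, const Matrix& y) {
    assert(x.size() == y.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        worst = std::max(worst, std::fabs(x[i] - y[i]));
    }
    return worst;
}
```

Calling `multiply_blocked(a, b, n, 16)` and comparing the result against `multiply_naive(a, b, n)` with `max_abs_diff` is a simple correctness check before timing anything. On a GPU, the same tiling idea would typically map each output tile to one CUDA thread block, although the sketch above does not attempt to show that.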

Topics

Parallel Computing and Optimization Techniques; Advanced Data Processing Techniques; Advanced Technology in Applications

Identifiers

DOI: 10.1145/3789692.3789756

Citations and references

Cited by 0; 29 references
