An Elegant Multi-Agent Gradient Descent for Effective Optimization in Neural Network Training and Beyond
Abstract
Non-convex optimization problems often challenge gradient-based algorithms such as Gradient Descent. Neural network training, a prominent application of gradient-based methods, relies heavily on their computational efficiency. However, the cost function in neural network training is typically non-convex, so gradient-based algorithms can become trapped in local minima because they explore only a limited part of the solution space. In contrast, global optimization algorithms, such as swarm-based methods, explore the solution space more thoroughly but introduce significant computational overhead. To address these challenges, we propose Multi-Agent Gradient Descent (MAGD), a novel algorithm that combines the efficiency of gradient-based methods with enhanced exploration capabilities. MAGD initializes multiple agents, each representing a candidate solution, and updates their positions independently using gradient-based techniques, without inter-agent communication. To limit computational cost, the number of agents is adjusted dynamically by removing underperforming agents. MAGD offers a cost-effective approach to non-convex optimization problems, including but not limited to neural network training. We benchmark MAGD against traditional Gradient Descent (GD), Adam, and Swarm-Based Gradient Descent (SBGD), demonstrating that MAGD achieves superior solution quality without a significant increase in computational complexity. MAGD outperforms these methods on 20 benchmark mathematical optimization functions and on shallow neural networks trained on 20 real-world classification and regression datasets.
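The idea summarized above (independent gradient updates for several agents, with periodic removal of the weakest ones) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `magd_sketch`, the plain gradient step, the fixed pruning schedule `prune_every`, and the retention ratio `keep_fraction` are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def magd_sketch(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    starts: Sequence[Vector],
    lr: float = 0.01,
    steps: int = 500,
    prune_every: int = 100,      # assumed pruning schedule (hypothetical)
    keep_fraction: float = 0.5,  # assumed share of agents kept (hypothetical)
) -> Tuple[Vector, float]:
    """Run independent gradient descent from several starting points,
    periodically discarding the agents with the highest cost."""
    agents = [list(x) for x in starts]
    for step in range(1, steps + 1):
        # Each agent takes its own gradient step; no information is shared.
        agents = [[xi - lr * gi for xi, gi in zip(x, grad(x))] for x in agents]
        # Periodically drop the worst-performing agents to save computation.
        if step % prune_every == 0 and len(agents) > 1:
            agents.sort(key=f)
            keep = max(1, int(len(agents) * keep_fraction))
            agents = agents[:keep]
    best = min(agents, key=f)
    return best, f(best)


# Toy non-convex cost with a shallow minimum near x = 1.13
# and a deeper (global) minimum near x = -1.30.
def cost(x: Vector) -> float:
    return x[0] ** 4 - 3 * x[0] ** 2 + x[0]


def cost_grad(x: Vector) -> Vector:
    return [4 * x[0] ** 3 - 6 * x[0] + 1]


# A single agent started at x = 2 behaves like ordinary gradient descent.
single, single_val = magd_sketch(cost, cost_grad, [[2.0]])
# Four agents spread over the domain, pruned down to one over time.
multi, multi_val = magd_sketch(cost, cost_grad, [[-2.0], [-0.5], [0.5], [2.0]])
```

On this toy function, the single agent started at x = 2 settles in the shallower minimum, while the multi-agent run keeps an agent that reaches the deeper one. After the first pruning step only half of the agents are still being updated, which is the source of the cost saving in this sketch.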