Article

Speed Up Federated Learning in Heterogeneous Environments: A Dynamic Tiering Approach

Seyed Mahmoud Sajjadi Mohammadabadi, Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, USA
Syed Zawad, Research Department, IBM Research-Almaden, San Jose, CA, USA
Feng Yan, Computer Science Department and the Electrical and Computer Engineering Department, University of Houston, Houston, TX, USA
Lei Yang, Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, USA

2024 · en


Abstract

Federated learning (FL) enables collaborative training of a model while keeping the training data decentralized and private. However, in Internet of Things systems, inherent heterogeneity in processing power, communication bandwidth, and task size can significantly hinder the efficient training of large models. Such heterogeneity causes large variations in the training time of clients, lengthening overall training and wasting the resources of faster clients. To tackle these heterogeneity challenges, we propose dynamic tiering-based FL (DTFL), a novel system that leverages distributed optimization principles to improve edge learning performance. Based on clients' resources, DTFL dynamically offloads part of the global model to the server, alleviating resource constraints on slower clients and speeding up training. By leveraging split learning, DTFL offloads different portions of the global model to clients in different tiers and enables each client to update its model in parallel via local-loss-based training. This reduces the computation and communication demand on resource-constrained devices, mitigating the straggler problem. DTFL introduces a dynamic tier scheduler that uses tier profiling to estimate the expected training time of each client based on its historical training time, communication speed, and dataset size. The dynamic tier scheduler assigns clients to suitable tiers to minimize the overall training time in each round. We theoretically prove the convergence properties of DTFL and validate its effectiveness by training large models (ResNet-56 and ResNet-110) across varying numbers of clients (from 10 to 200) using popular image datasets (CIFAR-10, CIFAR-100, CINIC-10, and HAM10000) under both IID and non-IID settings. DTFL seamlessly integrates various privacy measures without sacrificing performance. Extensive experimental results show that, compared with state-of-the-art FL methods, DTFL can reduce the training time by up to 80% while maintaining model accuracy.
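
The tier-scheduling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the cost model (client compute plus communication plus server compute, all linear in dataset size), the class `ClientProfile`, the functions `estimate_time` and `assign_tiers`, and every number below are assumptions made for illustration only. The sketch assumes each client's estimated time is independent of the others, so picking the fastest tier per client also minimizes the estimated round time, which is taken to be the time of the slowest client.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ClientProfile:
    """Hypothetical per-client profile; all fields are illustrative assumptions."""
    compute_sec_per_sample: Dict[int, float]  # tier -> client-side seconds per sample (from past rounds)
    bandwidth_mb_per_sec: float               # measured communication speed
    num_samples: int                          # local dataset size


def estimate_time(profile: ClientProfile, tier: int,
                  payload_mb_per_sample: Dict[int, float],
                  server_sec_per_sample: Dict[int, float]) -> float:
    """Estimated time for one client in one tier under the assumed linear cost model."""
    compute = profile.compute_sec_per_sample[tier] * profile.num_samples
    comm = payload_mb_per_sample[tier] * profile.num_samples / profile.bandwidth_mb_per_sec
    server = server_sec_per_sample[tier] * profile.num_samples
    return compute + comm + server


def assign_tiers(profiles: Dict[str, ClientProfile],
                 payload_mb_per_sample: Dict[int, float],
                 server_sec_per_sample: Dict[int, float]) -> Tuple[Dict[str, int], float]:
    """Give each client its fastest tier; the round time is the slowest client's estimate."""
    assignment = {}
    for client_id, profile in profiles.items():
        assignment[client_id] = min(
            payload_mb_per_sample,
            key=lambda t: estimate_time(profile, t, payload_mb_per_sample, server_sec_per_sample),
        )
    round_time = max(
        estimate_time(profiles[c], t, payload_mb_per_sample, server_sec_per_sample)
        for c, t in assignment.items()
    )
    return assignment, round_time


# Illustrative numbers only: in tier 1 the client keeps a small part of the model
# (little client compute, more data exchanged); in tier 2 it keeps a larger part.
payload = {1: 0.5, 2: 0.1}        # MB exchanged per sample, per tier (assumed)
server_cost = {1: 0.001, 2: 0.0}  # server seconds per sample, per tier (assumed)
profiles = {
    "slow": ClientProfile({1: 0.01, 2: 0.2}, bandwidth_mb_per_sec=10.0, num_samples=100),
    "fast": ClientProfile({1: 0.001, 2: 0.005}, bandwidth_mb_per_sec=10.0, num_samples=100),
}
assignment, round_time = assign_tiers(profiles, payload, server_cost)
print(assignment, round_time)
```

With these made-up numbers, the slow client is placed in the tier that offloads more of the model to the server, while the fast client keeps more of the model locally. A real scheduler would also need to account for shared server capacity and for updating the profiles after every round, which this sketch ignores.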


Identifiers

DOI: 10.1109/jiot.2024.3487473

Citations and references

6 citations · 0 references used

Metrics: AkademScholar