Scheduling and queueing algorithms for resource-sharing in federated learning

PI: Gauri Joshi, Assistant Professor, Electrical and Computer Engineering, College of Engineering

Co-PI: Weina Wang, Assistant Professor, School of Computer Science

Federated learning is an emerging framework that enables machine learning models to be trained using thousands of edge devices such as cell phones and IoT sensors. Federated learning offers better privacy guarantees than data-center-based training because it keeps users’ data on the device and shares only model updates with the aggregating cloud server. Currently, only a few applications, such as Google’s mobile keyboard prediction service, employ federated learning. Going forward, many more applications are expected to migrate model training to edge devices, and each application will seek to independently train its own model. These applications will then contend for the limited on-device computation and communication capabilities of the edge clients.

The goal of this project is to design resource-sharing protocols that will ensure simultaneous, yet fast and accurate, training of multiple models. We will bring together complementary tools from classical scheduling and queueing theory and from online learning to: 1) design data-driven algorithms for the central server to select the best subset of edge clients in each training round based on estimates of their workload and local loss, and 2) design protocols for edge clients to prioritize training tasks coming from different applications. Approaches from scheduling and queueing theory can reduce the dimensionality of the decision space through model-based insights. However, the distributed nature of federated learning introduces more uncertainty and volatility into the system, posing challenges that have not previously been explored in queueing theory and that call for learning-based, data-driven methods.
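To make the first thrust concrete, the sketch below gives a minimal, hypothetical illustration of loss- and workload-aware client selection; the names (ClientStats, select_clients), the exponential-smoothing estimates, and the loss-per-delay scoring rule are assumptions for illustration only, not the algorithms this project will develop.

```python
"""Hypothetical sketch: server-side client selection using smoothed
estimates of each client's local loss and per-round delay (workload).
All names and the scoring rule are illustrative assumptions."""
from dataclasses import dataclass


@dataclass
class ClientStats:
    """Running estimates the server keeps for one edge client."""
    est_loss: float = 1.0    # smoothed estimate of the client's local loss
    est_delay: float = 1.0   # smoothed estimate of its per-round delay (seconds)
    smoothing: float = 0.5   # weight placed on the newest observation

    def update(self, observed_loss: float, observed_delay: float) -> None:
        """Fold the latest round's observations into the running estimates."""
        a = self.smoothing
        self.est_loss = a * observed_loss + (1 - a) * self.est_loss
        self.est_delay = a * observed_delay + (1 - a) * self.est_delay


def select_clients(stats: dict[str, ClientStats], k: int) -> list[str]:
    """Return the k client IDs with the largest estimated loss per unit delay."""
    ranked = sorted(
        stats,
        key=lambda cid: stats[cid].est_loss / max(stats[cid].est_delay, 1e-9),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy example: three clients with different loss/delay profiles.
    stats = {
        "phone-A": ClientStats(est_loss=2.0, est_delay=1.0),
        "phone-B": ClientStats(est_loss=1.5, est_delay=4.0),
        "sensor-C": ClientStats(est_loss=0.8, est_delay=0.5),
    }
    print(select_clients(stats, k=2))  # -> ['phone-A', 'sensor-C']
```

In this toy scoring rule, clients whose models are still far from converged (high estimated loss) are favored, but heavily loaded or slow clients (high estimated delay) are deprioritized; the project will replace such heuristics with algorithms backed by queueing-theoretic and online-learning analysis.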