Hierarchical federated learning (HFL) is a distributed machine learning approach that organizes participating devices or nodes into a multi-level hierarchy to make training more efficient and scalable. In this setup, data remains on individual devices, which contribute to a shared global model by exchanging only model updates rather than raw data. The approach is particularly useful when data is spread across many sources, such as smartphones, medical devices, or IoT sensors, and privacy concerns rule out collecting it centrally.
In HFL, the hierarchy typically consists of multiple levels: local clients, intermediate (local) aggregators, and a central server. At the lowest level, individual devices train on their own data and produce model updates. These updates are sent to local aggregators, each of which consolidates the updates from its group of devices; this localized aggregation reduces communication with the central server and speeds up training. Finally, the central server collects the aggregated results from the local aggregators, merges them into a refined global model, and redistributes that model back down the hierarchy to the aggregators and devices.
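To make the two-level aggregation concrete, here is a minimal Python sketch. It assumes each model update is a NumPy parameter vector and uses simple data-size-weighted averaging (FedAvg-style); the function names `edge_aggregate` and `global_aggregate` are illustrative, not from any particular library.

```python
import numpy as np

def weighted_average(updates, weights):
    """Data-size-weighted average of parameter vectors (FedAvg-style)."""
    total = sum(weights)
    return sum(w * u for u, w in zip(updates, weights)) / total

def edge_aggregate(client_updates, client_sizes):
    """Local aggregator: pool the updates from its own group of clients.

    Returns the pooled parameters together with the group's total data
    size, so the central server can weight each aggregator correctly.
    """
    return weighted_average(client_updates, client_sizes), sum(client_sizes)

def global_aggregate(edge_results):
    """Central server: merge the edge-level aggregates into a global model."""
    models = [m for m, _ in edge_results]
    sizes = [n for _, n in edge_results]
    return weighted_average(models, sizes)
```

Because each aggregator forwards only one pooled update instead of every client's update, the number of messages reaching the central server drops from the number of clients to the number of aggregators.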
For example, consider a health app running on smartphones that collects data about users' daily activities. Instead of sending sensitive health data to a central server, the app trains a model locally on each device. The resulting model updates are then sent to local aggregators assigned by geographical region, which pool them before forwarding the result to the central server for final aggregation. This hierarchical approach not only speeds up training but also helps preserve user privacy, making it appealing for applications that handle sensitive data.
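Continuing the sketch above, a toy run might group clients by region before aggregating. The region names, data sizes, and two-dimensional "models" below are invented purely for illustration:

```python
# Hypothetical regional grouping: each region maps to (client updates, data sizes).
regions = {
    "region_a": ([np.array([1.0, 2.0]), np.array([3.0, 4.0])], [100, 300]),
    "region_b": ([np.array([5.0, 6.0])], [200]),
}

# Each regional aggregator pools its own clients' updates first...
edge_results = [edge_aggregate(updates, sizes) for updates, sizes in regions.values()]

# ...then the central server merges the regional results and redistributes.
global_model = global_aggregate(edge_results)
print(global_model)  # ~[3.33 4.33], the data-size-weighted mean of all client updates
```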