A guest blog from Dr. Minghong Fang, Department of Computer Science and Engineering at the University of Louisville.
Federated learning is a distributed approach to training machine learning models in which data remains localized across different clients, such as Internet of Things (IoT) devices, smartphones, or larger institutions like hospitals. Clients collaboratively train a global machine learning model under the coordination of a central server without ever sharing their raw data. A defining characteristic of federated learning is that clients' training data are often drawn from diverse distributions: Client A's data might predominantly represent one class, while Client B's data pertains to another.
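To make the workflow concrete, the sketch below shows one way a synchronous round could look in NumPy: each client fine-tunes the current global model on its own data, and the server averages the returned local models (the widely used FedAvg rule). The toy linear-regression setup, learning rate, and data split are illustrative assumptions, not any particular framework's API.

```python
import numpy as np

def local_train(w_global, X, y, lr=0.05, epochs=5):
    """Client side: start from the global weights and run a few epochs of
    gradient descent on the local (private) data."""
    w = w_global.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # squared-loss gradient
        w -= lr * grad
    return w

def fedavg(local_models, weights=None):
    """Server side: (data-size weighted) average of the clients' local models."""
    return np.average(np.stack(local_models), axis=0, weights=weights)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Toy non-IID data: each client draws its inputs from a different region.
clients = []
for shift in (-2.0, 0.0, 2.0):
    X = rng.normal(shift, 1.0, size=(50, 2))
    y = X @ true_w + rng.normal(0.0, 0.1, size=50)
    clients.append((X, y))

w_global = np.zeros(2)
for _ in range(30):                               # synchronous training rounds
    local_models = [local_train(w_global, X, y) for X, y in clients]
    w_global = fedavg(local_models, weights=[len(y) for _, y in clients])

print("global model after training:", w_global)  # close to [2, -1]
```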
In contrast to centralized learning, where a server aggregates all clients' data into a single dataset for model training, federated learning significantly enhances privacy by eliminating the need to share private datasets. It also reduces communication costs by transmitting models rather than raw data. These advantages make federated learning particularly well suited to scenarios where vast amounts of distributed data are critical, such as AI-driven applications. The method has been widely adopted in industry, with notable examples including Scalytics, a platform that integrates data federation capabilities into an enterprise's data infrastructure. This includes data access management for digital assistants, information provisioning for multi-agent systems, and federated learning to ensure compliance with evolving data regulations.
Limitations of the synchronous training design
Most federated learning frameworks operate on a synchronous design: in each training round, all clients start from the same global model, which they fine-tune on their local data. This synchronous approach presents notable challenges. First, clients often have varying computational capacities, so well-resourced clients finish their local training much faster than constrained ones. This creates the straggler problem, where the central server must wait for the slowest clients before it can proceed. Second, clients may drop out or face connectivity issues, further disrupting the synchronous training process. To address these limitations, asynchronous federated learning has been proposed. In this design, the server updates the global model immediately upon receiving a local model from any client, thereby accommodating variations in client availability and computational speed.
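A rough sketch of the asynchronous server loop is shown below. The mixing rule and the staleness decay are assumptions made for illustration, loosely in the spirit of FedAsync-style schemes rather than the API of any specific system: whenever a local model arrives, the server blends it into the global model immediately, down-weighting updates that were computed against an old version of the global model.

```python
import numpy as np

def async_update(w_global, w_local, current_round, pulled_round, alpha=0.6):
    """Blend an arriving local model into the global model right away.
    Staleness counts how many server updates happened since the client
    downloaded the global model; stale updates get a smaller mixing weight."""
    staleness = current_round - pulled_round
    alpha_t = alpha / (1.0 + staleness)        # simple staleness decay (assumed)
    return (1.0 - alpha_t) * w_global + alpha_t * w_local

w_global = np.zeros(2)

# (arriving local model, server round at which that client pulled the global model)
arrivals = [(np.array([1.8, -0.9]), 0),        # fast client, fresh update
            (np.array([2.2, -1.1]), 0),        # slower client, one round stale
            (np.array([1.0, -0.5]), 0)]        # straggler, two rounds stale

for t, (w_local, pulled_at) in enumerate(arrivals):
    w_global = async_update(w_global, w_local, current_round=t, pulled_round=pulled_at)
    print(f"global model after arrival {t}: {w_global}")
```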
Vulnerability against poisoning
Despite these advancements, both synchronous and asynchronous federated learning are inherently vulnerable to poisoning attacks due to their decentralized nature. In such attacks, adversaries control malicious clients that send harmful local models to the server, potentially corrupting the global model so that it deliberately misclassifies data. For example, as illustrated in Figure 1, the attacker controls Client 1, which purposely transmits harmful local models to the server to distort the global model. Poisoning attacks are broadly categorized into data poisoning and model poisoning. Data poisoning manipulates the training data on compromised clients, whether intentionally or inadvertently; for instance, a client that sources its training data from unreliable origins such as the internet may receive data that is already poisoned or mislabeled. Model poisoning attacks, by contrast, directly alter the local models before they are sent to the server.
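The danger of model poisoning for a naive averaging server is easy to see in a toy example. The numbers and the sign-flip-and-scale strategy below are purely illustrative, not a specific published attack:

```python
import numpy as np

# Three benign clients submit similar local model updates.
benign_updates = [np.array([1.0, -0.5]),
                  np.array([0.9, -0.6]),
                  np.array([1.1, -0.4])]
benign_mean = np.mean(benign_updates, axis=0)

# One malicious client sign-flips and amplifies the benign consensus.
malicious_update = -10.0 * benign_mean

all_updates = benign_updates + [malicious_update]
print("benign-only average:", benign_mean)                    # roughly [ 1.0, -0.5]
print("poisoned average:   ", np.mean(all_updates, axis=0))   # dragged toward the attacker
```

A single attacker is enough to pull the plain average arbitrarily far from the benign consensus, which is exactly why the robust aggregation rules discussed next were introduced.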
Byzantine-robust aggregation rules
To counteract these threats, numerous Byzantine-robust aggregation rules [1][2] have been developed, specifically designed to mitigate the effects of poisoning attacks. For example, the Krum [1] aggregation rule selects the local model with the smallest cumulative distance to a subset of its nearest neighbors, assuming that benign models are more similar to each other than to malicious ones. Here, "benign models" are those submitted by trustworthy or honest clients who adhere to the intended objectives of the federated learning process without attempting to attack the system. Although this method can be effective, it is not foolproof.
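The selection logic of Krum is compact enough to sketch directly from the description above (NumPy; here `f` is the assumed upper bound on the number of malicious clients, and the toy models are made up for illustration):

```python
import numpy as np

def krum(local_models, f):
    """Return the single local model with the smallest summed squared distance
    to its n - f - 2 closest other models, following the description in [1]."""
    n = len(local_models)
    models = np.stack(local_models)
    # Pairwise squared Euclidean distances between all local models.
    dists = np.sum((models[:, None, :] - models[None, :, :]) ** 2, axis=-1)
    scores = []
    for i in range(n):
        others = np.delete(dists[i], i)          # distances to the other models
        closest = np.sort(others)[: n - f - 2]   # the n - f - 2 nearest neighbors
        scores.append(closest.sum())
    return local_models[int(np.argmin(scores))]

# Toy example: three similar benign models and one obvious outlier.
models = [np.array([1.0, -0.5]), np.array([0.9, -0.6]),
          np.array([1.1, -0.4]), np.array([10.0, 5.0])]
print("Krum selects:", krum(models, f=1))        # one of the benign models
```

Because the outlier sits far from every other model, its score is large and Krum ignores it; the attacks discussed in the next section work by crafting malicious models that still look close under exactly this distance score.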
Proposed improvements
Dr. Minghong Fang, a tenure-track assistant professor at the University of Louisville, has demonstrated that existing defenses against poisoning attacks in federated learning are often insufficient [3]. His research reveals that advanced attacks can bypass even robust aggregation rules. For instance, an attacker can carefully craft malicious local models such that the server consistently selects them as the final output, even when employing the Krum rule. These findings expose fundamental vulnerabilities in current federated learning frameworks and underscore the need for stronger defenses.
Dr. Fang's research not only uncovers critical vulnerabilities in federated learning but also introduces innovative defenses to mitigate poisoning attacks. One proposed solution [3] focuses on identifying and excluding local models that degrade the global model's performance, such as those contributing to a noticeable increase in error rate or loss. For instance, using Scalytics services, a company can gather small validation datasets through manual labeling, either by enlisting its employees or by using available labeling services to label reference data. The impact of each local model on the error rate of those validation datasets can then be measured, and local models that produce high error rates can be identified and excluded from further consideration. Other approaches [4][5] leverage a small, clean, and trusted dataset available to the server, which is used to compute a reference model against which each client's local model is compared. Models that closely align with the reference are considered benign, while significant deviations are flagged as potentially malicious. In scenarios where the server cannot access a trusted dataset, an alternative strategy [6] identifies malicious clients by analyzing the reconstruction errors of their model updates, offering a practical way to detect adversarial behavior without requiring additional data.
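The first of these defenses, filtering out local models that raise the error on a small server-side validation set, can be sketched as follows. The toy linear classifier and the error threshold are assumptions for illustration, not the exact method from [3]:

```python
import numpy as np

def error_rate(w, X_val, y_val):
    """Error rate of a linear classifier sign(X @ w) on the validation set."""
    return np.mean(np.sign(X_val @ w) != y_val)

def filtered_average(local_models, X_val, y_val, max_error=0.4):
    """Keep only local models whose validation error stays below the threshold
    (the threshold value itself would need to be tuned per application)."""
    kept = [w for w in local_models if error_rate(w, X_val, y_val) <= max_error]
    if not kept:                       # fall back to all models if none pass
        kept = local_models
    return np.mean(np.stack(kept), axis=0)

rng = np.random.default_rng(1)
true_w = np.array([1.0, -1.0])

# Small, clean validation set held by the server (e.g., manually labeled).
X_val = rng.normal(size=(100, 2))
y_val = np.sign(X_val @ true_w)

# Four benign local models plus one sign-flipped, poisoned model.
local_models = [true_w + rng.normal(0.0, 0.1, 2) for _ in range(4)]
local_models.append(-5.0 * true_w)

print("filtered aggregate:", filtered_average(local_models, X_val, y_val))
```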
Federated learning marks a transformative approach in machine learning, fostering collaborative model training while prioritizing data privacy. However, its susceptibility to poisoning attacks highlights the urgent need for effective defense strategies. Dr. Fang's work sheds light on these vulnerabilities and introduces actionable measures to strengthen the security and dependability of federated learning frameworks. As its application expands across industries, tackling these security challenges will be essential to unlocking the technology's full capabilities.
[1] Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent. In Neural Information Processing Systems (NeurIPS) 2017.
[2] Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates. In International Conference on Machine Learning (ICML) 2018.
[3] Local Model Poisoning Attacks to Byzantine-Robust Federated Learning. In USENIX Security Symposium 2020.
[4] FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping. In Network and Distributed System Security (NDSS) 2021.
[5] AFLGuard: Byzantine-robust Asynchronous Federated Learning. In Annual Computer Security Applications Conference (ACSAC) 2022.
[6] FedREDefense: Defending against Model Poisoning Attacks for Federated Learning using Model Update Reconstruction Error. In International Conference on Machine Learning (ICML) 2024.
About Scalytics
Scalytics Connect is a next-generation Federated Learning Framework built for enterprises. It bridges the gap between decentralized data and scalable AI, enabling seamless integration across diverse sources while prioritizing compliance, data privacy, and transparency.
Our mission is to empower developers and decision-makers with a framework that removes the barriers of traditional infrastructure. With Scalytics Connect, you can build scalable, explainable AI systems that keep your organization ahead of the curve. Break free from limitations and unlock the full potential of your AI projects.
Apache Wayang: The Leading Java-Based Federated Learning Framework
Scalytics is powered by Apache Wayang, and we're proud to support the project. You can check out their public GitHub repo right here. If you're enjoying our software, show your love and support - a star ⭐ would mean a lot!
If you need professional support from our team of industry leading experts, you can always reach out to us via Slack or Email.