\title{D-Cliques: Compensating NonIIDness in Decentralized Federated Learning
with Topology}
%
\titlerunning{D-Cliques}
% If the paper title is too long for the running head, you can set
...
...
\maketitle% typeset the header of the contribution
%
\begin{abstract}
The convergence speed of machine learning models trained with Federated
Learning is significantly affected by non-independent and identically
distributed (non-IID) data partitions, even more so in a fully decentralized
setting without a central server. In this paper, we show that the impact of
\textit{local class bias} can be significantly reduced by carefully designing
the underlying communication topology. We present D-Cliques, a novel topology
that reduces gradient bias by grouping nodes in interconnected cliques such
that the local joint distribution in a clique is representative of the global
class distribution. We also show how to adapt the updates of decentralized SGD
to obtain unbiased gradients and effective momentum with D-Cliques. Our
empirical evaluation on MNIST and CIFAR10 demonstrates that our approach
provides a convergence speed similar to that of a fully-connected topology
with a significant reduction in the number of edges and messages. In a
1000-node topology, D-Cliques requires 98\% fewer edges and 96\% fewer total
messages to achieve a similar accuracy, with further possible gains using a
small-world topology across cliques.