Commit 4516460a authored by Erick Lavoie

Further improved abstract

parent 45f64665
@@ -53,7 +53,7 @@ EPFL, Lausanne, Switzerland \\
 \maketitle % typeset the header of the contribution
 %
 \begin{abstract}
-The convergence speed of machine learning models trained with Federated Learning is significantly affected by non-identically and independently distributed (non-IID) data partitions, even more so in a fully decentralized (serverless) setting. We propose the D-Cliques topology, which reduces gradient bias by grouping nodes in cliques such that their local joint distribution is representative of the global distribution. D-Cliques provide similar convergence speed as a fully-connected topology, both in IID and non-IID settings, with a significant reduction in the number of required edges and messages: at a scale of 1000 nodes, 98\% less edges and 96\% less total messages. We show how D-Cliques can be used to successfully implement momentum, critical to quickly train deep convolutional networks but otherwise detrimental in a non-IID setting. We finally show that, among many possible inter-clique topologies, a small-world topology that scales the number of edges logarithmically in the number of nodes provides a further 22\% reduction in the number of edges compared to fully connecting cliques with a single edge pairwise at 1000 nodes, and suggests bigger possible gains at larger scales.
+The convergence speed of machine learning models trained with Federated Learning is significantly affected by non-identically and independently distributed (non-IID) data partitions, even more so in a fully decentralized (serverless) setting. We propose the D-Cliques topology, which reduces gradient bias by grouping nodes in cliques such that their local joint distribution is representative of the global distribution. D-Cliques provide similar convergence speed as a fully-connected topology on MNIST and CIFAR10, with a significant reduction in the number of required edges and messages: at a scale of 1000 nodes, 98\% fewer edges and 96\% fewer total messages. We show how D-Cliques can be used to successfully implement momentum, which is critical to quickly train deep convolutional networks but otherwise detrimental in a non-IID setting. We finally show that, among many possible inter-clique topologies, a small-world topology that scales the number of edges logarithmically in the number of nodes converges almost as quickly as fully connecting cliques with a single edge pairwise. A small-world topology thus provides a further 22\% reduction in the number of edges at 1000 nodes (14.6 vs. 18.9 edges on average per node), which suggests bigger possible gains at larger scales.
 \keywords{Decentralized Learning \and Federated Learning \and Topology \and
 Non-IID Data \and Stochastic Gradient Descent}
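For context on the abstract's central idea (grouping nodes into cliques whose local joint label distribution approximates the global one), here is a minimal greedy sketch. It is not the paper's implementation: the function name `build_d_cliques`, its parameters, and the assumption that each node holds a single-class shard are all hypothetical illustration.

```python
import random
from collections import Counter

def build_d_cliques(node_labels, clique_size=10, seed=42):
    """Greedily group nodes into cliques whose joint label distribution
    approximates the global one (hypothetical sketch, not the paper's
    algorithm). node_labels maps node id -> the single class held by
    that node's local shard (an extreme non-IID assumption)."""
    rng = random.Random(seed)
    # Bucket nodes by their local class and shuffle within each bucket.
    by_class = {}
    for node, cls in node_labels.items():
        by_class.setdefault(cls, []).append(node)
    for nodes in by_class.values():
        rng.shuffle(nodes)

    cliques = []
    while any(by_class.values()):
        clique = []
        # Take one node from each class, largest remaining class first,
        # so every clique covers the classes as evenly as possible.
        for cls in sorted(by_class, key=lambda c: len(by_class[c]), reverse=True):
            if len(clique) == clique_size:
                break
            if by_class[cls]:
                clique.append(by_class[cls].pop())
        cliques.append(clique)
    return cliques

# Example: 1000 nodes, each holding data from one of 10 classes.
labels = {i: i % 10 for i in range(1000)}
cliques = build_d_cliques(labels)
print(len(cliques), Counter(len(c) for c in cliques))  # 100 cliques of size 10
```

Edges inside each clique plus sparse inter-clique edges (e.g. the small-world scheme mentioned in the abstract) then give the overall topology; this sketch only illustrates the clique-assignment step.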