From 4516460a7af2b9133d10e6fbc3325c16e1658316 Mon Sep 17 00:00:00 2001
From: Erick Lavoie <erick.lavoie@epfl.ch>
Date: Wed, 24 Mar 2021 19:07:53 +0100
Subject: [PATCH] Further improved abstract

---
 main.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/main.tex b/main.tex
index 3cec12d..198fbf7 100644
--- a/main.tex
+++ b/main.tex
@@ -53,7 +53,7 @@ EPFL, Lausanne, Switzerland \\
 \maketitle % typeset the header of the contribution
 %
 \begin{abstract}
-The convergence speed of machine learning models trained with Federated Learning is significantly affected by non-identically and independently distributed (non-IID) data partitions, even more so in a fully decentralized (serverless) setting. We propose the D-Cliques topology, which reduces gradient bias by grouping nodes in cliques such that their local joint distribution is representative of the global distribution. D-Cliques provide similar convergence speed as a fully-connected topology, both in IID and non-IID settings, with a significant reduction in the number of required edges and messages: at a scale of 1000 nodes, 98\% less edges and 96\% less total messages. We show how D-Cliques can be used to successfully implement momentum, critical to quickly train deep convolutional networks but otherwise detrimental in a non-IID setting. We finally show that, among many possible inter-clique topologies, a small-world topology that scales the number of edges logarithmically in the number of nodes provides a further 22\% reduction in the number of edges compared to fully connecting cliques with a single edge pairwise at 1000 nodes, and suggests bigger possible gains at larger scales.
+The convergence speed of machine learning models trained with Federated Learning is significantly affected by non-identically and independently distributed (non-IID) data partitions, even more so in a fully decentralized (serverless) setting. We propose the D-Cliques topology, which reduces gradient bias by grouping nodes in cliques such that their local joint distribution is representative of the global distribution. D-Cliques provide a convergence speed similar to that of a fully-connected topology on MNIST and CIFAR10, with a significant reduction in the number of required edges and messages: at a scale of 1000 nodes, 98\% fewer edges and 96\% fewer messages in total. We show how D-Cliques can be used to successfully implement momentum, which is critical to quickly train deep convolutional networks but otherwise detrimental in a non-IID setting. We finally show that, among many possible inter-clique topologies, a small-world topology that scales the number of edges logarithmically in the number of nodes converges almost as quickly as fully connecting cliques with a single edge pairwise. A small-world topology thus provides a further 22\% reduction in the number of edges at 1000 nodes (14.6 vs 18.9 edges on average per node), which suggests even bigger gains at larger scales.
 \keywords{Decentralized Learning \and Federated Learning \and Topology \and Non-IID Data \and Stochastic Gradient Descent}
--
GitLab