sec 6, finished pass!

Commit 7c43d00e authored 4 years ago by aurelien.bellet
--- a/main.tex
+++ b/main.tex
@@ -931,8 +931,35 @@ non-IID data.
 \section{Conclusion}
 \label{section:conclusion}

-We have proposed D-Cliques, a sparse topology that recovers the convergence
-speed and non-IID compensating behaviour of a fully-connected topology in the presence of local class bias. D-Cliques are based on assembling cliques of diverse nodes such that their joint local distribution is representative of the global distribution, essentially locally recovering IID-ness. Cliques are joined in a sparse inter-clique topology such that they quickly converge to the same model. Within cliques, Clique Averaging can be used to remove the non-IID bias in gradient computation by averaging gradients only with other nodes of clique. Clique Averaging can in turn be used to implement unbiased momentum to recover the convergence speed usually only possible with IID mini-batches. We have shown the clustering of D-Cliques and full connectivity within cliques to be critical in obtaining these results. Finally, we have evaluated different inter-clique topologies with 1000 nodes. While they all provide significant reduction in the number of edges compared to fully connecting all nodes, a small-world approach that scales in $O(n + log(n))$ in the number of nodes seems to be the most advantageous compromise between scalability and convergence speed. The D-Clique topology approach therefore seems promising to reduce bandwidth usage on FL servers and to implement fully decentralized alternatives in a wider range of applications. For instance, the presence and relative frequency of global classes could be computed using PushSum~\cite{kempe2003gossip}, and neighbors could be selected with PeerSampling~\cite{jelasity2007gossip}. This is part of our future work.
+We proposed D-Cliques, a sparse topology that recovers the convergence
+speed of a fully-connected topology in the presence of local class bias.
+D-Cliques is based on assembling cliques of nodes such that their joint local
+distribution is representative of the global distribution so as to locally
+recover IIDness. Cliques are joined in a sparse inter-clique topology so that
+they quickly converge to the same model. We proposed Clique
+Averaging to remove the non-IID bias in gradient computation by
+averaging gradients only with other nodes within the clique. Clique Averaging
+can in turn be used to implement unbiased momentum to recover the convergence
+speed usually only possible with IID mini-batches. Through our experiments, we
+showed that the clique structure of D-Cliques is critical in obtaining these
+results and that a small-world inter-clique topology with $O(n
+ \log (n))$ edges seems to achieve the best compromise between
+convergence speed and scalability with the number of nodes.
+
+D-Cliques thus appears promising for reducing bandwidth
+usage on FL servers and for implementing fully decentralized alternatives in a
+wider range of applications where global coordination is impossible or costly.
+For instance, the presence and relative frequency of classes in each node
+could be computed using PushSum~\cite{kempe2003gossip}, and the topology could
+be constructed in a decentralized and adaptive way with
+PeerSampling~\cite{jelasity2007gossip}. This will be investigated in future work.
+We also believe that our ideas can be useful for dealing
+with more general types of data non-IIDness beyond the important case of
+local class bias that we studied in this paper. A notable example is
+covariate shift or feature distribution skew~\cite{kairouz2019advances}, where
+local density estimates could be used as a basis to construct cliques that
+approximately
+recover the global distribution.

 %\section{Future Work}
 %\begin{itemize}