Commit 7c43d00e authored by aurelien.bellet

sec 6, finished pass!

parent 3cdc56d1
@@ -931,8 +931,35 @@ non-IID data.
\section{Conclusion}
\label{section:conclusion}
We have proposed D-Cliques, a sparse topology that recovers the convergence
speed and non-IID compensating behaviour of a fully-connected topology in the presence of local class bias. D-Cliques are based on assembling cliques of diverse nodes such that their joint local distribution is representative of the global distribution, essentially locally recovering IID-ness. Cliques are joined in a sparse inter-clique topology such that they quickly converge to the same model. Within cliques, Clique Averaging can be used to remove the non-IID bias in gradient computation by averaging gradients only with other nodes of clique. Clique Averaging can in turn be used to implement unbiased momentum to recover the convergence speed usually only possible with IID mini-batches. We have shown the clustering of D-Cliques and full connectivity within cliques to be critical in obtaining these results. Finally, we have evaluated different inter-clique topologies with 1000 nodes. While they all provide significant reduction in the number of edges compared to fully connecting all nodes, a small-world approach that scales in $O(n + log(n))$ in the number of nodes seems to be the most advantageous compromise between scalability and convergence speed. The D-Clique topology approach therefore seems promising to reduce bandwidth usage on FL servers and to implement fully decentralized alternatives in a wider range of applications. For instance, the presence and relative frequency of global classes could be computed using PushSum~\cite{kempe2003gossip}, and neighbors could be selected with PeerSampling~\cite{jelasity2007gossip}. This is part of our future work.
We proposed D-Cliques, a sparse topology that recovers the convergence
speed of a fully-connected topology in the presence of local class bias.
D-Cliques is based on assembling cliques of nodes such that their joint local
distribution is representative of the global distribution so as to locally
recover IIDness. Cliques are joined in a sparse inter-clique topology so that
they quickly converge to the same model. We proposed Clique
Averaging to remove the non-IID bias in gradient computation by
averaging gradients only with other nodes within the clique. Clique Averaging
can in turn be used to implement unbiased momentum to recover the convergence
speed usually only possible with IID mini-batches. Through our experiments, we
showed that the clique structure of D-Cliques is critical in obtaining these
results and that a small-world inter-clique topology with
$O(n + \log(n))$ edges seems to achieve the best compromise between
convergence speed and scalability with the number of nodes.
D-Cliques thus appears promising for reducing bandwidth
usage on FL servers and for implementing fully decentralized alternatives in a
wider range of applications where global coordination is impossible or costly.
For instance, the presence and relative frequency of classes in each node
could be computed using PushSum~\cite{kempe2003gossip}, and the topology could
be constructed in a decentralized and adaptive way with
PeerSampling~\cite{jelasity2007gossip}. This will be investigated in future work.
We also believe that our ideas can be useful to deal
with more general types of data non-IIDness beyond the important case of
local class bias that we studied in this paper. An important example is
covariate shift or feature distribution skew~\cite{kairouz2019advances}, where
local density estimates could be used as a basis to construct cliques that
approximately recover the global distribution.
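As a minimal sketch of the Clique Averaging step summarized above, under
notation that we assume here purely for illustration (a clique $C$ containing
node $i$, a local model $\theta_j$ and a mini-batch $s_j$ at each node
$j \in C$, and a loss $F$), the gradient used by node $i$ could be written as
\[
g_i = \frac{1}{|C|} \sum_{j \in C} \nabla F(\theta_j; s_j),
\]
so that, when the joint class distribution of $C$ is close to the global one,
$g_i$ behaves approximately like a gradient computed on an IID mini-batch.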
%\section{Future Work}
%\begin{itemize}