% !TEX root = main.tex
\section{Conclusion}
\label{section:conclusion}
We proposed D-Cliques, a sparse topology that recovers the convergence
speed of a fully-connected network in the presence of label distribution skew.
D-Cliques is based on assembling subsets of nodes into cliques such
that the clique-level class distribution is representative of the global
distribution, thereby locally recovering IIDness. Cliques are joined in a
sparse inter-clique topology so that
they quickly converge to the same model. We proposed Clique
Averaging to remove the non-IID bias in gradient computation by
averaging gradients only with other nodes within the clique. Clique Averaging
can in turn be used to implement unbiased momentum to recover the convergence
speed usually only possible with IID mini-batches. Through our experiments, we
showed that the clique structure of D-Cliques is critical in obtaining these
results and that a small-world inter-clique topology with only $O(n \log n)$
edges achieves the best compromise between
convergence speed and scalability with the number of nodes.
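To make the recap of Clique Averaging concrete, one possible formalization is
sketched below. The notation is an assumption introduced here for illustration:
$\mathcal{C}_i$ denotes the clique containing node $i$ (including $i$ itself),
$x_j$ the local model of node $j$, $s_j$ its current mini-batch, and $F$ the
loss function.
% Illustrative sketch only: the symbols C_i, x_j, s_j and F are assumed notation.
\[
  g_i = \frac{1}{|\mathcal{C}_i|} \sum_{j \in \mathcal{C}_i} \nabla F(x_j; s_j)
\]
Under this sketch, when the class distribution of $\mathcal{C}_i$ matches the
global one, $g_i$ averages over mini-batches that are jointly representative of
all classes, so the label-skew bias of any single node is expected to largely
cancel.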
D-Cliques thus appears promising for reducing bandwidth
usage on FL servers and for implementing fully decentralized alternatives in a
wider range of applications where global coordination is impossible or costly.
For instance, the presence and relative frequency of classes in each node
could be computed using PushSum~\cite{kempe2003gossip}, and the topology could
be constructed in a decentralized and adaptive way with
PeerSampling~\cite{jelasity2007gossip}. This will be investigated in future work.
We also believe that our ideas can be useful to deal
with more general types of data non-IIDness beyond the important case of
label distribution skew on which we focused in this paper. An important
example is
covariate shift, also called feature distribution skew~\cite{kairouz2019advances}, for
which local density estimates could be used as a basis to construct cliques that
approximately recover the global distribution.