diff --git a/main.tex b/main.tex
index 787d9ea09cb17b553bab2473e8d88298de458fd2..1563e0786d92c826d3f5be1a6fde5699d8bdca1f 100644
--- a/main.tex
+++ b/main.tex
@@ -970,21 +970,22 @@ non-IID data.
 \label{section:conclusion}
 
 We proposed D-Cliques, a sparse topology that recovers the convergence
-speed of a fully-connected topology in the presence of local class bias.
-D-Cliques is based on assembling cliques of nodes such that their joint local
-distribution is representative of the global distribution so as to locally
-recover IIDness. Cliques are joined in a sparse inter-clique topology so that
+speed of a fully-connected network in the presence of local class bias.
+D-Cliques is based on assembling subsets of nodes into cliques such
+that the clique-level class distribution is representative of the global
+distribution, thereby locally recovering IIDness. Cliques are joined in a
+sparse inter-clique topology so that
 they quickly converge to the same model. We proposed Clique
 Averaging to remove the non-IID bias in gradient computation by
 averaging gradients only with other nodes within the clique. Clique
 Averaging can in turn be used to implement unbiased momentum to
 recover the convergence speed usually only possible with IID
 mini-batches. Through our experiments, we showed that the clique
 structure of D-Cliques is critical in obtaining these
-results and that a small-world inter-clique topology with $O(n
-+ log (n))$ edges seems to achieve the best compromise between
+results and that a small-world inter-clique topology with only $O(n
++ log (n))$ edges achieves the best compromise between
 convergence speed and scalability with the number of nodes.
-D-Cliques thus appears to be promising to reduce bandwidth
+D-Cliques thus appears to be very promising to reduce bandwidth
 usage on FL servers and to implement fully decentralized alternatives in a
 wider range of applications where global coordination is impossible or costly.
 For instance, the presence and relative frequency of classes in each node
@@ -994,10 +995,9 @@ PeerSampling~\cite{jelasity2007gossip}. This will be investigated in future work
 We also believe that our ideas can be useful to deal with more general types
 of data non-IIDness beyond the important case of local class bias that we
 studied in this paper. An important example is
-covariate shift or feature distribution skew \cite{kairouz2019advances}, where
-local density estimates could be used as basis to construct cliques that
-approximately
-recover the global distribution.
+covariate shift or feature distribution skew \cite{kairouz2019advances}, for
+which local density estimates could be used as basis to construct cliques that
+approximately recover the global distribution.
 
 %\section{Future Work}
 %\begin{itemize}