Federated learning algorithms can be classified into two categories depending
...
...
generally scale better to the large number of participants seen in ``cross-device''
applications \cite{kairouz2019advances}. Indeed, while a central
server may quickly become a bottleneck as the number of participants increases, the topology used in fully decentralized algorithms can remain sparse
enough that each participant only needs to communicate with a small number of other participants, i.e., nodes have small (constant or logarithmic) degree
\cite{lian2017d-psgd}. In the homogeneous setting where data is
independent and identically distributed (IID) across nodes, recent work
has shown both empirically
\cite{lian2017d-psgd,Lian2018} and theoretically \cite{neglia2020} that sparse
topologies like rings or grids
do not significantly affect the convergence
speed compared to denser topologies.
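To make the notion of a sparse topology concrete, the following Python sketch runs decentralized SGD on a ring, where every node has exactly two neighbors regardless of the number of participants. It is an illustration under stated assumptions, not the implementation used in this paper: the scalar models, the quadratic local losses, the Metropolis-Hastings mixing weights, and the helper names (\texttt{ring\_neighbors}, \texttt{metropolis\_hastings\_weights}, \texttt{dsgd\_round}) are hypothetical choices made for illustration.

```python
# Hypothetical sketch: decentralized SGD (D-SGD) on a sparse ring topology.
# Assumptions for illustration only: scalar models, local quadratic losses
# f_i(x) = 0.5 * (x - t_i)**2, and Metropolis-Hastings mixing weights.


def ring_neighbors(n):
    """Neighbors of each node on a ring of n nodes (degree 2 for any n >= 3)."""
    assert n >= 3, "a ring needs at least three nodes"
    return [{(i - 1) % n, (i + 1) % n} for i in range(n)]


def metropolis_hastings_weights(neighbors):
    """Symmetric, doubly stochastic mixing matrix W built from the graph."""
    n = len(neighbors)
    degree = [len(nb) for nb in neighbors]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            W[i][j] = 1.0 / (1 + max(degree[i], degree[j]))
        # The self-weight absorbs the remaining mass so that each row sums to 1.
        W[i][i] = 1.0 - sum(W[i][j] for j in neighbors[i])
    return W


def dsgd_round(x, targets, W, lr):
    """One D-SGD round: local gradient step, then averaging with neighbors."""
    n = len(x)
    # Local step: the gradient of 0.5 * (x_i - t_i)**2 is (x_i - t_i).
    half = [x[i] - lr * (x[i] - targets[i]) for i in range(n)]
    # Gossip step: W[i][j] is zero unless j == i or j is a neighbor of i.
    return [sum(W[i][j] * half[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    n = 16
    W = metropolis_hastings_weights(ring_neighbors(n))
    targets = [float(i) for i in range(n)]  # hypothetical local optima
    x = [0.0] * n
    for _ in range(500):
        x = dsgd_round(x, targets, W, lr=0.1)
    print(sum(x) / n)  # close to the mean of the targets
```

Because the mixing matrix is symmetric and doubly stochastic, the gossip step preserves the average model, so in this toy setting the average converges to the mean of the local targets while each node exchanges values with only two neighbors per round.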
\begin{figure*}[t]
\centering
% From directory results/mnist
...
...
\caption{Convergence speed of decentralized
SGD with and without label distribution skew for different topologies.
The task is logistic regression on MNIST (see
Section~\ref{section:experimental-settings} for details on
the experimental setup). Bold lines show the
average test
accuracy across nodes,
while thin lines show the minimum
and maximum accuracy of individual nodes. While the effect of topology
is negligible for homogeneous data, it is very significant in the
heterogeneous case. On a fully-connected network, both cases converge
similarly.}
\label{fig:iid-vs-non-iid-problem}
\end{figure*}
\todo{AB: update fig legend to not use (non)IID terms}
In contrast to the homogeneous case, however, our experiments demonstrate that
\emph{the impact of topology is extremely significant for heterogeneous data}.
This phenomenon is illustrated in Figure~\ref{fig:iid-vs-non-iid-problem}: we observe that under
label distribution skew, using a
sparse topology (a ring or
a grid) clearly degrades the convergence speed of decentralized SGD.
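Label distribution skew can be illustrated by contrasting two ways of splitting a labeled dataset across nodes. The following Python sketch assumes an extreme, hypothetical form of skew in which every node holds examples of a single class, and compares it with a shuffled (homogeneous) split; the partitioning scheme and the function names (\texttt{iid\_partition}, \texttt{single\_class\_partition}, \texttt{label\_histogram}) are illustrative assumptions, not necessarily the exact setup of our experiments.

```python
import random
from collections import Counter


def iid_partition(num_examples, num_nodes, seed=0):
    """Homogeneous split: shuffle example indices and deal them round-robin."""
    idx = list(range(num_examples))
    random.Random(seed).shuffle(idx)
    return [idx[k::num_nodes] for k in range(num_nodes)]


def single_class_partition(labels, num_nodes):
    """Extreme label skew (hypothetical setup): each node holds a single class."""
    classes = sorted(set(labels))
    assert num_nodes >= len(classes), "need at least one node per class"
    # Classes are assigned to nodes round-robin: node k gets classes[k % C].
    owners = {c: [] for c in classes}
    for k in range(num_nodes):
        owners[classes[k % len(classes)]].append(k)
    partition = [[] for _ in range(num_nodes)]
    for c in classes:
        members = [i for i, y in enumerate(labels) if y == c]
        # Split the examples of class c evenly among the nodes that own c.
        for r, i in enumerate(members):
            partition[owners[c][r % len(owners[c])]].append(i)
    return partition


def label_histogram(indices, labels):
    """Number of examples of each label held by one node."""
    return Counter(labels[i] for i in indices)


if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]  # toy dataset with 10 balanced classes
    skewed = single_class_partition(labels, num_nodes=20)
    shuffled = iid_partition(len(labels), num_nodes=20)
    print(label_histogram(skewed[0], labels))  # Counter({0: 50})
    print(len(label_histogram(shuffled[0], labels)))  # distinct labels on node 0
```

With ten classes and twenty nodes, every node of the skewed partition sees a single label, whereas the shuffled partition gives each node a mix of labels, so local updates on a skewed node are computed from one class only.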
...
...
Specifically, we make the following contributions:
(1) We propose D-Cliques, a sparse topology in which nodes are organized in
interconnected cliques (i.e., locally fully-connected sets of nodes) such that
the joint label distribution of each clique is close to that of the global
distribution; (2) We design a greedy algorithm for
constructing such cliques efficiently;
% in the presence of heterogeneity previously studied
% in the context of Federated Learning~\cite{mcmahan2016communication};