Federated learning algorithms can be classified into two categories depending
...
generally scale better to the large number of participants seen in
``cross-device'' applications \cite{kairouz2019advances}. Effectively, while a central
server may quickly become a bottleneck as the number of participants increases, the topology used in fully decentralized algorithms can remain sparse
enough that all participants need only communicate with a small number of other participants, i.e., nodes have small (constant or logarithmic) degree
\cite{lian2017d-psgd}. In the homogeneous setting where data is
independent and identically distributed (IID) across nodes, recent work
has shown both empirically
\cite{lian2017d-psgd,Lian2018} and theoretically \cite{neglia2020} that sparse
topologies like rings or grids
do not significantly affect the convergence
speed compared to using denser topologies.
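To make the degree claim above concrete, the following small sketch (our own illustration, not code from the cited works) builds adjacency lists for a ring and for a fully-connected topology over $n$ nodes, showing constant versus linear per-node degree:

```python
# Illustrative sketch (names are ours, hypothetical): per-node degree
# in a ring versus a fully-connected communication topology.

def ring(n):
    """Each node exchanges models with only its two ring neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def fully_connected(n):
    """Every node exchanges models with every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

n = 1000
# Ring: degree stays 2 no matter how many participants join (O(1)).
assert all(len(nbrs) == 2 for nbrs in ring(n).values())
# Fully connected: degree grows linearly with n (O(n)).
assert all(len(nbrs) == n - 1 for nbrs in fully_connected(n).values())
```

This is why a central server (or an all-to-all topology) becomes a communication bottleneck as $n$ grows, while a ring or grid keeps per-node communication constant.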
\begin{figure*}[t]
\centering
% From directory results/mnist
...
\caption{Convergence speed of decentralized
SGD with and without label distribution skew for different topologies.
The task is logistic regression on MNIST (see
Section~\ref{section:experimental-settings} for details on
the experimental setup). Bold lines show the
average test
accuracy across nodes
while thin lines show the minimum
and maximum accuracy of individual nodes. While the effect of topology
is negligible for homogeneous data, it is very significant in the
heterogeneous case. On a fully-connected network, both cases converge
similarly.}
\label{fig:iid-vs-non-iid-problem}
\end{figure*}
In contrast to the homogeneous case, however, our experiments demonstrate that
\emph{the impact of topology is extremely significant for heterogeneous data}.
This phenomenon is illustrated in Figure~\ref{fig:iid-vs-non-iid-problem}: we observe that under
label distribution skew, using a
sparse topology (a ring or
a grid) clearly jeopardizes the convergence speed of decentralized SGD.
...
Specifically, we make the following contributions:
(1) We propose D-Cliques, a sparse topology in which nodes are organized in
interconnected cliques, i.e., locally fully-connected sets of nodes, such that
the joint label distribution of each clique is close to that of the global
distribution; (2) We design a greedy algorithm for
constructing such cliques efficiently;
% in the presence of heterogeneity previously studied
% in the context of Federated Learning~\cite{mcmahan2016communication};
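The idea behind contribution (2) can be illustrated with a toy greedy heuristic. The sketch below is our own simplification under strong assumptions (each node holds a single dominant label, all labels equally frequent); it is hypothetical and is not the algorithm proposed in the paper, which is presented later:

```python
# Hypothetical illustration of greedy clique construction: fill each
# clique so that its label mix stays close to the global distribution.
# This toy heuristic is ours, NOT the paper's actual algorithm.
from collections import defaultdict

def greedy_cliques(node_labels, clique_size):
    """Group nodes into cliques of `clique_size`, spreading labels evenly."""
    pool = defaultdict(list)            # label -> nodes still unassigned
    for node, label in node_labels.items():
        pool[label].append(node)
    cliques = []
    while any(pool.values()):
        clique = []
        while len(clique) < clique_size and any(pool.values()):
            # Pick the label least represented in the current clique,
            # breaking ties towards labels with the most unassigned nodes.
            label = min((l for l in pool if pool[l]),
                        key=lambda l: (sum(node_labels[m] == l for m in clique),
                                       -len(pool[l])))
            clique.append(pool[label].pop())
        cliques.append(clique)
    return cliques

# 100 nodes, each holding a single (skewed) label out of 10 classes:
nodes = {i: i % 10 for i in range(100)}
for clique in greedy_cliques(nodes, 10):
    # every clique covers all 10 labels, matching the global distribution
    assert sorted(nodes[m] for m in clique) == list(range(10))
```

Within each such clique, nodes are fully connected, so clique-level gradient averages resemble averages over the global distribution even though each individual node's data is heavily skewed.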