Commit f4513e69 authored by aurelien.bellet

update abstract/intro to use heterogeneity

parent 0c409353
@@ -19,15 +19,14 @@ confidentiality concerns~\cite{kairouz2019advances}.
Yet, working with natural data distributions introduces new challenges for
learning systems, as
local datasets
reflect the usage and production patterns specific to each participant: they are
\emph{not} independent and identically distributed
(non-IID). In the context of classification problems, the
relative
frequency of different classes of examples may significantly vary
across local datasets, a situation known as \emph{label distribution skew}
\cite{kairouz2019advances,quagmire}.
Therefore, one of the key challenges in FL is to design algorithms that
can efficiently deal with such non-IID data distributions
reflect the usage and production patterns specific to each participant: in
other words, they are
\emph{heterogeneous}. An important type of data heterogeneity encountered in
classification problems, known as \emph{label distribution skew}
\cite{kairouz2019advances,quagmire}, occurs when the frequency of different
classes of examples varies significantly across local datasets.
One of the key challenges in FL is to design algorithms that
can efficiently deal with such heterogeneous data distributions
\cite{kairouz2019advances,fedprox,scaffold,quagmire}.
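
To make label distribution skew concrete, here is a minimal Python sketch of how it is typically simulated (an illustration under assumed parameters, not the paper's experimental setup): each node draws its local dataset from only a few classes, so per-node class frequencies diverge sharply from the global distribution. The node count, classes per node, and shard size are invented values.

import random
from collections import defaultdict

def skewed_partition(labels, n_nodes=100, classes_per_node=2, shard_size=50, seed=0):
    """Assign example indices to nodes so that each node sees only a few classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)
    partition = []
    for _ in range(n_nodes):
        # A small random subset of classes per node creates the skew;
        # shards may overlap across nodes, which is fine for illustration.
        chosen = rng.sample(classes, classes_per_node)
        shard = [i for c in chosen for i in rng.sample(by_class[c], shard_size)]
        partition.append(shard)
    return partition  # assumes every class holds >= shard_size examples

With MNIST's ten classes and classes_per_node=2, for instance, each local dataset covers only a fifth of the label space, a setting akin to the skew studied below.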
Federated learning algorithms can be classified into two categories depending
@@ -46,13 +45,15 @@ generally scale better to the large number of participants seen in ``cross-device''
applications \cite{kairouz2019advances}. Indeed, while a central
server may quickly become a bottleneck as the number of participants increases, the topology used in fully decentralized algorithms can remain sparse
enough that all participants need only communicate with a small number of other participants, i.e., nodes have a small (constant or logarithmic) degree
\cite{lian2017d-psgd}. For IID data, recent work has shown both empirically
\cite{lian2017d-psgd}. In the homogeneous setting where data is
independent and identically distributed (IID) across nodes, recent work
has shown both empirically
\cite{lian2017d-psgd,Lian2018} and theoretically \cite{neglia2020} that sparse
topologies like rings or grids
do not significantly affect the convergence
speed compared to using denser topologies.
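
The constant-degree claim is easy to see in code. Below is a minimal sketch of one decentralized SGD round on a ring, in the spirit of \cite{lian2017d-psgd} (the names and the uniform neighbor averaging are assumptions, not code from the paper): each node takes a local gradient step, then averages its model with its two ring neighbors only, so per-round communication per node is constant regardless of network size.

import numpy as np

def ring_neighbors(n_nodes):
    """On a ring, node i is linked to (i-1) and (i+1) mod n: degree 2 everywhere."""
    return {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

def dsgd_round(models, grads, neighbors, lr=0.1):
    """One round: local SGD step at every node, then neighborhood averaging."""
    stepped = [m - lr * g for m, g in zip(models, grads)]
    return [np.mean([stepped[i]] + [stepped[j] for j in neighbors[i]], axis=0)
            for i in range(len(models))]

Swapping ring_neighbors for a fully-connected graph would turn each round into O(n) messages per node, which is exactly the cost a sparse topology avoids.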
\begin{figure*}[ht]
\begin{figure*}[t]
\centering
% From directory results/mnist
@@ -60,7 +61,7 @@ speed compared to using denser topologies.
\begin{subfigure}[b]{0.25\textwidth}
\centering
\includegraphics[width=\textwidth]{../figures/ring-IID-vs-non-IID}
\caption{\label{fig:ring-IID-vs-non-IID} Ring}
\caption{\label{fig:ring-IID-vs-non-IID} Ring topology}
\end{subfigure}
\quad
% From directory results/mnist
@@ -68,7 +69,7 @@ speed compared to using denser topologies.
\begin{subfigure}[b]{0.25\textwidth}
\centering
\includegraphics[width=\textwidth]{../figures/grid-IID-vs-non-IID}
\caption{\label{fig:grid-IID-vs-non-IID} Grid}
\caption{\label{fig:grid-IID-vs-non-IID} Grid topology}
\end{subfigure}
\quad
% From directory results/mnist
@@ -76,25 +77,28 @@ speed compared to using denser topologies.
\begin{subfigure}[b]{0.25\textwidth}
\centering
\includegraphics[width=\textwidth]{../figures/fully-connected-IID-vs-non-IID}
\caption{\label{fig:fully-connected-IID-vs-non-IID} Fully-connected}
\caption{\label{fig:fully-connected-IID-vs-non-IID} Fully-connected topology}
\end{subfigure}
\caption{IID vs non-IID convergence speed of decentralized SGD for
logistic regression on
MNIST for different topologies. Bold lines show the average test
\caption{Convergence speed of decentralized
SGD with and without label distribution skew for different topologies.
The task is logistic regression on MNIST (see
Section~\ref{section:experimental-settings} for details on
the experimental setup). Bold lines show the
average test
accuracy across nodes
while thin lines show the minimum
and maximum accuracy of individual nodes. While the effect of topology
is negligible for IID data, it is very significant in the
non-IID case. When fully-connected, both cases converge similarly. See
Section~\ref{section:experimental-settings} for details on
the experimental setup.}
is negligible for homogeneous data, it is very significant in the
heterogeneous case. On a fully-connected network, both cases converge
similarly.}
\label{fig:iid-vs-non-iid-problem}
\end{figure*}
In contrast to the IID case however, our experiments demonstrate that \emph{the impact of topology is extremely significant for non-IID data}. This phenomenon is illustrated
in Figure~\ref{fig:iid-vs-non-iid-problem}: we observe that under
\todo{AB: update fig legend to not use (non)IID terms}
In contrast to the homogeneous case, however, our experiments demonstrate that
\emph{the impact of topology is extremely significant for heterogeneous data}.
This phenomenon is illustrated in Figure~\ref{fig:iid-vs-non-iid-problem}: we observe that under
label distribution skew, using a
sparse topology (a ring or
a grid) clearly jeopardizes the convergence speed of decentralized SGD.
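
One way to read this result (a sketch in assumed notation, not an argument from the paper): let $F_i$ be node $i$'s local objective and $F$ the global average objective. Under label distribution skew, each local gradient $\nabla F_i(x)$ is a biased estimate of the global gradient, and on a sparse topology a node averages with only a handful of neighbors, so the bias is corrected slowly. The clique-based design introduced below instead aims for approximate unbiasedness at the level of each clique $C$:
\[
\nabla F(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla F_i(x),
\qquad
\frac{1}{|C|}\sum_{i \in C} \nabla F_i(x) \;\approx\; \nabla F(x),
\]
which holds when the joint label distribution of $C$ is close to the global one.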
@@ -113,7 +117,7 @@ Specifically, we make the following contributions:
(1) We propose D-Cliques, a sparse topology in which nodes are organized in
interconnected cliques, i.e., locally fully-connected sets of nodes, such that
the joint label distribution of each clique is close to that of the global
(IID) distribution; (2) We design a greedy algorithm for
distribution; (2) We design a greedy algorithm for
constructing such cliques efficiently;
% in the presence of heterogeneity previously studied
% in the context of Federated Learning~\cite{mcmahan2016communication};
......
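
As a rough illustration of what a greedy construction in the spirit of contribution (2) could look like (a hypothetical sketch, not the authors' algorithm; node_label_counts and clique_size are assumed inputs), one can grow each clique by repeatedly adding the unassigned node that brings the clique's joint label distribution closest to the global one:

import numpy as np

def greedy_cliques(node_label_counts, clique_size=10):
    """Group nodes into cliques whose joint label distribution tracks the global one."""
    counts = {n: np.asarray(c, dtype=float) for n, c in node_label_counts.items()}
    global_dist = sum(counts.values())
    global_dist = global_dist / global_dist.sum()
    unassigned, cliques = set(counts), []
    while unassigned:
        clique, total = [], np.zeros_like(global_dist)
        while unassigned and len(clique) < clique_size:
            # Pick the node whose counts move the clique's label
            # distribution closest (in L1 distance) to the global one.
            def gap(n):
                d = total + counts[n]  # assumes every node holds >= 1 example
                return np.abs(d / d.sum() - global_dist).sum()
            best = min(unassigned, key=gap)
            clique.append(best)
            total += counts[best]
            unassigned.remove(best)
        cliques.append(clique)
    return cliques

Within each resulting clique, nodes would be fully connected, with only sparse links across cliques, matching the topology described in contribution (1).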
@@ -42,8 +42,8 @@
\begin{document}
\twocolumn[
\mlsystitle{D-Cliques: Compensating Data Heterogeneity with Topology in Decentralized
Federated Learning}
\mlsystitle{D-Cliques: Compensating for Data Heterogeneity with Topology in
Decentralized Federated Learning}
% It is OKAY to include author information, even for blind
% submissions: the style file will automatically remove it for you
@@ -84,10 +84,9 @@ Non-IID Data, Stochastic Gradient Descent}
%Abstracts must be a single paragraph, ideally between 4--6 sentences long.
%Gross violations will trigger corrections at the camera-ready phase.
The convergence speed of machine learning models trained with Federated
Learning is significantly affected by non-independent and identically
distributed (non-IID) data partitions, even more so in a fully decentralized
setting without a central server. In this paper, we show that the impact of
label distribution skew, an important type of data non-IIDness, can be
Learning is significantly affected by heterogeneous data partitions, even more
so in a fully decentralized setting without a central server. In this paper, we show that the impact of
label distribution skew, an important type of data heterogeneity, can be
significantly reduced by carefully designing
the underlying communication topology. We present D-Cliques, a novel topology
that reduces gradient bias by grouping nodes in sparsely interconnected
......