Commit f4513e69 authored by aurelien.bellet's avatar aurelien.bellet
update abstract/intro to use heterogeneity

parent 0c409353
@@ -19,15 +19,14 @@ confidentiality concerns~\cite{kairouz2019advances}.
Yet, working with natural data distributions introduces new challenges for
learning systems, as local datasets reflect the usage and production patterns
specific to each participant: in other words, they are \emph{heterogeneous}.
An important type of data heterogeneity encountered in classification
problems, known as \emph{label distribution skew}
\cite{kairouz2019advances,quagmire}, occurs when the relative frequency of
the different classes of examples varies significantly across local datasets.
One of the key challenges in FL is to design algorithms that can efficiently
deal with such heterogeneous data distributions
\cite{kairouz2019advances,fedprox,scaffold,quagmire}.
Federated learning algorithms can be classified into two categories depending
@@ -46,13 +45,15 @@ generally scale better to the large number of participants seen in ``cross-devic
applications \cite{kairouz2019advances}. Indeed, while a central server may
quickly become a bottleneck as the number of participants increases, the
topology used in fully decentralized algorithms can remain sparse enough that
all participants need only communicate with a small number of other
participants, i.e., nodes have small (constant or logarithmic) degree
\cite{lian2017d-psgd}. In the homogeneous setting where data is independent
and identically distributed (IID) across nodes, recent work has shown both
empirically \cite{lian2017d-psgd,Lian2018} and theoretically \cite{neglia2020}
that sparse topologies like rings or grids do not significantly affect the
convergence speed compared to using denser topologies.
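To make the mechanism concrete, one decentralized SGD step alternates a local gradient update with gossip averaging over the topology's mixing matrix. The minimal sketch below uses a toy quadratic objective of our own choosing; the function names are illustrative, not taken from the cited work.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring: each node gives
    equal weight 1/3 to itself and its two neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def dsgd_step(X, grads, W, lr=0.1):
    """One decentralized SGD step: each node takes a local gradient
    step, then averages its model with its neighbours' (rows of W)."""
    return W @ (X - lr * grads)

n, d = 8, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))   # one local model per node
target = rng.normal(size=d)   # shared minimizer of the toy objective
W = ring_mixing_matrix(n)
for _ in range(200):
    grads = X - target        # gradient of 0.5 * ||x - target||^2
    X = dsgd_step(X, grads, W)
# All local models approach the shared minimizer despite the sparse ring.
```

Note that each node only ever exchanges models with its two ring neighbours, i.e., constant degree, which is why such topologies scale to many participants.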
\begin{figure*}[t]
\centering
% From directory results/mnist
@@ -60,7 +61,7 @@ speed compared to using denser topologies.
\begin{subfigure}[b]{0.25\textwidth}
\centering
\includegraphics[width=\textwidth]{../figures/ring-IID-vs-non-IID}
\caption{\label{fig:ring-IID-vs-non-IID} Ring topology}
\end{subfigure}
\quad
% From directory results/mnist
@@ -68,7 +69,7 @@ speed compared to using denser topologies.
\begin{subfigure}[b]{0.25\textwidth}
\centering
\includegraphics[width=\textwidth]{../figures/grid-IID-vs-non-IID}
\caption{\label{fig:grid-IID-vs-non-IID} Grid topology}
\end{subfigure}
\quad
% From directory results/mnist
@@ -76,25 +77,28 @@ speed compared to using denser topologies.
\begin{subfigure}[b]{0.25\textwidth}
\centering
\includegraphics[width=\textwidth]{../figures/fully-connected-IID-vs-non-IID}
\caption{\label{fig:fully-connected-IID-vs-non-IID} Fully-connected topology}
\end{subfigure}
\caption{Convergence speed of decentralized SGD with and without label
distribution skew for different topologies. The task is logistic regression
on MNIST (see Section~\ref{section:experimental-settings} for details on the
experimental setup). Bold lines show the average test accuracy across nodes,
while thin lines show the minimum and maximum accuracy of individual nodes.
While the effect of topology is negligible for homogeneous data, it is very
significant in the heterogeneous case. On a fully-connected network, both
cases converge similarly.}
\label{fig:iid-vs-non-iid-problem}
\end{figure*}
\todo{AB: update fig legend to not use (non)IID terms}
In contrast to the homogeneous case, however, our experiments demonstrate that
\emph{the impact of topology is extremely significant for heterogeneous data}.
This phenomenon is illustrated in Figure~\ref{fig:iid-vs-non-iid-problem}: we
observe that under label distribution skew, using a sparse topology (a ring or
a grid) clearly jeopardizes the convergence speed of decentralized SGD.
@@ -113,7 +117,7 @@ Specifically, we make the following contributions:
(1) We propose D-Cliques, a sparse topology in which nodes are organized in
interconnected cliques, i.e., locally fully-connected sets of nodes, such that
the joint label distribution of each clique is close to the global
distribution; (2) We design a greedy algorithm for constructing such cliques
efficiently;
% in the presence of heterogeneity previously studied
% in the context of Federated Learning~\cite{mcmahan2016communication};
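To illustrate the spirit of contribution (2), here is a hedged sketch of a greedy clique-construction procedure: repeatedly add to the current clique the node whose local labels bring the clique's joint label histogram closest to the global one. The function names and the skew measure (L1 distance between normalized histograms) are our assumptions, not the paper's exact algorithm.

```python
from collections import Counter

def skew(counts, global_dist):
    """L1 distance between a normalized label histogram and the
    global label distribution."""
    total = sum(counts.values())
    return sum(abs(counts.get(c, 0) / total - p)
               for c, p in global_dist.items())

def greedy_cliques(node_labels, clique_size):
    """Greedily group nodes into cliques whose joint label
    distribution approximates the global one."""
    all_labels = [y for labels in node_labels for y in labels]
    n_total = len(all_labels)
    global_dist = {c: n / n_total for c, n in Counter(all_labels).items()}
    remaining = set(range(len(node_labels)))
    cliques = []
    while remaining:
        clique, counts = [], Counter()
        while remaining and len(clique) < clique_size:
            # Pick the node whose addition minimizes the clique's skew.
            best = min(remaining,
                       key=lambda i: skew(counts + Counter(node_labels[i]),
                                          global_dist))
            clique.append(best)
            counts += Counter(node_labels[best])
            remaining.remove(best)
        cliques.append(clique)
    return cliques

# 10 nodes, each holding a single class: cliques of size 5 mix the classes.
cliques = greedy_cliques([[c] * 100 for c in range(10)], clique_size=5)
```

In this toy setting each resulting clique contains five nodes with five distinct classes, so its joint label distribution is far less skewed than any individual node's.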
@@ -42,8 +42,8 @@
\begin{document}
\twocolumn[
\mlsystitle{D-Cliques: Compensating for Data Heterogeneity with Topology in
Decentralized Federated Learning}
% It is OKAY to include author information, even for blind
% submissions: the style file will automatically remove it for you
@@ -84,10 +84,9 @@ Non-IID Data, Stochastic Gradient Descent}
%Abstracts must be a single paragraph, ideally between 4--6 sentences long.
%Gross violations will trigger corrections at the camera-ready phase.
The convergence speed of machine learning models trained with Federated
Learning is significantly affected by heterogeneous data partitions, even more
so in a fully decentralized setting without a central server. In this paper,
we show that the impact of label distribution skew, an important type of data
heterogeneity, can be significantly reduced by carefully designing the
underlying communication topology. We present D-Cliques, a novel topology
that reduces gradient bias by grouping nodes in sparsely interconnected