Commit 360accf0 authored by Erick Lavoie

Minor edits to fit the Acknowledgement section

parent a61f6d43
......@@ -43,10 +43,7 @@ nodes, non-uniform latency between pairs of nodes) and dynamically constructing
% be constructed in a decentralized way with
% PeerSampling~\cite{jelasity2007gossip}. This will be investigated in future work,
% using these results as a design foundation.
-More general types of data heterogeneity could be tackled beyond the
-important case
-of
-label distribution skew on which we focused in this paper. An important
+More general types of data heterogeneity could be tackled: An important
example is
covariate shift or feature distribution skew \cite{kairouz2019advances}, for
which local density estimates could be used as basis to construct cliques that
......
......@@ -50,13 +50,13 @@ impractical.
\begin{figure}[t]
\centering
-\begin{subfigure}[b]{0.18\textwidth}
+\begin{subfigure}[b]{0.16\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/grid-iid-neighbourhood}
\caption{\label{fig:grid-iid-neighbourhood} Homogeneous data}
\end{subfigure}
\hspace*{.5cm}
-\begin{subfigure}[b]{0.18\textwidth}
+\begin{subfigure}[b]{0.16\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/grid-non-iid-neighbourhood}
\caption{\label{fig:grid-non-iid-neighbourhood} Heterogeneous data}
......@@ -316,7 +316,7 @@ log-linear in $n$.
\begin{figure}[t]
\centering
-\includegraphics[width=0.20\textwidth]{figures/fully-connected-cliques}
+\includegraphics[width=0.18\textwidth]{figures/fully-connected-cliques}
\caption{\label{fig:d-cliques-figure} D-Cliques with $n=100$, $M=10$ and a
fully connected inter-clique topology on a problem with 1 class/node.}
\end{figure}
......@@ -414,7 +414,7 @@ of the local models across nodes.
\begin{figure}[t]
\centering
-\includegraphics[width=0.3\textwidth]{figures/connected-cliques-bias}
+\includegraphics[width=0.28\textwidth]{figures/connected-cliques-bias}
\caption{\label{fig:connected-cliques-bias} Illustrating the bias induced by
inter-clique connections (see main text for details).}
\end{figure}
......
......@@ -618,7 +618,7 @@ slower.
%python $TOOLS/analyze/filter.py skews --topology:name d-cliques/greedy-swap | python $TOOLS/plot/skew/convergence.py --max-steps 400 --labels 'MNIST (unbalanced classes)' 'CIFAR10 (balanced classes)' --linewidth 2.5 --save-figure ../mlsys2022style/figures/skew-convergence-speed-2-shards.png --linestyles 'solid' 'dashed'
\begin{figure}[htbp]
\centering
-\includegraphics[width=0.25\textwidth]{figures/skew-convergence-speed-2-shards}
+\includegraphics[width=0.23\textwidth]{figures/skew-convergence-speed-2-shards}
\caption{\label{fig:skew-convergence-speed-2-shards} Skew decrease during clique construction of 10 cliques of 10 heterogeneous nodes (100 nodes). Bold line is the average over 100 experiments. Thin lines are respectively the minimum and maximum over all experiments. In wall-clock time, 1000 steps take less than 6 seconds in Python 3.7 on a MacBook Pro 2020.}
\end{figure}
......@@ -659,5 +659,4 @@ is therefore critical to convergence speed.
In an extended version of this paper~\cite{dcliques-arxiv}, we replicate experimental
results on an extreme case of label distribution skew where each node only has
examples of a single class. These results consistently show that our
-approach remains effective even for extremely skewed label distributions
-across nodes.
+approach remains effective even for extremely skewed label distributions.
......@@ -144,6 +144,11 @@ Heterogeneous Data, Stochastic Gradient Descent
\input{related_work}
\input{conclu}
+\section*{Acknowledgements}
+This work was supported in part by the French National Research Agency
+(ANR) through grant ANR-20-CE23-0015 (Project PRIDE) and ANR-16-CE23-0016-01 (Project PAMELA).
\bibliographystyle{IEEEtran}
\bibliography{IEEEabrv,main.bib}
......
......@@ -10,9 +10,9 @@ algorithms.
\paragraph{Dealing with heterogeneity in server-based FL}
Data heterogeneity is not much of an issue in server-based FL if
clients send their parameters to the server after each gradient update.
-Problems arise when one seeks to reduce
-the number of communication rounds by allowing each participant to perform
-multiple local updates, as in the popular FedAvg algorithm
+Problems arise when participants perform
+multiple local updates to reduce
+the number of communication rounds, as in the popular FedAvg algorithm
\cite{mcmahan2016communication}. Indeed, data heterogeneity can prevent
such algorithms from
converging to a good solution \cite{quagmire,scaffold}. This led to the design
......