Commit 4e462656 authored by aurelien.bellet

small update related work

parent dd2e4564
@@ -658,28 +658,6 @@ we'll see}
\aurelien{TODO: where to place TornadoAggregate and related refs?}
\paragraph{Impact of topology in fully decentralized FL.} It is well
known
that the choice of network topology can affect the
convergence of fully decentralized algorithms: this is typically accounted
for in the theoretical convergence rate by a dependence on the spectral gap of
the network, see for instance
\cite{Duchi2012a,Colin2016a,lian2017d-psgd,Nedic18}.
However, for IID data, practice contradicts these classic
results: fully decentralized algorithms converge essentially as fast
on sparse topologies like rings or grids as they do on a fully connected
graph \cite{lian2017d-psgd,Lian2018}. Recent work
\cite{neglia2020,consensus_distance} sheds light on this phenomenon with refined convergence analyses based on differences between gradients or parameters across nodes, which are typically
smaller in the IID case. However, these results do not give any clear insight
regarding the role of the topology in the non-IID case. We note that some work
has gone into designing efficient topologies to optimize the use of
network resources (see e.g., \cite{marfoq}), but this is done independently
of how data is distributed across nodes. In summary, the role
of topology in the
non-IID data scenario is
not well understood and we are not aware of prior work focusing on this
question.
\paragraph{Dealing with non-IID data in server-based FL.}
Dealing with non-IID data in server-based FL has
recently attracted a lot of interest. While non-IID data is not an issue if
@@ -709,7 +687,8 @@ also observed that \cite{tang18a} is subject to numerical
instabilities when run on topologies other than rings and grids. When
the rows and columns of $W$ do not exactly
sum to $1$ (due to finite precision), these small differences get amplified by
the proposed updates and make the algorithm diverge.}\aurelien{emphasize that
they only do small scale experiments}
% non-IID known to be a problem for fully decentralized FL. cf Jelasity paper
% D2 and other recent papers on modifying updates: Quasi-Global Momentum,
% Cross-Gradient Aggregation
@@ -732,6 +711,31 @@ that would otherwise bias the direction of the gradient.
% with variance reduction) or multiple averaging steps.
\paragraph{Impact of topology in fully decentralized FL.} It is well
known
that the choice of network topology can affect the
convergence of fully decentralized algorithms: this is typically accounted
for in the theoretical convergence rate by a dependence on the spectral gap of
the network, see for instance
\cite{Duchi2012a,Colin2016a,lian2017d-psgd,Nedic18}.
However, for IID data, practice contradicts these classic
results: fully decentralized algorithms converge essentially as fast
on sparse topologies like rings or grids as they do on a fully connected
graph \cite{lian2017d-psgd,Lian2018}. Recent work
\cite{neglia2020,consensus_distance} sheds light on this phenomenon with refined convergence analyses based on differences between gradients or parameters across nodes, which are typically
smaller in the IID case. However, these results do not give any clear insight
regarding the role of the topology in the non-IID case. We note that some work
has gone into designing efficient topologies to optimize the use of
network resources (see e.g., \cite{marfoq}), but this is done independently
of how data is distributed across nodes. In summary, the role
of topology in the non-IID data scenario is not well understood and we are not
aware of prior work focusing on this question. Our work shows that an
appropriate choice of data-dependent topology can effectively compensate for
non-IID data.
\section{Conclusion}
%\section{Future Work}
%\begin{itemize}
% \item Non-uniform Class Representation
@@ -741,8 +745,6 @@ that would otherwise bias the direction of the gradient.
% \item Relaxing Clique Connectivity: Randomly choose a subset of clique neighbours to compute average gradient.
%\end{itemize}
\section{Conclusion}
\section{Credits}