small update related work

4e462656 · aurelien.bellet · dd2e4564 · 4e462656
Commit 4e462656 authored 4 years ago by aurelien.bellet
--- a/main.tex
+++ b/main.tex
@@ -658,28 +658,6 @@ we'll see}
 \aurelien{TODO: where to place TornadoAggregate and related refs?}
-\paragraph{Impact of topology in fully decentralized FL.} It is well
-known
-that the choice of network topology can affect the
-convergence of fully decentralized algorithms: this is typically accounted
-for in the theoretical convergence rate by a dependence on the spectral gap of
-the network, see for instance 
-\cite{Duchi2012a,Colin2016a,lian2017d-psgd,Nedic18}.
-However, for IID data, practice contradicts these classic
-results: fully decentralized algorithms converge essentially as fast
-on sparse topologies like rings or grids as they do on a fully connected
-graph \cite{lian2017d-psgd,Lian2018}. Recent work 
-\cite{neglia2020,consensus_distance} sheds light on this phenomenon with refined convergence analyses based on differences between gradients or parameters across nodes, which are typically
-smaller in the IID case. However, these results do not give any clear insight
-regarding the role of the topology in the non-IID case. We note that some work
-has gone into designing efficient topologies to optimize the use of
-network resources (see e.g., \cite{marfoq}), but this is done independently
-of how data is distributed across nodes. In summary, the role
-of topology in the
-non-IID data scenario is
-not well understood and we are not aware of prior work focusing on this
-question.
 \paragraph{Dealing with non-IID data in server-based FL.}
 Dealing with non-IID data in server-based FL has
 recently attracted a lot of interest. While non-IID data is not an issue if
@@ -709,7 +687,8 @@ also observed that \cite{tang18a} is subject to numerical
 instabilities when run on topologies other than rings and grids. When
 the rows and columns of $W$ do not exactly
 sum to $1$ (due to finite precision), these small differences get amplified by
-the proposed updates and make the algorithm diverge.}Z
+the proposed updates and make the algorithm diverge.}\aurelien{emphasize that
+they only do small scale experiments}
 % non-IID known to be a problem for fully decentralized FL. cf Jelasity paper
 % D2 and other recent papers on modifying updates: Quasi-Global Momentum,
 % Cross-Gradient Aggregation
@@ -732,6 +711,31 @@ that would otherwise bias the direction of the gradient.
 % with variance reduction) or multiple averaging steps.
+\paragraph{Impact of topology in fully decentralized FL.} It is well
+known
+that the choice of network topology can affect the
+convergence of fully decentralized algorithms: this is typically accounted
+for in the theoretical convergence rate by a dependence on the spectral gap of
+the network, see for instance 
+\cite{Duchi2012a,Colin2016a,lian2017d-psgd,Nedic18}.
+However, for IID data, practice contradicts these classic
+results: fully decentralized algorithms converge essentially as fast
+on sparse topologies like rings or grids as they do on a fully connected
+graph \cite{lian2017d-psgd,Lian2018}. Recent work 
+\cite{neglia2020,consensus_distance} sheds light on this phenomenon with refined convergence analyses based on differences between gradients or parameters across nodes, which are typically
+smaller in the IID case. However, these results do not give any clear insight
+regarding the role of the topology in the non-IID case. We note that some work
+has gone into designing efficient topologies to optimize the use of
+network resources (see e.g., \cite{marfoq}), but this is done independently
+of how data is distributed across nodes. In summary, the role
+of topology in the non-IID data scenario is not well understood and we are not
+aware of prior work focusing on this question. Our work shows that an
+appropriate choice of data-dependent topology can effectively compensate for
+non-IID data.
+\section{Conclusion}
 %\section{Future Work}
 %\begin{itemize}
 %  \item Non-uniform Class Representation
@@ -741,8 +745,6 @@ that would otherwise bias the direction of the gradient.
 %  \item Relaxing Clique Connectivity: Randomly choose a subset of clique neighbours to compute average gradient.
 %\end{itemize}
-\section{Conclusion}
 \section{Credits}