@@ -137,8 +137,11 @@ unbiased with respect to the class distribution.
...
 We empirically evaluate our approach on MNIST and CIFAR10 datasets using
 logistic
 regression and deep convolutional models with up to 1000 participants. This is
-in contrast to most previous work on fully decentralized algorithms which only
-consider a few tens of participants \cite{refs}.
+in contrast to most previous work on fully decentralized algorithms
+considering only a few tens of participants \cite{tang18a,more_refs}, which
+falls short of
+giving a realistic view of the performance of these algorithms in actual
+applications.
 \aurelien{TODO: complete above paragraph with more details and highlighting
 other contributions as needed}
...
@@ -614,7 +617,8 @@ network, see for instance \cite{Duchi2012a,lian2017d-psgd,Nedic18}.
...
 % papers using multiple averaging steps
 % also our personalized papers
-D2: numerically unstable when $W_{ij}$ rows and columns do not exactly sum to $1$, as the small differences are amplified in a positive feedback loop. More work is therefore required on the algorithm to make it usable with a wider variety of topologies. In comparison, D-cliques do not modify the SGD algorithm and instead simply removes some neighbor contributions that would otherwise bias the direction of the gradient. D-Cliques with D-PSGD are therefore as tolerant to ill-conditioned $W_{ij}$ matrices as regular D-PSGD in an IID setting.
+D2 \cite{tang18a}: numerically unstable when $W_{ij}$ rows and columns do not exactly
+sum to $1$, as the small differences are amplified in a positive feedback loop. More work is therefore required on the algorithm to make it usable with a wider variety of topologies. In comparison, D-Cliques do not modify the SGD algorithm and instead simply remove some neighbor contributions that would otherwise bias the direction of the gradient. D-Cliques with D-PSGD are therefore as tolerant to ill-conditioned $W_{ij}$ matrices as regular D-PSGD in an IID setting.
 An originality of our approach is to focus on the effect of topology
 level without significantly changing the original simple and efficient D-SGD