Commit 43cd9379 authored by aurelien.bellet

AMK fixes

parent 2be9a4e9
@@ -348,7 +348,7 @@ CIFAR10~\cite{krizhevsky2009learning}, which both have $c=10$ classes.
 For MNIST, we use 45k and 10k examples from the original 60k
 training set for training and validation respectively. The remaining 5k
 training examples were randomly removed to ensure all 10 classes are balanced
-while ensuring the dataset is evenly divisible across 100 and 1000 nodes.
+while ensuring that the dataset is evenly divisible across 100 and 1000 nodes.
 We use all 10k examples of
 the test set to measure prediction accuracy. For CIFAR10, classes are evenly
 balanced: we use 45k/50k images of the original training set for training,
@@ -381,7 +381,8 @@ been sampled by a node, i.e. an \textit{epoch}. This is equivalent to the classi
 To further make results comparable across different number of nodes, we lower
 the batch size proportionally to the number of nodes added, and inversely,
 e.g. on MNIST, 128 with 100 nodes vs. 13 with 1000 nodes. This
-ensures the same number of model updates and averaging per epoch, which is
+ensures that the number of model updates and averaging per epoch remains the
+same, which is
 important to have a fair comparison.\footnote{Updating and averaging models
 after every example can eliminate the impact of local class bias. However, the
 resulting communication overhead is impractical.}
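The batch-size scaling discussed above (lower the per-node batch size inversely with the number of nodes, so the number of model updates per epoch stays the same) can be sketched as follows. This is a minimal illustration, not the authors' code; the helper name is ours:

```python
# Sketch (assumption: simple inverse scaling with rounding, as suggested by
# the 128-with-100-nodes vs. 13-with-1000-nodes example in the text).

def scaled_batch_size(base_batch: int, base_nodes: int, nodes: int) -> int:
    """Per-node batch size for `nodes` nodes, keeping the global batch
    (and hence the number of updates and averaging steps per epoch)
    roughly constant."""
    return max(1, round(base_batch * base_nodes / nodes))

# MNIST example from the text:
print(scaled_batch_size(128, 100, 1000))  # -> 13
```

With a fixed training-set size, a constant global batch yields the same number of update/averaging rounds per epoch regardless of the node count, which is the fairness property the hunk describes.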
@@ -543,9 +544,10 @@ introduced by inter-clique edges. We address this issue in the next section.
 \section{Optimizing with Clique Averaging and Momentum}
 \label{section:clique-averaging-momentum}
-In this sectio, we present Clique Averaging, a simple modification of D-SGD
-which removes the bias caused by the inter-cliques edges of
-D-Cliques, and show how this can be used to successfully implement momentum
+In this section, we present Clique Averaging, a feature that we add to
+D-SGD in order to remove the bias caused by the inter-cliques edges of
+D-Cliques. We then show how this can be used to successfully implement
+momentum
 for non-IID data.
 %AMK: check
@@ -802,7 +804,7 @@ average shortest path to $2$ between any pair of nodes. This choice requires $
 in the number of nodes. This can become significant at larger scales when $n$ is
 large compared to $c$.
-In this last series of experiment, we evaluate the effect of choosing sparser
+In this last series of experiments, we evaluate the effect of choosing sparser
 inter-clique topologies on the convergence speed for a larger network of 1000
 nodes. We compare the scalability and convergence speed of several
 D-Cliques variants, which all use $O(nc)$ edges
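The $O(nc)$ intra-clique edge count, and why fully connecting cliques to each other grows faster, can be checked with a short calculation. This is our own illustration, not the paper's code; the "one edge per pair of cliques" inter-clique scheme is an assumption used only to show the quadratic growth:

```python
# Illustration: edge counts for n nodes split into cliques of size c.

def intra_clique_edges(n: int, c: int) -> int:
    """Edges inside cliques: (n/c) cliques of c(c-1)/2 edges each,
    i.e. n(c-1)/2 total -- the O(nc) term from the text."""
    assert n % c == 0, "n must be divisible by the clique size"
    return (n // c) * c * (c - 1) // 2

def pairwise_inter_clique_edges(n: int, c: int) -> int:
    """Assumption: one inter-clique edge per pair of cliques, giving
    (n/c choose 2) edges, which grows quadratically in n."""
    k = n // c
    return k * (k - 1) // 2

# 1000 nodes, cliques of size c = 10:
print(intra_clique_edges(1000, 10))          # -> 4500
print(pairwise_inter_clique_edges(1000, 10))  # -> 4950
```

At 1000 nodes the pairwise inter-clique edges already outnumber the intra-clique ones, which matches the text's point that this cost becomes significant when $n$ is large compared to $c$, and motivates the sparser inter-clique topologies evaluated in this experiment.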