diff --git a/main.tex b/main.tex
index 8291dad718347c5236534669e641266eed997dca..76bf550f1cf4303f5294a49705f91354af234a8c 100644
--- a/main.tex
+++ b/main.tex
@@ -545,7 +545,7 @@ introduced by inter-clique edges. We address this issue in the next section.
 \label{section:clique-averaging-momentum}
 
 In this section, we present Clique Averaging, a feature that we add to
-D-SGD in order to remove the bias caused by the inter-cliques edges of
+D-SGD to remove the bias caused by the inter-cliques edges of
 D-Cliques. We then show how this can be used to successfully implement
 momentum
 for non-IID data.
@@ -746,9 +746,10 @@ convergence.
 Crucially, all random topologies fail to converge to a good solution. This
 confirms that our clique structure is important to reduce variance
 across nodes and improve the convergence. The difference with the previous
-experiment seems to be due to both the use of a higher capacity model and to
-the intrinsic characteristics of the datasets. We refer
-to the appendix for results on MNIST with LeNet.
+experiment appears to be due to both the use of a higher capacity model and to
+the intrinsic characteristics of the datasets.
+% We refer
+% to the appendix for results on MNIST with LeNet.
 % We have tried to use LeNet on
 % MNIST to see if the difference between MNIST and CIFAR10 could be attributed to the capacity difference between the Linear and Convolutional networks, whose optimization may benefit from clustering (see Appendix). The difference is less dramatic than for CIFAR10, so it must be that the dataset also has an impact. The exact nature of it is still an open question.
 
@@ -853,7 +854,10 @@ fully-connected topology offers
 significant benefits with 1000 nodes, as it represents a 98\% reduction in the
 number of edges compared to fully connecting individual nodes (18.9 edges on
 average instead of 999) and a 96\% reduction in the number of messages (37.8
-messages per round per node on average instead of 999). Overall, these results
+messages per round per node on average instead of 999). We refer to
+Appendix~\ref{app:scaling} for additional results comparing the convergence
+speed across
+different number of nodes. Overall, our results
 show that D-Cliques can nicely scale with the number of nodes.
 
 \begin{figure}[t]
@@ -1028,11 +1032,11 @@ approximately recover the global distribution.
  \section{Detailed Algorithms}
  
  We present a more detailed and precise explanation of the two main algorithms
- of the paper, for D-Clique Construction (Algorithm~
- \ref{Algorithm:D-Clique-Construction}) and to establish a small-world
+ of the paper, for D-Cliques construction
+ (Algorithm~\ref{Algorithm:D-Clique-Construction}) and to establish a small-world
  inter-clique topology (Algorithm~\ref{Algorithm:Smallworld}).
  
- \subsection{D-Clique Construction}
+ \subsection{D-Cliques Construction}
  
  Algorithm~\ref{Algorithm:D-Clique-Construction} shows the overall approach
  for constructing a D-Cliques topology in the non-IID case.\footnote{An IID
@@ -1041,7 +1045,7 @@ approximately recover the global distribution.
  It expects the following inputs: $L$, the set of all classes present in the global distribution $D = \bigcup_{i \in N} D_i$; $N$, the set of all nodes; a function $classes(S)$, which given a subset $S$ of nodes in $N$ returns the set of classes in their joint local distributions ($D_S = \bigcup_{i \in S} D_i$); a function $intraconnect(DC)$, which given $DC$, a set of cliques (set of set of nodes), creates a set of edges ($\{\{i,j\}, \dots \}$) connecting all nodes within each clique to one another; a function $interconnect(DC)$, which given a set of cliques, creates a set of edges ($\{\{i,j\}, \dots \}$) connecting nodes belonging to different cliques; and a function $weigths(E)$, which given a set of edges, returns the weighted matrix $W_{ij}$.  Algorithm~\ref{Algorithm:D-Clique-Construction} returns both $W_{ij}$, for use in D-SGD (Algorithm~\ref{Algorithm:D-PSGD} and~\ref{Algorithm:Clique-Unbiased-D-PSGD}), and $DC$, for use with Clique Averaging (Algorithm~\ref{Algorithm:Clique-Unbiased-D-PSGD}).
  
    \begin{algorithm}[h]
-   \caption{D-Clique Construction}
+   \caption{D-Cliques Construction}
    \label{Algorithm:D-Clique-Construction}
    \begin{algorithmic}[1]
         \State \textbf{Require:} set of classes globally present $L$, 
@@ -1120,74 +1124,74 @@ The implementation first arranges the cliques of $DC$ in a list, which
 represents the ring. Traversing the list with increasing indices is equivalent
 to traversing the ring in the clockwise direction, and inversely. Then, for every clique $i$ on the ring from which we are computing the distance to others, a number of edges are added. All other cliques are implicitly arranged in mutually exclusive sets, with size and at offset exponentially bigger (doubling at every step). Then for every of these sets, $ns$ edges are added, both in the clockwise and counter-clockwise directions, always on the nodes with the least number of edges in each clique. The ring edges are implicitly added to the cliques at offset $1$ in both directions.
  
- \section{Additional Experiments}
-
-\subsection{Effect of Clique Averaging and Uniform Initialization}
-
-Section~\ref{section:clique-averaging} explained how Clique Averaging reduces bias and showed that Clique Averaging was significantly beneficial on MNIST with fully-connected D-Cliques. In this section, we provide additional results for the ring topology, as well as for CIFAR10. In addition, during our early exploration, we noticed that ensuring \textit{uniform initialization}, i.e. ensuring that all nodes start with the same model, increased convergence speed when connecting two cliques with 1-2 interclique edges. We therefore also verify whether this effect is still significant with 10 cliques (100 nodes), on a ring and with full connections between cliques, as well as on MNIST and CIFAR10. We also verified what interaction this had with Clique Averaging.
-
-Figure~\ref{fig:d-cliques-mnist-init-clique-avg-effect} shows all the results for MNIST. Comparing Figure~\ref{fig:d-cliques-mnist-init-clique-avg-effect-ring-test-accuracy} to~\ref{fig:d-cliques-mnist-no-init-clique-avg-effect-ring-test-accuracy}, and Figure~\ref{fig:d-cliques-mnist-init-clique-avg-effect-fcc-test-accuracy} to~\ref{fig:d-cliques-mnist-no-init-clique-avg-effect-fcc-test-accuracy} together, we see that Uniform Initialization has imperceptible effects. However, for all four sub-figures, using Clique Averaging has a slightly better average convergence speed, and significantly lower variance between nodes, than not using it. Moreover, the improvement is larger with Fully-Connected D-Cliques.
-
-\begin{figure}[htbp]
-     \centering
-      % To regenerate the figure, from directory results/mnist   
-      % python ../../../learn-topology/tools/plot_convergence.py clique-ring/all/2021-03-10-18:14:35-CET no-clique-avg/clique-ring/all/2021-03-12-10:40:37-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg.' 'without clique avg.' --legend 'lower right' --ymin 85 --ymax 92.5 --save-figure ../../figures/d-cliques-mnist-init-clique-avg-effect-ring-test-accuracy.png
-      \begin{subfigure}[b]{0.48\textwidth}
-         \centering
-         \includegraphics[width=\textwidth]{figures/d-cliques-mnist-init-clique-avg-effect-ring-test-accuracy}
-         \caption{\label{fig:d-cliques-mnist-init-clique-avg-effect-ring-test-accuracy} D-Cliques (Ring), with Uniform Initialization}
-     \end{subfigure}
-     \quad 
-     % To regenerate the figure, from directory results/mnist
-     % python ../../../learn-topology/tools/plot_convergence.py no-init/clique-ring/all/2021-03-12-10:40:11-CET no-init-no-clique-avg/clique-ring/all/2021-03-12-10:41:03-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg.' 'without clique avg.' --legend 'lower right' --ymin 85 --ymax 92.5 --save-figure ../../figures/d-cliques-mnist-no-init-clique-avg-effect-ring-test-accuracy.png   
-      \begin{subfigure}[b]{0.48\textwidth}
-         \centering
-         \includegraphics[width=\textwidth]{figures/d-cliques-mnist-no-init-clique-avg-effect-ring-test-accuracy}
-         \caption{\label{fig:d-cliques-mnist-no-init-clique-avg-effect-ring-test-accuracy} D-Cliques (Ring), without Uniform Initialization}
-     \end{subfigure}
+ % \section{Additional Experiments}
+
+% \subsection{Effect of Clique Averaging and Uniform Initialization}
+
+% Section~\ref{section:clique-averaging} explained how Clique Averaging reduces bias and showed that Clique Averaging was significantly beneficial on MNIST with fully-connected D-Cliques. In this section, we provide additional results for the ring topology, as well as for CIFAR10. In addition, during our early exploration, we noticed that ensuring \textit{uniform initialization}, i.e. ensuring that all nodes start with the same model, increased convergence speed when connecting two cliques with 1-2 interclique edges. We therefore also verify whether this effect is still significant with 10 cliques (100 nodes), on a ring and with full connections between cliques, as well as on MNIST and CIFAR10. We also verified what interaction this had with Clique Averaging.
+
+% Figure~\ref{fig:d-cliques-mnist-init-clique-avg-effect} shows all the results for MNIST. Comparing Figure~\ref{fig:d-cliques-mnist-init-clique-avg-effect-ring-test-accuracy} to~\ref{fig:d-cliques-mnist-no-init-clique-avg-effect-ring-test-accuracy}, and Figure~\ref{fig:d-cliques-mnist-init-clique-avg-effect-fcc-test-accuracy} to~\ref{fig:d-cliques-mnist-no-init-clique-avg-effect-fcc-test-accuracy} together, we see that Uniform Initialization has imperceptible effects. However, for all four sub-figures, using Clique Averaging has a slightly better average convergence speed, and significantly lower variance between nodes, than not using it. Moreover, the improvement is larger with Fully-Connected D-Cliques.
+
+% \begin{figure}[htbp]
+%      \centering
+%       % To regenerate the figure, from directory results/mnist   
+%       % python ../../../learn-topology/tools/plot_convergence.py clique-ring/all/2021-03-10-18:14:35-CET no-clique-avg/clique-ring/all/2021-03-12-10:40:37-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg.' 'without clique avg.' --legend 'lower right' --ymin 85 --ymax 92.5 --save-figure ../../figures/d-cliques-mnist-init-clique-avg-effect-ring-test-accuracy.png
+%       \begin{subfigure}[b]{0.48\textwidth}
+%          \centering
+%          \includegraphics[width=\textwidth]{figures/d-cliques-mnist-init-clique-avg-effect-ring-test-accuracy}
+%          \caption{\label{fig:d-cliques-mnist-init-clique-avg-effect-ring-test-accuracy} D-Cliques (Ring), with Uniform Initialization}
+%      \end{subfigure}
+%      \quad 
+%      % To regenerate the figure, from directory results/mnist
+%      % python ../../../learn-topology/tools/plot_convergence.py no-init/clique-ring/all/2021-03-12-10:40:11-CET no-init-no-clique-avg/clique-ring/all/2021-03-12-10:41:03-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg.' 'without clique avg.' --legend 'lower right' --ymin 85 --ymax 92.5 --save-figure ../../figures/d-cliques-mnist-no-init-clique-avg-effect-ring-test-accuracy.png   
+%       \begin{subfigure}[b]{0.48\textwidth}
+%          \centering
+%          \includegraphics[width=\textwidth]{figures/d-cliques-mnist-no-init-clique-avg-effect-ring-test-accuracy}
+%          \caption{\label{fig:d-cliques-mnist-no-init-clique-avg-effect-ring-test-accuracy} D-Cliques (Ring), without Uniform Initialization}
+%      \end{subfigure}
      
-     % To regenerate the figure, from directory results/mnist
-     %python ../../../learn-topology/tools/plot_convergence.py fully-connected-cliques/all/2021-03-10-10:19:44-CET no-clique-avg/fully-connected-cliques/all/2021-03-12-11:12:26-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg.'    'without clique avg.'  --legend 'lower right' --ymin 85 --ymax 92.5 --save-figure ../../figures/d-cliques-mnist-init-clique-avg-effect-fcc-test-accuracy.png
-       \begin{subfigure}[b]{0.48\textwidth}
-         \centering
-         \includegraphics[width=\textwidth]{figures/d-cliques-mnist-init-clique-avg-effect-fcc-test-accuracy}
-         \caption{\label{fig:d-cliques-mnist-init-clique-avg-effect-fcc-test-accuracy} D-Cliques (Fully-Connected), with Uniform Initialization}
-     \end{subfigure}
-     \quad
-      % To regenerate the figure, from directory results/mnist
-     %python ../../../learn-topology/tools/plot_convergence.py no-init/fully-connected-cliques/all/2021-03-12-11:12:01-CET no-init-no-clique-avg/fully-connected-cliques/all/2021-03-12-11:12:49-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg.' 'without clique avg.' --legend 'lower right' --ymin 85 --ymax 92.5 --save-figure ../../figures/d-cliques-mnist-no-init-clique-avg-effect-fcc-test-accuracy.png
-       \begin{subfigure}[b]{0.48\textwidth}
-         \centering
-         \includegraphics[width=\textwidth]{figures/d-cliques-mnist-no-init-clique-avg-effect-fcc-test-accuracy}
-         \caption{\label{fig:d-cliques-mnist-no-init-clique-avg-effect-fcc-test-accuracy} D-Cliques (Fully-Connected), without Uniform Initialization}
-     \end{subfigure}
+%      % To regenerate the figure, from directory results/mnist
+%      %python ../../../learn-topology/tools/plot_convergence.py fully-connected-cliques/all/2021-03-10-10:19:44-CET no-clique-avg/fully-connected-cliques/all/2021-03-12-11:12:26-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg.'    'without clique avg.'  --legend 'lower right' --ymin 85 --ymax 92.5 --save-figure ../../figures/d-cliques-mnist-init-clique-avg-effect-fcc-test-accuracy.png
+%        \begin{subfigure}[b]{0.48\textwidth}
+%          \centering
+%          \includegraphics[width=\textwidth]{figures/d-cliques-mnist-init-clique-avg-effect-fcc-test-accuracy}
+%          \caption{\label{fig:d-cliques-mnist-init-clique-avg-effect-fcc-test-accuracy} D-Cliques (Fully-Connected), with Uniform Initialization}
+%      \end{subfigure}
+%      \quad
+%       % To regenerate the figure, from directory results/mnist
+%      %python ../../../learn-topology/tools/plot_convergence.py no-init/fully-connected-cliques/all/2021-03-12-11:12:01-CET no-init-no-clique-avg/fully-connected-cliques/all/2021-03-12-11:12:49-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg.' 'without clique avg.' --legend 'lower right' --ymin 85 --ymax 92.5 --save-figure ../../figures/d-cliques-mnist-no-init-clique-avg-effect-fcc-test-accuracy.png
+%        \begin{subfigure}[b]{0.48\textwidth}
+%          \centering
+%          \includegraphics[width=\textwidth]{figures/d-cliques-mnist-no-init-clique-avg-effect-fcc-test-accuracy}
+%          \caption{\label{fig:d-cliques-mnist-no-init-clique-avg-effect-fcc-test-accuracy} D-Cliques (Fully-Connected), without Uniform Initialization}
+%      \end{subfigure}
      
      
-\caption{\label{fig:d-cliques-mnist-init-clique-avg-effect} MNIST: Effects of Clique Averaging and Uniform Initialization on Convergence Speed. (100 nodes, non-IID, D-Cliques, bsz=128)}
-\end{figure}
-
-Figure~\ref{fig:d-cliques-cifar10-init-clique-avg-effect} shows all the results for CIFAR10. One the one hand, with D-Cliques arranged in a ring, uniform initialization has a small but positive effect on convergence speed, whether Clique Averaging is used or not. With fully-connected D-Cliques,  the effect is significantly smaller and almost negligible, both with and without Clique Averaging. On the other hand, Clique Averaging is always beneficial, by a significantly larger margin for both interclique topologies and with and without uniform initialization. Moreover, the effect is stronger than for MNIST.
-
-    \begin{figure}[htbp]
-     \centering
-     % To regenerate the figure, from directory results/cifar10
-     % python ../../../learn-topology/tools/plot_convergence.py clique-ring/all/2021-03-10-11:58:43-CET no-init/clique-ring/all/2021-03-13-18:28:30-CET no-clique-avg/clique-ring/all/2021-03-13-18:27:09-CET  no-init-no-clique-avg/clique-ring/all/2021-03-13-18:29:58-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg., with uniform init.' 'with clique avg., without uniform init.'  'without clique avg., with uniform init.'   'without clique avg., without uniform init.' --legend 'upper left' --ymax 115  --save-figure ../../figures/d-cliques-cifar10-init-clique-avg-effect-ring-test-accuracy.png --font-size 15
-      \begin{subfigure}[b]{0.48\textwidth}
-         \centering
-         \includegraphics[width=\textwidth]{figures/d-cliques-cifar10-init-clique-avg-effect-ring-test-accuracy}
-         \caption{\label{fig:d-cliques-cifar10-init-clique-avg-effect-ring-test-accuracy} D-Cliques (Ring)}
-     \end{subfigure}
-     % To regenerate the figure, from directory results/cifar10
-     %python ../../../learn-topology/tools/plot_convergence.py fully-connected-cliques/all/2021-03-10-13:58:57-CET no-init/fully-connected-cliques/all/2021-03-13-18:32:55-CET no-clique-avg/fully-connected-cliques/all/2021-03-13-18:31:36-CET  no-init-no-clique-avg/fully-connected-cliques/all/2021-03-13-18:34:35-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg., with uniform init.' 'with clique avg., without uniform init.'  'without clique avg., with uniform init.'   'without clique avg., without uniform init.' --legend 'upper left'  --ymax 115 --save-figure ../../figures/d-cliques-cifar10-init-clique-avg-effect-fcc-test-accuracy.png --font-size 15
-       \begin{subfigure}[b]{0.48\textwidth}
-         \centering
-         \includegraphics[width=\textwidth]{figures/d-cliques-cifar10-init-clique-avg-effect-fcc-test-accuracy}
-         \caption{\label{fig:d-cliques-cifar10-init-clique-avg-effect-fcc-test-accuracy} D-Cliques (Fully-Connected)}
-     \end{subfigure}
-\caption{\label{fig:d-cliques-cifar10-init-clique-avg-effect} CIFAR10: Effects of Clique Averaging and Uniform Initialization on Convergence Speed. (100 nodes, non-IID, D-Cliques, bsz=20, momentum=0.9)}
-\end{figure}
+% \caption{\label{fig:d-cliques-mnist-init-clique-avg-effect} MNIST: Effects of Clique Averaging and Uniform Initialization on Convergence Speed. (100 nodes, non-IID, D-Cliques, bsz=128)}
+% \end{figure}
+
+% Figure~\ref{fig:d-cliques-cifar10-init-clique-avg-effect} shows all the results for CIFAR10. One the one hand, with D-Cliques arranged in a ring, uniform initialization has a small but positive effect on convergence speed, whether Clique Averaging is used or not. With fully-connected D-Cliques,  the effect is significantly smaller and almost negligible, both with and without Clique Averaging. On the other hand, Clique Averaging is always beneficial, by a significantly larger margin for both interclique topologies and with and without uniform initialization. Moreover, the effect is stronger than for MNIST.
+
+%     \begin{figure}[htbp]
+%      \centering
+%      % To regenerate the figure, from directory results/cifar10
+%      % python ../../../learn-topology/tools/plot_convergence.py clique-ring/all/2021-03-10-11:58:43-CET no-init/clique-ring/all/2021-03-13-18:28:30-CET no-clique-avg/clique-ring/all/2021-03-13-18:27:09-CET  no-init-no-clique-avg/clique-ring/all/2021-03-13-18:29:58-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg., with uniform init.' 'with clique avg., without uniform init.'  'without clique avg., with uniform init.'   'without clique avg., without uniform init.' --legend 'upper left' --ymax 115  --save-figure ../../figures/d-cliques-cifar10-init-clique-avg-effect-ring-test-accuracy.png --font-size 15
+%       \begin{subfigure}[b]{0.48\textwidth}
+%          \centering
+%          \includegraphics[width=\textwidth]{figures/d-cliques-cifar10-init-clique-avg-effect-ring-test-accuracy}
+%          \caption{\label{fig:d-cliques-cifar10-init-clique-avg-effect-ring-test-accuracy} D-Cliques (Ring)}
+%      \end{subfigure}
+%      % To regenerate the figure, from directory results/cifar10
+%      %python ../../../learn-topology/tools/plot_convergence.py fully-connected-cliques/all/2021-03-10-13:58:57-CET no-init/fully-connected-cliques/all/2021-03-13-18:32:55-CET no-clique-avg/fully-connected-cliques/all/2021-03-13-18:31:36-CET  no-init-no-clique-avg/fully-connected-cliques/all/2021-03-13-18:34:35-CET --add-min-max --yaxis test-accuracy --labels 'with clique avg., with uniform init.' 'with clique avg., without uniform init.'  'without clique avg., with uniform init.'   'without clique avg., without uniform init.' --legend 'upper left'  --ymax 115 --save-figure ../../figures/d-cliques-cifar10-init-clique-avg-effect-fcc-test-accuracy.png --font-size 15
+%        \begin{subfigure}[b]{0.48\textwidth}
+%          \centering
+%          \includegraphics[width=\textwidth]{figures/d-cliques-cifar10-init-clique-avg-effect-fcc-test-accuracy}
+%          \caption{\label{fig:d-cliques-cifar10-init-clique-avg-effect-fcc-test-accuracy} D-Cliques (Fully-Connected)}
+%      \end{subfigure}
+% \caption{\label{fig:d-cliques-cifar10-init-clique-avg-effect} CIFAR10: Effects of Clique Averaging and Uniform Initialization on Convergence Speed. (100 nodes, non-IID, D-Cliques, bsz=20, momentum=0.9)}
+% \end{figure}
 
-We conclude that Uniform Initialization is not so important for convergence speed but that Clique Averaging is always significantly so.
+% We conclude that Uniform Initialization is not so important for convergence speed but that Clique Averaging is always significantly so.
 
 % \subsection{Comparison to Non-Clustered Topologies}    
 %     
@@ -1231,10 +1235,12 @@ We conclude that Uniform Initialization is not so important for convergence spee
 %
 %
 
-\clearpage
+% \clearpage
+
+\section{Additional Experiments on Scaling Behavior with Increasing Number of
+Nodes}
+\label{app:scaling}
 
-\subsection{Scaling Behavior with Increasing Number of Nodes}
- 
 Section~\ref{section:interclique-topologies} compares the convergence speed of various inter-clique topologies at a scale of 1000 nodes. In this section, we show the effect of scaling the number of nodes, by comparing the convergence speed with 1, 10, 100, and 1000 nodes, and adjusting the batch size to maintain a constant number of updates per epoch. We present results for Ring, Fractal, Small-world, and Fully-Connected inter-clique topologies.
  
 Figure~\ref{fig:d-cliques-mnist-scaling-fully-connected} shows the results for