diff --git a/figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png b/figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png
index f022706391c79fa857de417027a35fbfbb52b8a0..9bd7f7be0d0325d76e3f5f80b0af6a0bd14764ca 100644
Binary files a/figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png and b/figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png differ
diff --git a/figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png b/figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png
index 111ecf77c48c0b3c03bb20df468adf00edb13b46..f82ce9047d46dbd27ff3def129bd7a9ab272ef13 100644
Binary files a/figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png and b/figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png differ
diff --git a/main.tex b/main.tex
index 198fbf779a0e765f122e2b5addbeeca184da7e60..bebaca5ee6ed9a049dfe80494230bef01f5c2c51 100644
--- a/main.tex
+++ b/main.tex
@@ -149,8 +149,9 @@ To summarize, our contributions are as follows:
 \begin{enumerate}
   \item we show the significant impact of topology on convergence speed in the presence of non-IID data in decentralized learning;
   \item we propose the D-Cliques topology to remove the impact of non-IID data on convergence speed, similar to a fully-connected topology. At a scale of 1000 nodes, this represents a 98\% reduction in the number of edges ($18.9$ vs $999$ edges per node on average) and a 96\% reduction in the total number of required messages;
-  \item we show how to leverage D-Cliques to implement momentum, a critical optimization technique to quickly train convolutional networks, that otherwise significantly \textit{decreases} convergence speed in the presence of non-IID data;
-  \item we show that, among the many possible choices of inter-clique topologies, a smallworld topology provides a convergence speed close to fully-connecting all cliques pairwise, but requires only $O(n + log(n))$ instead of $O(n^2)$ edges where $n$ is the number of nodes. At a scale of 1000 nodes, this represents a 22\% reduction in the number of edges compared to fully-connecting cliques ($14.6$ vs $18.9$ edges per node on average) and suggests possible bigger gains at larger scales. 
+  \item we show how to leverage cliques to: (1) remove the gradient bias that originates from inter-clique edges;
+   (2) implement momentum, a critical optimization technique for quickly training convolutional networks, which otherwise significantly \textit{decreases} convergence speed in the presence of non-IID data;
+  \item we show that, among the many possible choices of inter-clique topologies, a smallworld topology provides a convergence speed close to fully-connecting all cliques pairwise, but requires only $O(n + \log(n))$ instead of $O(n^2)$ edges, where $n$ is the number of nodes. At a scale of 1000 nodes, this represents a further 22\% reduction in the number of edges compared to fully-connecting cliques ($14.6$ vs $18.9$ edges per node on average) and suggests potentially larger gains at larger scales.
 \end{enumerate}
 
 The rest of this paper is organized as follows. \aurelien{TO COMPLETE}
@@ -526,7 +527,7 @@ In addition, it is important that all nodes are initialized with the same model
 \begin{figure}[htbp]
      \centering
           % To regenerate the figure, from directory results/cifar10
-% python ../../../learn-topology/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET  --add-min-max --yaxis training-loss --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (fractal)' 'd-cliques (ring)' --legend 'upper right'  --ymax 3 --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png
+% python ../../../learn-topology/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/smallworld-logn-cliques/all/2021-03-23-22:13:57-CET ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET  --add-min-max --yaxis training-loss --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (smallworld)'  'd-cliques (fractal)'   'd-cliques (ring)' --legend 'upper right'  --ymax 3 --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png
      \begin{subfigure}[b]{0.48\textwidth}
          \centering
          \includegraphics[width=\textwidth]{figures/d-cliques-cifar10-1000-vs-1-node-training-loss}
@@ -534,7 +535,7 @@ In addition, it is important that all nodes are initialized with the same model
      \end{subfigure}
      \hfill
      % To regenerate the figure, from directory results/cifar10
-% python ../../../learn-topology/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET  --add-min-max --yaxis test-accuracy --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (fractal)' 'd-cliques (ring)' --legend 'lower right' --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png
+% python ../../../learn-topology/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/smallworld-logn-cliques/all/2021-03-23-22:13:57-CET  ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET  --add-min-max --yaxis test-accuracy --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (smallworld)' 'd-cliques (fractal)' 'd-cliques (ring)' --legend 'lower right' --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png
      \begin{subfigure}[b]{0.48\textwidth}
          \centering
          \includegraphics[width=\textwidth]{figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy}