Added cifar10 smallworld results for 1000 nodes

Commit dd2e4564 authored 4 years ago by Erick Lavoie
--- a/figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png
+++ b/figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png
--- a/figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png
+++ b/figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png
--- a/main.tex
+++ b/main.tex
@@ -149,8 +149,9 @@ To summarize, our contributions are as follows:
 \begin{enumerate}
  \item we show the significant impact of topology on convergence speed in the presence of non-IID data in decentralized learning;
  \item we propose the D-Cliques topology to remove the impact of non-IID data on convergence speed, similar to a fully-connected topology. At a scale of 1000 nodes, this represents a 98\% reduction in the number of edges ($18.9$ vs $999$ edges per node on average) and a 96\% reduction in the total number of required messages;
-  \item we show how to leverage D-Cliques to implement momentum, a critical optimization technique to quickly train convolutional networks, that otherwise significantly \textit{decreases} convergence speed in the presence of non-IID data;
-  \item we show that, among the many possible choices of inter-clique topologies, a smallworld topology provides a convergence speed close to fully-connecting all cliques pairwise, but requires only $O(n + log(n))$ instead of $O(n^2)$ edges where $n$ is the number of nodes. At a scale of 1000 nodes, this represents a 22\% reduction in the number of edges compared to fully-connecting cliques ($14.6$ vs $18.9$ edges per node on average) and suggests possible bigger gains at larger scales. 
+  \item we show how to leverage cliques to: (1) remove the gradient bias that originates from inter-clique edges;
+   (2) implement momentum, a critical optimization technique for quickly training convolutional networks, which otherwise significantly \textit{decreases} convergence speed in the presence of non-IID data;
+  \item we show that, among the many possible choices of inter-clique topologies, a smallworld topology provides a convergence speed close to that of fully connecting all cliques pairwise, but requires only $O(n + \log(n))$ instead of $O(n^2)$ edges, where $n$ is the number of nodes. At a scale of 1000 nodes, this represents a further 22\% reduction in the number of edges compared to fully connecting cliques ($14.6$ vs $18.9$ edges per node on average) and suggests potentially larger gains at larger scales.
 \end{enumerate}

 The rest of this paper is organized as follows. \aurelien{TO COMPLETE}
@@ -526,7 +527,7 @@ In addition, it is important that all nodes are initialized with the same model
 \begin{figure}[htbp]
     \centering
          % To regenerate the figure, from directory results/cifar10
-% python ../../../learn-topology/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET  --add-min-max --yaxis training-loss --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (fractal)' 'd-cliques (ring)' --legend 'upper right'  --ymax 3 --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png
+% python ../../../learn-topology/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/smallworld-logn-cliques/all/2021-03-23-22:13:57-CET ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET  --add-min-max --yaxis training-loss --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (smallworld)'  'd-cliques (fractal)'   'd-cliques (ring)' --legend 'upper right'  --ymax 3 --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png
     \begin{subfigure}[b]{0.48\textwidth}
         \centering
         \includegraphics[width=\textwidth]{figures/d-cliques-cifar10-1000-vs-1-node-training-loss}
@@ -534,7 +535,7 @@ In addition, it is important that all nodes are initialized with the same model
     \end{subfigure}
     \hfill
     % To regenerate the figure, from directory results/cifar10
-% python ../../../learn-topology/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET  --add-min-max --yaxis test-accuracy --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (fractal)' 'd-cliques (ring)' --legend 'lower right' --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png
+% python ../../../learn-topology/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/smallworld-logn-cliques/all/2021-03-23-22:13:57-CET  ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET  --add-min-max --yaxis test-accuracy --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (smallworld)' 'd-cliques (fractal)' 'd-cliques (ring)' --legend 'lower right' --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png
     \begin{subfigure}[b]{0.48\textwidth}
         \centering
         \includegraphics[width=\textwidth]{figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy}