Commit dd2e4564 authored by Erick Lavoie

Added cifar10 smallworld results for 1000 nodes

parent 4516460a
figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png (binary, 53.3 KiB before, 65.5 KiB after)
figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png (binary, 102 KiB before, 127 KiB after)
To summarize, our contributions are as follows:
\begin{enumerate}
\item we show the significant impact of topology on convergence speed in the presence of non-IID data in decentralized learning;
\item we propose the D-Cliques topology, which removes the impact of non-IID data on convergence speed and achieves convergence similar to that of a fully-connected topology. At a scale of 1000 nodes, this represents a 98\% reduction in the number of edges ($18.9$ vs $999$ edges per node on average) and a 96\% reduction in the total number of required messages;
\item we show how to leverage cliques to: (1) remove the gradient bias that originates from inter-clique edges;
(2) implement momentum, a critical optimization technique for quickly training convolutional networks, which otherwise significantly \textit{decreases} convergence speed in the presence of non-IID data;
\item we show that, among the many possible choices of inter-clique topologies, a small-world topology provides a convergence speed close to that of fully-connecting all cliques pairwise, but requires only $O(n \log n)$ instead of $O(n^2)$ edges, where $n$ is the number of nodes. At a scale of 1000 nodes, this represents a further 22\% reduction in the number of edges compared to fully-connecting cliques ($14.6$ vs $18.9$ edges per node on average), and suggests larger gains at larger scales.
\end{enumerate}
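As a sanity check of the per-node edge counts quoted above, the following Python sketch recomputes them under an assumption that is not stated in this section: the nodes are partitioned into cliques of 10, and the fully-connected-cliques variant joins every pair of cliques with exactly one inter-clique edge. The function names are hypothetical, and the small-world figure of $14.6$ edges per node is taken as given, since its construction is not described here.

```python
# Back-of-the-envelope check of the per-node edge counts quoted above.
# ASSUMPTION (not stated in this section): nodes are split into cliques of
# 10, and "fully-connected cliques" joins every pair of cliques by one edge.


def avg_degree_fully_connected_cliques(n: int, clique_size: int) -> float:
    """Average edges per node: intra-clique edges plus one edge per clique pair."""
    num_cliques = n // clique_size
    intra = clique_size - 1  # each node is linked to every other clique member
    inter_edges = num_cliques * (num_cliques - 1) // 2  # one edge per clique pair
    # each inter-clique edge adds one endpoint to each of two different nodes
    return intra + 2 * inter_edges / n


def reduction_percent(new: float, old: float) -> float:
    """Relative reduction of `new` with respect to `old`, in percent."""
    return 100.0 * (1.0 - new / old)


if __name__ == "__main__":
    for n in (1000, 10000):
        deg = avg_degree_fully_connected_cliques(n, clique_size=10)
        print(f"n={n}: fully-connected cliques, {deg:.1f} edges per node, "
              f"{reduction_percent(deg, n - 1):.1f}% fewer than a fully-connected graph")
    # 14.6 is the small-world figure reported in the text, taken as given here
    fc = avg_degree_fully_connected_cliques(1000, clique_size=10)
    print(f"small-world vs fully-connected cliques at n=1000: "
          f"{reduction_percent(14.6, fc):.1f}% fewer edges")
```

Under these assumptions, the sketch reproduces the $18.9$ edges per node and the 98\% reduction quoted at 1000 nodes, and shows the per-node degree of fully-connected cliques growing to $108.9$ at 10000 nodes, which illustrates the $O(n^2)$ total edge count that a sparser inter-clique topology avoids.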
The rest of this paper is organized as follows. \aurelien{TO COMPLETE}
......
\begin{figure}[htbp]
\centering
% To regenerate the figure, from directory results/cifar10
% python ../../../learn-topology/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/smallworld-logn-cliques/all/2021-03-23-22:13:57-CET ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET --add-min-max --yaxis training-loss --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (smallworld)' 'd-cliques (fractal)' 'd-cliques (ring)' --legend 'upper right' --ymax 3 --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-training-loss.png
\begin{subfigure}[b]{0.48\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/d-cliques-cifar10-1000-vs-1-node-training-loss}
......
\end{subfigure}
\hfill
% To regenerate the figure, from directory results/cifar10
% python ../../../learn-topology/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/smallworld-logn-cliques/all/2021-03-23-22:13:57-CET ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET --add-min-max --yaxis test-accuracy --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (smallworld)' 'd-cliques (fractal)' 'd-cliques (ring)' --legend 'lower right' --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy.png
\begin{subfigure}[b]{0.48\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy}
......