\clearpage
\subsection{Scaling Behavior with Increasing Number of Nodes}
Section~\ref{section:interclique-topologies} compares the convergence speed of various inter-clique topologies at a scale of 1000 nodes. In this section, we show the effect of scaling the number of nodes by comparing the convergence speed with 1, 10, 100, and 1000 nodes, while adjusting the batch size to maintain a constant number of updates per epoch. We present results for the Ring, Fractal, Small-world, and Fully-Connected inter-clique topologies.
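
The batch size adjustment can be made explicit under assumptions not spelled out in this section: writing $D$ for the total number of training examples, assume the data is split evenly across the $n$ nodes and that all nodes perform one update per mini-batch in parallel. With a local batch size $b_n$, each node then performs $\frac{D/n}{b_n}$ updates per epoch, and keeping this number equal to the single-node value $\frac{D}{b_1}$ gives
\[
\frac{D/n}{b_n} = \frac{D}{b_1}
\quad\Longleftrightarrow\quad
b_n = \frac{b_1}{n},
\]
i.e., the local batch size is divided by the number of nodes at every scale.
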
Figure~\ref{fig:d-cliques-mnist-scaling-fully-connected} shows the results for
MNIST. For all topologies, we observe perfect scaling up to 100 nodes: the
accuracy curves overlap, with low variance between nodes. At 1000 nodes, the
variance between nodes increases significantly and convergence is slower, only
marginally so for Fully-Connected but significantly so for Fractal and Ring.
Small-world has higher variance between nodes but maintains a convergence
speed close to that of Fully-Connected.

Figure~\ref{fig:d-cliques-cifar10-scaling-fully-connected} shows the results
for CIFAR10. When increasing from 1 to 10 nodes (resulting in a single
fully-connected clique), there is actually a small increase in both final
accuracy and convergence speed. We believe this increase arises because, with
10 fully-connected non-IID nodes, the gradient is computed with exactly the
same number of examples from every class, whereas the gradient of a single
non-IID node may have a slightly larger bias because random sampling does not
guarantee that every class is perfectly represented in each batch. At a scale
of 100 nodes, there is no difference between Fully-Connected and Fractal, as
the connections are the same; however, the Ring already shows significantly
slower convergence. At 1000 nodes, convergence slows down significantly for
Fractal and Ring, while remaining close, albeit with a larger variance, for
Fully-Connected. Similar to MNIST, Small-world has higher variance and
slightly lower convergence speed than Fully-Connected but remains very close.
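
The class-representation argument can be illustrated with a small, hypothetical simulation that is not part of the experiments above; all values (10 classes, an aggregate batch of 200 examples, a clique of 10 single-class nodes) are illustrative assumptions. The aggregated clique batch is class-balanced by construction, whereas the single node's batch is balanced only in expectation:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
num_classes, total_batch = 10, 200   # illustrative values

# 10 fully-connected non-IID nodes, one class each, local batch 20:
# the aggregated batch contains exactly 20 examples of every class.
clique_counts = np.full(num_classes, total_batch // num_classes)

# A single node drawing the same 200 examples at random from pooled
# data: per-class counts follow a multinomial and fluctuate per batch.
single_counts = rng.multinomial(total_batch,
                                np.ones(num_classes) / num_classes)

print("clique (exact):  ", clique_counts)
print("single (random): ", single_counts)
\end{verbatim}
Averaged over many batches the two settings coincide, which is consistent with the small size of the effect observed here.
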
We therefore conclude that Fully-Connected and Small-world have good scaling
properties in terms of convergence speed, and that the
linear-logarithmic number of edges of Small-world makes it the best compromise
between convergence speed and connectivity, and thus the best choice for
efficient large-scale decentralized learning in practice.
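
To give a rough sense of the connectivity gap behind this compromise, consider the largest scale with cliques of 10 nodes, i.e., $c = 100$ cliques (the clique size is an assumption carried over from the 10-class setting, not restated in this section). A fully-connected inter-clique graph requires $c(c-1)/2 = 4950$ inter-clique edges, whereas a Small-world-style construction that gives each clique a number of inter-clique edges logarithmic in $c$ requires on the order of $c \log_2 c \approx 664$, an order of magnitude fewer while, as shown above, converging almost as fast.
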
\begin{figure}[htbp]
\centering
% To regenerate the figure, from directory results/scaling
...
...