exp.tex

% !TEX root = main.tex

\section{Experiments}
\label{section:non-clustered}

In this section, we first compare D-Cliques to alternative topologies to
confirm the relevance of our main design choices. Then,
we evaluate some extensions of D-Cliques to further reduce the number of
inter-clique connections so as to gracefully scale with the number of
nodes.

\subsection{Methodology}

\subsubsection{Non-IID assumptions.}
\label{section:non-iid-assumptions}

As demonstrated in Figure~\ref{fig:iid-vs-non-iid-problem}, lifting the
assumption of IID data significantly challenges the learning algorithm. In
this paper, we focus on an \textit{extreme case of label distribution skew}: we
consider that each node only has examples from a single class.

To isolate the effect of label distribution skew from other potentially compounding
factors, we make the following simplifying assumptions: (1) All classes are
equally represented in the global dataset; (2) All classes are represented on
the same number of nodes; (3) All nodes have the same number of examples.

We believe that these assumptions are reasonable in the context of our study
because: (1)
Global class
imbalance equally
affects the optimization process on a single node and is therefore not
specific to the decentralized setting; (2) Our results do not exploit specific
positions in the topology;  (3) Imbalanced dataset sizes across nodes can be
addressed for instance by appropriately weighting the individual loss
functions. Our results can be extended to support additional compounding factors in future work.

\subsubsection{Experimental setup.}
\label{section:experimental-settings}

Our main goal is to provide a fair comparison of the convergence speed across
different topologies and algorithmic variations, in order to
show that our approach
can remove much of the effect of label distribution skew.

We experiment with two datasets: MNIST~\cite{mnistWebsite} and
CIFAR10~\cite{krizhevsky2009learning}, which both have $L=10$ classes.
For MNIST, we use 45k and 10k examples from the original 60k
training set for training and validation respectively. The remaining 5k
training examples were randomly removed to ensure all 10 classes are balanced
while ensuring that the dataset is evenly divisible across 100 and 1000 nodes.
We use all 10k examples of
the test set to measure prediction accuracy. For CIFAR10, classes are evenly
balanced: we use 45k/50k images of the original training set for training,
5k/50k for validation, and all 10k examples of the test set for measuring
prediction accuracy.

We
use a logistic regression classifier for MNIST, which
provides up to 92.5\% accuracy in the centralized setting.
For CIFAR10, we use a Group-Normalized variant of LeNet~\cite{quagmire}, a
deep convolutional network which achieves an accuracy of $72.3\%$ in the
centralized setting.
These models are thus reasonably accurate (which is sufficient to
study the effect of the topology) while being sufficiently fast to train in a
fully decentralized setting and simple enough to configure and analyze.
Regarding hyper-parameters, we jointly optimize the learning rate and
mini-batch size on the
validation set for 100 nodes, obtaining respectively $0.1$ and $128$ for
MNIST and $0.002$ and $20$ for CIFAR10.
For CIFAR10, we additionally use a momentum of $0.9$.

We evaluate 100- and 1000-node networks by creating multiple models in memory and simulating the exchange of messages between nodes.
To ignore the impact of distributed execution strategies and system
optimization techniques, we report the test accuracy of all nodes (min, max,
average) as a function of the number of times each example of the dataset has
been sampled by a node, i.e. an \textit{epoch}. This is equivalent to the classic case of a single node sampling the full distribution.
To further make results comparable across different number of nodes, we lower
the batch size proportionally to the number of nodes added, and inversely,
e.g. on MNIST, 128 with 100 nodes vs. 13 with 1000 nodes. This
ensures the same number of model updates and averaging per epoch, which is
important to have a fair comparison.\footnote{Updating and averaging models
after every example can eliminate the impact of label distribution skew. However, the
resulting communication overhead is impractical.}

Finally, we compare our results against an ideal baseline: either a
fully-connected network topology with the same number of nodes or a single IID
node. In both cases, the topology has no effect on
the optimization. For a certain choice of number of nodes and
mini-batch size, both approaches are equivalent. 

\subsection{Main Results}

\begin{figure}[t]
    \centering 
             
    \begin{subfigure}[b]{0.20\textwidth}
    \centering
    \includegraphics[width=\textwidth]{../figures/fully-connected-cliques}
    \caption{\label{fig:d-cliques-figure} D-Cliques (fully-connected
    cliques)}
    \end{subfigure}
    \hfill
    % To regenerate figure, from results/mnist
    % python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py fully-connected/all/2021-03-10-09:25:19-CET no-init-no-clique-avg/fully-connected-cliques/all/2021-03-12-11:12:49-CET --add-min-max --yaxis test-accuracy --ymin 80 --ymax 92.5 --labels '100 nodes non-IID fully-connected' '100 nodes non-IID d-cliques' --save-figure ../../figures/d-cliques-mnist-vs-fully-connected.png --legend 'lower right' --font-size 16 --linestyles 'solid' 'dashed'
    \begin{subfigure}[b]{0.26\textwidth}
    \centering
    \includegraphics[width=\textwidth]{../figures/d-cliques-mnist-vs-fully-connected.png}
    \caption{\label{fig:d-cliques-example-convergence-speed} Convergence Speed
    on MNIST}
    \end{subfigure}
    
\caption{\label{fig:d-cliques-example} D-Cliques topology and convergence
speed on MNIST.}
\end{figure}

Figure~\ref{fig:d-cliques-example-convergence-speed} illustrates the
performance of D-Cliques on MNIST with $n=100$ nodes. Observe that the
convergence speed is
very close
to that of a fully-connected topology, and significantly better than with
a ring or a grid (see Figure~\ref{fig:iid-vs-non-iid-problem}). With 
100 nodes, it offers a reduction of $\approx90\%$ in the number of edges
compared to a fully-connected topology. Nonetheless, there is still
significant variance in the accuracy across nodes, which is due to the bias
introduced by inter-clique edges. We address this issue in the next section.

\subsection{Importance of Clique Averaging and Momentum}

% To regenerate figure, from results/mnist:
% python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py fully-connected/all/2021-03-10-09:25:19-CET no-init-no-clique-avg/fully-connected-cliques/all/2021-03-12-11:12:49-CET  no-init/fully-connected-cliques/all/2021-03-12-11:12:01-CET --add-min-max --yaxis test-accuracy --labels '100 nodes non-IID fully-connected' '100 nodes non-IID d-cliques w/o clique avg.' '100 nodes d-cliques non-IID w/ clique avg.' --legend 'lower right' --ymin 89 --ymax 92.5 --font-size 13 --save-figure ../../figures/d-clique-mnist-clique-avg.png --linestyles 'solid' 'dashed' 'dotted'
\begin{figure}[t]
         \centering
         \includegraphics[width=0.35\textwidth]{../figures/d-clique-mnist-clique-avg}
\caption{\label{fig:d-clique-mnist-clique-avg} Effect of Clique Averaging on MNIST. Y-axis starts at 89.}
\end{figure}

As illustrated in Figure~\ref{fig:d-clique-mnist-clique-avg}, Clique Averaging
significantly reduces the variance of models across nodes and accelerates
convergence to reach the same level as the one obtained with a
fully-connected topology. Note that Clique Averaging induces a small
additional cost, as gradients
and models need to be sent in two separate rounds of messages. Nonetheless, compared to fully connecting all nodes, the total number of messages is reduced by $\approx 80\%$.

\begin{figure}[t]
    \centering 
    % To regenerate figure, from results/cifar10
    % python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET  no-init-no-clique-avg/fully-connected-cliques/all/2021-03-13-18:34:35-CET no-init-no-clique-avg-no-momentum/fully-connected-cliques/all/2021-03-26-13:47:35-CET/ --legend 'upper right' --add-min-max --labels '1-node IID w/ momentum'  '100 nodes non-IID d-cliques w/ momentum' '100 nodes non-IID d-cliques w/o momentum'  --font-size 14 --yaxis test-accuracy --save-figure ../../figures/d-cliques-cifar10-momentum-non-iid-effect.png --ymax 100 --linestyles 'solid' 'dashed' 'dotted'         
    \begin{subfigure}[b]{0.35\textwidth}
    \centering
    \includegraphics[width=\textwidth]{../figures/d-cliques-cifar10-momentum-non-iid-effect}
    \caption{\label{fig:d-cliques-cifar10-momentum-non-iid-effect} Without Clique Averaging }
    \end{subfigure}
    \hfill
    % To regenerate figure, from results/cifar10
    % python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET no-init/fully-connected-cliques/all/2021-03-13-18:32:55-CET --legend 'upper right' --add-min-max --labels '1-node IID w/ momentum' '100 nodes non-IID d-clique w/ momentum' --font-size 14 --yaxis test-accuracy --save-figure ../../figures/d-cliques-cifar10-momentum-non-iid-clique-avg-effect.png --ymax 100 --linestyles 'solid' 'dashed' 'dotted' 
    \begin{subfigure}[b]{0.35\textwidth}
    \centering
    \includegraphics[width=\textwidth]{../figures/d-cliques-cifar10-momentum-non-iid-clique-avg-effect}
    \caption{\label{fig:d-cliques-cifar10-momentum-non-iid-clique-avg-effect} With Clique Averaging}
    \end{subfigure}
\caption{\label{fig:cifar10-momentum} Non-IID Effect of Momentum on CIFAR10 with LeNet}
\end{figure}

As shown in
Figure~\ref{fig:d-cliques-cifar10-momentum-non-iid-clique-avg-effect}, 
the use of Clique Averaging restores the benefits of momentum and closes the gap
with the centralized setting.

\subsection{Comparing D-Cliques to Other Sparse Topologies}

We demonstrate the advantages of D-Cliques over alternative sparse topologies
that have a similar number of edges. First, we consider topologies in which
the neighbors of each node are selected at random (hence without any clique
structure).
Specifically, for $n=100$ nodes, we
construct a random topology such that each node has exactly 10 edges, which is
similar to the average 9.9 edges of our D-Cliques topology 
(Figure~\ref{fig:d-cliques-figure}). To better understand the role of
the clique structure beyond merely ensuring class representativity among
neighbors,
we also compare to a random topology similar to the one described above except
that edges are
chosen such that each node has neighbors of all possible classes. Finally, we
also implement an analog of Clique Averaging for these random topologies,
where all nodes de-bias their gradient based on the class distribution of
their neighbors. In the latter case, since nodes do not form a clique, each
node obtains a different average gradient.

The results for MNIST and CIFAR10 are shown in
Figure~\ref{fig:d-cliques-comparison-to-non-clustered-topologies}. For MNIST,
a purely random topology has higher variance and lower convergence speed than
D-Cliques (with or without Clique Averaging), while a random topology with
class representativity performs similarly as D-Cliques without Clique
Averaging. However and perhaps surprisingly, a random topology with unbiased
gradient performs slightly worse than without it. In any case, D-Cliques with
Clique Averaging outperforms all random topologies, showing that the clique
structure has a small but noticeable effect on the average accuracy and
significantly reduces the variance across nodes in this setup.

\begin{figure}[t]
     \centering     
         \begin{subfigure}[b]{0.35\textwidth}
% To regenerate the figure, from directory results/mnist
% python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py fully-connected-cliques/all/2021-03-10-10:19:44-CET no-init-no-clique-avg/fully-connected-cliques/all/2021-03-12-11:12:49-CET  random-10/all/2021-07-23-11:59:56-CEST  random-10-diverse/all/2021-03-17-20:28:35-CET --labels 'd-clique (fcc)' 'd-clique (fcc) no clique avg.' '10 random edges' '10 random edges (all classes represented)' --add-min-max --legend 'lower right' --ymin 80 --ymax 92.5 --yaxis test-accuracy --save-figure ../../figures/d-cliques-mnist-linear-comparison-to-non-clustered-topologies.png --font-size 13 --linestyles 'solid' 'dashed' 'dotted' 'dashdot'
         \centering
         \includegraphics[width=\textwidth]{../figures/d-cliques-mnist-linear-comparison-to-non-clustered-topologies}
                  \caption{MNIST with Linear Model}
         \end{subfigure}
                 \hfill                      
% To regenerate the figure, from directory results/cifar10
% python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py no-init/fully-connected-cliques/all/2021-03-13-18:32:55-CET no-init-no-clique-avg/fully-connected-cliques/all/2021-03-13-18:34:35-CET random-10/all/2021-07-23-14:33:48-CEST  random-10-diverse/all/2021-03-17-20:30:41-CET random-10-diverse-unbiased-gradient/all/2021-03-17-20:31:14-CET --labels 'd-clique (fcc) clique avg.' 'd-clique (fcc) no clique avg.' '10 random edges' '10 random edges (all classes repr.)' '10 random (all classes repr.) with unbiased grad.' --add-min-max --legend 'upper left' --yaxis test-accuracy --save-figure ../../figures/d-cliques-cifar10-linear-comparison-to-non-clustered-topologies.png --ymax 119 --font-size 13  --linestyles 'solid' 'dashed' 'dotted' 'dashdot' 'solid' --markers '' '' '' '' 'o'
        \begin{subfigure}[b]{0.35\textwidth}
        \centering
         \includegraphics[width=\textwidth]{../figures/d-cliques-cifar10-linear-comparison-to-non-clustered-topologies}
         \caption{CIFAR10 with LeNet}
     \end{subfigure} 
 \caption{\label{fig:d-cliques-comparison-to-non-clustered-topologies} Comparison to Non-Clustered Topologies} 
\end{figure}

On the harder CIFAR10 dataset with a deep convolutional network, the
differences are much more dramatic:
D-Cliques with Clique Averaging and momentum turns out to be critical for fast
convergence.
Crucially, all random topologies fail to converge to a good solution. This
confirms that our clique structure is important to reduce variance
across nodes and improve the convergence. The difference with the previous
experiment seems to be due to both the use of a higher capacity model and to
the intrinsic characteristics of the datasets.

While the previous experiments suggest that our clique structure is
instrumental in obtaining good performance, one may wonder whether
intra-clique full connectivity is actually necessary.
Figure~\ref{fig:d-cliques-intra-connectivity} shows the convergence speed of
a D-Cliques topology where cliques have been sparsified by randomly
removing 1 or 5 undirected edges per clique (out of 45). Strikingly, both for MNIST and
CIFAR10, removing just a single edge from the cliques has a
significant effect on the
convergence speed. On CIFAR10, it even entirely negates the
benefits of D-Cliques.

Overall, these results show that achieving fast convergence on non-IID
data with sparse topologies requires a very careful design, as we have
proposed with D-Cliques.

\begin{figure*}[t]
     \centering

\begin{subfigure}[htbp]{0.4\textwidth}
     \centering   
% To regenerate the figure, from directory results/mnist
% python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py no-init/fully-connected-cliques/all/2021-03-12-11:12:01-CET rm-1-edge/all/2021-03-18-17:28:27-CET rm-1-edge-unbiased-grad/all/2021-03-18-17:28:47-CET --add-min-max --ymin 85 --ymax 92.5 --legend 'lower right' --yaxis test-accuracy --labels 'fcc, clique grad.' 'fcc -1 edge/clique, no clique grad.' 'fcc -1 edge/clique, clique grad.' --save-figure ../../figures/d-cliques-mnist-clique-clustering-fcc-minus-1-edge.png  --font-size 13  --linestyle 'solid' 'dashed' 'dotted' 
         \includegraphics[width=\textwidth]{../figures/d-cliques-mnist-clique-clustering-fcc-minus-1-edge}     
\caption{\label{fig:d-cliques-mnist-clique-clustering-minus-1-edge} MNIST (-1 edge/clique)}
\end{subfigure}
\hfill
\begin{subfigure}[htbp]{0.4\textwidth}
     \centering
% To regenerate the figure, from directory results/cifar10
% python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py no-init/fully-connected-cliques/all/2021-03-13-18:32:55-CET rm-1-edge/all/2021-03-18-17:29:58-CET rm-1-edge-unbiased-grad/all/2021-03-18-17:30:17-CET --add-min-max --ymax 80 --legend 'upper left' --yaxis test-accuracy --labels 'fcc, clique grad.' 'fcc -1 edge/clique, no clique grad.' 'fcc -1 edge/clique, clique grad.' --save-figure ../../figures/d-cliques-cifar10-clique-clustering-fcc-minus-1-edge.png --font-size 13 --linestyle 'solid' 'dashed' 'dotted'
         \includegraphics[width=\textwidth]{../figures/d-cliques-cifar10-clique-clustering-fcc-minus-1-edge}
\caption{\label{fig:d-cliques-cifar10-clique-clustering-minus-1-edge} CIFAR10 (-1 edge/clique)}
\end{subfigure}

%\begin{subfigure}[htbp]{0.35\textwidth}
%     \centering  
%% To regenerate the figure, from directory results/mnist
%% python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py no-init/fully-connected-cliques/all/2021-03-12-11:12:01-CET rm-5-edges/all/2021-03-18-17:29:10-CET rm-5-edges-unbiased-grad/all/2021-03-18-17:29:36-CET --add-min-max --ymin 85 --ymax 92.5 --legend 'lower right' --yaxis test-accuracy --labels 'fcc, clique grad.' 'fcc -5 edges/clique, no clique grad.' 'fcc -5 edges/clique, clique grad.' --save-figure ../../figures/d-cliques-mnist-clique-clustering-fcc-minus-5-edges.png  --font-size 13 --linestyle 'solid' 'dashed' 'dotted'   
%         \includegraphics[width=\textwidth]{../figures/d-cliques-mnist-clique-clustering-fcc-minus-5-edges}     
%\caption{\label{fig:d-cliques-mnist-clique-clustering-minus-5-edges} MNIST (-5 edges/clique)}
%\end{subfigure}
%\hfill
%\begin{subfigure}[htbp]{0.35\textwidth}
%     \centering
%% To regenerate the figure, from directory results/cifar10
%% python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py no-init/fully-connected-cliques/all/2021-03-13-18:32:55-CET rm-5-edges/all/2021-03-18-17:30:38-CET rm-5-edges-unbiased-grad/all/2021-03-18-17:31:04-CET --add-min-max --ymax 80 --legend 'upper left' --yaxis test-accuracy --labels 'fcc, clique grad.' 'fcc -5 edges/clique, no clique grad.'  'fcc -5 edges/clique, clique grad.' --save-figure ../../figures/d-cliques-cifar10-clique-clustering-fcc-minus-5-edges.png --font-size 13 --linestyle 'solid' 'dashed' 'dotted'
%         \includegraphics[width=\textwidth]{../figures/d-cliques-cifar10-clique-clustering-fcc-minus-5-edges}
%\caption{\label{fig:d-cliques-cifar10-clique-clustering-minus-5-edges} CIFAR10 (-5 edges/clique)}
%\end{subfigure}

\caption{\label{fig:d-cliques-intra-connectivity} Importance of Intra-Clique Full-Connectivity}
\end{figure*}

\subsection{Scaling up D-Cliques with Sparser Inter-Clique Topologies}
\label{section:interclique-topologies}


So far, we have used a fully-connected inter-clique topology for D-Cliques,
which has the advantage of bounding the
\textit{path length}\footnote{The \textit{path length} is the number of edges on the path with the shortest number of edges between two nodes.} to $3$ between any pair of nodes. This choice requires $
\frac{n}{L}(\frac{n}{L} - 1)$ inter-clique edges, which scales quadratically
in the number of nodes $n$ for a given clique size $L$\footnote{We consider 
\textit{directed} edges in the analysis: the number of undirected edges is half and does not affect asymptotic behavior.}. This can become significant at larger scales when $n$ is
large compared to $L$.

In this last series of experiments, we evaluate the effect of choosing sparser
inter-clique topologies on the convergence speed for a larger network of 1000
nodes. We compare the scalability and convergence speed of several
D-Cliques variants, which all use $O(nL)$ edges
to create cliques as a starting point.

We first measure the convergence speed of inter-cliques topologies whose number of edges scales linearly with the number of nodes. Among those, the \textit{ring} has the (almost) fewest possible number of edges: it
uses $\frac{2n}{L}$ inter-clique edges but its average path length between
nodes 
also scales linearly.
We also consider another topology, which we call \textit{fractal}, that provides a
logarithmic
bound on the average path length. In this hierarchical scheme, 
cliques are assembled in larger groups of $L$ cliques that are connected
internally with one edge per
pair of cliques, but with only one edge between pairs of larger groups. The
topology is built recursively such that $L$ groups will themselves form a
larger group at the next level up. This results in at most $L$ edges per node 
if edges are evenly distributed: i.e., each group within the same level adds 
at most $c-1$ edges to other groups, leaving one node per group with $L-1$ 
edges that can receive an additional edge to connect with other groups at the next level.
Since nodes have at most $L$ edges, $n$ nodes have at most $nL$ edges,
therefore
the number of edges in this fractal scheme indeed scales linearly in the number of nodes.

Second, we look at another scheme 
in which the number of edges scales in a near, but not quite, linear fashion.
We propose to connect cliques according to a
small-world-like topology~\cite{watts2000small} applied on top of a
ring~\cite{stoica2003chord}. In this scheme, cliques are first arranged in a
ring. Then each clique adds symmetric edges, both clockwise and
counter-clockwise on the ring, with the $m$ closest cliques in sets of
cliques that are exponentially bigger the further they are on the ring (see
Algorithm~\ref{Algorithm:Smallworld} in the appendix for
details on the construction). This ensures a good connectivity with other
cliques that are close on the ring, while still keeping the average
path length small. This scheme uses $\frac{n}{c}*2(m)\log(\frac{n}{c})$ inter-clique edges and
therefore grows in the order of $O(n\log(n))$ with the number of nodes.

Figure~\ref{fig:d-cliques-cifar10-convolutional} shows the convergence
speed of all the above schemes on MNIST and CIFAR10, compared to the ideal
baseline
of a
single IID node performing the same number of updates per epoch (representing
the fastest convergence speed achievable if topology had no impact). Among the linear schemes, the ring
topology converges but is much slower than our fractal scheme. Among the super-linear schemes, the small-world
topology has a convergence speed that is almost the same as with a
fully-connected inter-clique topology but with 22\% less edges
(14.5 edges on average instead of 18.9). 

While the small-world inter-clique topology shows promising scaling behaviour, the
fully-connected topology still offers
significant benefits with 1000 nodes, as it represents a 98\% reduction in the
number of edges compared to fully connecting individual nodes (18.9 edges on
average instead of 999) and a 96\% reduction in the number of messages (37.8
messages per round per node on average instead of 999). We refer to
Appendix~\ref{app:scaling} for additional results comparing the convergence
speed across different number of nodes. Overall, these results
show that D-Cliques can nicely scale with the number of nodes.

\begin{figure*}[t]
     \centering
       % To regenerate the figure, from directory results/mnist
 % python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py 1-node-iid/all/2021-03-10-09:20:03-CET ../scaling/1000/mnist/fractal-cliques/all/2021-03-14-17:41:59-CET ../scaling/1000/mnist/clique-ring/all/2021-03-13-18:22:36-CET     --add-min-max --yaxis test-accuracy --legend 'lower right' --ymin 84 --ymax 92.5 --labels '1 node IID' 'd-cliques (fractal)' 'd-cliques (ring)'  --save-figure ../../figures/d-cliques-mnist-1000-nodes-comparison-linear.png --font-size 13 --linestyles 'solid' 'dashed' 'dotted'
     \begin{subfigure}[b]{0.35\textwidth}
         \centering
            \includegraphics[width=\textwidth]{../figures/d-cliques-mnist-1000-nodes-comparison-linear}
             \caption{\label{fig:d-cliques-mnist-1000-nodes-comparison-linear} MNIST with Linear Model: Linear Inter-clique Topologies.}
     \end{subfigure}
     \hfill
     % To regenerate the figure, from directory results/cifar10
% python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fractal-cliques/all/2021-03-14-17:42:46-CET ../scaling/1000/cifar10/clique-ring/all/2021-03-14-09:55:24-CET  --add-min-max --yaxis test-accuracy --labels '1-node IID' 'd-cliques (fractal)' 'd-cliques (ring)' --legend 'lower right' --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy-linear.png --font-size 13 --linestyles 'solid' 'dashed' 'dotted'
     \begin{subfigure}[b]{0.35\textwidth}
         \centering
         \includegraphics[width=\textwidth]{../figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy-linear}
\caption{\label{fig:d-cliques-cifar10-1000-vs-1-node-test-accuracy-linear}  CIFAR10 with LeNet Model: Linear Inter-clique Topologies.}
     \end{subfigure}
    
     
 % To regenerate the figure, from directory results/mnist
 % python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py 1-node-iid/all/2021-03-10-09:20:03-CET ../scaling/1000/mnist/fully-connected-cliques/all/2021-03-14-17:56:26-CET ../scaling/1000/mnist/smallworld-logn-cliques/all/2021-03-23-21:45:39-CET --add-min-max --yaxis test-accuracy --legend 'lower right' --ymin 84 --ymax 92.5 --labels '1 node IID'  'd-cliques (fully-connected cliques)' 'd-cliques (smallworld)'  --save-figure ../../figures/d-cliques-mnist-1000-nodes-comparison-super-linear.png --font-size 13 --linestyles 'solid' 'dashed' 'dotted'
     \begin{subfigure}[b]{0.35\textwidth}
         \centering
            \includegraphics[width=\textwidth]{../figures/d-cliques-mnist-1000-nodes-comparison-super-linear}
             \caption{\label{fig:d-cliques-mnist-1000-nodes-comparison-super-linear} MNIST with Linear Model: Superlinear Inter-clique Topologies.}
     \end{subfigure}
     \hfill
     % To regenerate the figure, from directory results/cifar10
% python ../../../../Software/non-iid-topology-simulator/tools/plot_convergence.py 1-node-iid/all/2021-03-10-13:52:58-CET ../scaling/1000/cifar10/fully-connected-cliques/all/2021-03-14-17:41:20-CET ../scaling/1000/cifar10/smallworld-logn-cliques/all/2021-03-23-22:13:57-CET  --add-min-max --yaxis test-accuracy --labels '1-node IID' 'd-cliques (fully-connected cliques)' 'd-cliques (smallworld)' --legend 'lower right' --save-figure ../../figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy-super-linear.png --font-size 13 --linestyles 'solid' 'dashed' 'dotted'
     \begin{subfigure}[b]{0.35\textwidth}
         \centering
         \includegraphics[width=\textwidth]{../figures/d-cliques-cifar10-1000-vs-1-node-test-accuracy-super-linear}
\caption{\label{fig:d-cliques-cifar10-1000-vs-1-node-test-accuracy-super-linear}  CIFAR10 with LeNet Model: Superlinear Inter-clique Topologies.}
     \end{subfigure}
     
\caption{\label{fig:d-cliques-cifar10-convolutional} D-Cliques Convergence Speed with 1000 nodes, non-IID, Constant Updates per Epoch, with Different Inter-Clique Topologies.}
\end{figure*}

\subsection{Cost of Constructing Cliques}
\label{section:cost-cliques}

\dots \todo{EL: Add plots showing convergence speed in terms of skew vs iteration number, as well as absolute computation time}