Commit ff482913 authored by Erick Lavoie's avatar Erick Lavoie

Completed first draft of evaluation section

parent 437d8e52
......@@ -345,7 +345,8 @@ of the local models across nodes.
inter-clique connections (see main text for details).}
\end{figure}
\paragraph{\label{section:clique-averaging} Clique Averaging.}
We address this problem by adding \emph{Clique
Averaging} to D-SGD
(Algorithm~\ref{Algorithm:Clique-Unbiased-D-PSGD}), which essentially
decouples gradient averaging from model averaging. The idea is to use only the
......
......@@ -3,13 +3,13 @@
\section{Evaluation}
\label{section:evaluation}
In this section, we first compare D-Cliques to alternative topologies to
show their benefits and the relevance of our main design choices. Then,
we evaluate different inter-clique topologies that further reduce the number of
inter-clique connections so as to gracefully scale with the number of
nodes. We finally show that D-Cliques are resilient to some intra-clique
connectivity failures and that Greedy Swap (Alg.~\ref{Algorithm:greedy-swap})
efficiently constructs cliques with low skew.
\subsection{Experimental Setup}
\label{section:experimental-settings}
......@@ -21,15 +21,16 @@ can remove much of the effect of label distribution skew.
We experiment with two datasets: MNIST~\cite{mnistWebsite} and
CIFAR10~\cite{krizhevsky2009learning}, which both have $L=10$ classes.
For MNIST, we use 50k and 10k examples from the original 60k training
set for training and validation respectively. We use all 10k examples of
the test set to measure prediction accuracy. The validation set preserves the
original unbalanced class ratios of the test set.
For CIFAR10, classes are evenly balanced: we initially used 45k/50k images
of the original training set for training, 5k/50k for validation, and all 10k examples
of the test set for measuring prediction accuracy. After tuning hyper-parameters
on these initial experiments, we used all 50k images of the original training set
for training in all experiments, as 45k images do not split evenly across 1000 nodes
in the partitioning scheme explained in the next paragraph.
We use the non-IID partitioning scheme proposed for MNIST by~\cite{mcmahan2016communication}
in their seminal Federated Learning paper, on both MNIST and CIFAR10:
......@@ -45,7 +46,7 @@ We
use a logistic regression classifier for MNIST, which
provides up to 92.5\% accuracy in the centralized setting.
For CIFAR10, we use a Group-Normalized variant of LeNet~\cite{quagmire}, a
deep convolutional network which achieves an accuracy of $74.15\%$ in the
centralized setting.
These models are thus reasonably accurate (which is sufficient to
study the effect of the topology) while being sufficiently fast to train in a
......@@ -56,11 +57,13 @@ validation set for 100 nodes, obtaining respectively $0.1$ and $128$ for
MNIST and $0.002$ and $20$ for CIFAR10.
For CIFAR10, we additionally use a momentum of $0.9$.
We evaluate 100- and 1000-node networks by creating multiple models
in memory and simulating the exchange of messages between nodes.
To ignore the impact of distributed execution strategies and system
optimization techniques, we report the test accuracy of all nodes (min, max,
average) as a function of the number of times each example of the dataset has
been sampled by a node, i.e. an \textit{epoch}. This is equivalent to the classic
case of a single node sampling the full distribution.
To make results further comparable across different numbers of nodes, we
scale the batch size inversely with the number of nodes,
e.g. on MNIST, 128 with 100 nodes vs. 13 with 1000 nodes. This
......@@ -72,18 +75,36 @@ resulting communication overhead is impractical.}
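The inverse batch-size scaling above keeps the total number of examples sampled per round roughly constant across network sizes. A minimal sketch (the helper name is illustrative, not from the paper's simulator):

```python
def per_node_batch_size(reference_batch: int, reference_nodes: int, nodes: int) -> int:
    """Scale the per-node mini-batch size inversely with the number of nodes,
    keeping the total number of examples sampled per round roughly constant."""
    return max(1, round(reference_batch * reference_nodes / nodes))

# e.g. on MNIST: 128 with 100 nodes, 13 with 1000 nodes
```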
Finally, we compare our results against an ideal baseline: either a
fully-connected network topology with the same number of nodes or a single IID
node. In both cases, the topology has no effect on
the optimization. Both approaches effectively optimize a single model and sample
uniformly from the global distribution, yet we have observed a fully-connected network
to converge slightly faster and reach slightly
better final accuracy than a single node sampling from the global distribution in the presence
of data heterogeneity\footnote{We
conjecture that a heterogeneous data partition in a fully-connected network may force
a more balanced representation of all classes in the union of all mini-batches, leading to better convergence.}.
We therefore compare against a fully-connected network, unless the simulation time
prevented us from obtaining results in time for submission, in which case we use a single IID node.
\subsection{D-Cliques Match the Convergence Speed of Fully-Connected with a Fraction of the Edges}
\label{section:d-cliques-vs-fully-connected}
In this first experiment, we show that D-Cliques with Clique Averaging (and Momentum on CIFAR10) converge
almost as fast as a fully-connected network on both MNIST and CIFAR10. Figure~\ref{fig:convergence-speed-dc-vs-fc-2-shards-per-node}
illustrates the convergence speed of D-Cliques with $n=100$ nodes on MNIST (with Clique Averaging)
and CIFAR10 (with Clique Averaging and Momentum). Observe that the convergence speed is
very close to that of a fully-connected topology, and significantly better than with
a ring or a grid (see Figure~\ref{fig:iid-vs-non-iid-problem}).
D-Cliques also show less variance than both the ring and the grid. With
100 nodes, they offer a reduction of $\approx90\%$ in the number of edges
compared to a fully-connected topology.
% From directory 'results-v2':
% MNIST
% python $TOOLS/analyze/filter.py all --dataset:name mnist --topology:name fully-connected d-cliques/greedy-swap --nodes:name 2-shards-uneq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% python $TOOLS/analyze/diff.py --rundirs all/2021-09-28-23:16:47-CEST-labostrex117 all/2021-09-28-23:18:49-CEST-labostrex119 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 80 --ymax 92.5 --yaxis test-accuracy --labels 'fully-connected' 'd-cliques (fc) w/ cliq-avg' --save-figure ../mlsys2022style/figures/convergence-speed-mnist-dc-fc-vs-fc-2-shards-per-node.png --linestyles 'solid' 'dashed' --font-size 18 --linewidth 3
% CIFAR10
% python $TOOLS/analyze/filter.py all --dataset:name cifar10 --topology:name fully-connected d-cliques/greedy-swap --nodes:name 2-shards-eq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-02-18:58:22-CEST-labostrex114 all/2021-10-03-19:53:21-CEST-labostrex117 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 100 --yaxis test-accuracy --labels 'fully-connected w/ mom.' 'd-cliques (fc) w/ c-avg, mom.' --save-figure ../mlsys2022style/figures/convergence-speed-cifar10-dc-fc-vs-fc-2-shards-per-node.png --linestyles 'solid' 'dashed' --legend 'lower right' --font-size 18
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-02-18:58:22-CEST-labostrex114 all/2021-10-03-19:53:21-CEST-labostrex117 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 100 --yaxis test-accuracy --labels 'fully-connected w/ mom.' 'd-cliques (fc) w/ c-avg, mom.' --save-figure ../mlsys2022style/figures/convergence-speed-cifar10-dc-fc-vs-fc-2-shards-per-node.png --linestyles 'solid' 'dashed' --legend 'lower right' --font-size 18 --linewidth 3
\begin{figure}[htbp]
\centering
......@@ -98,45 +119,42 @@ mini-batch size, both approaches are equivalent.
\includegraphics[width=\textwidth]{figures/convergence-speed-cifar10-dc-fc-vs-fc-2-shards-per-node}
\caption{\label{fig:convergence-speed-cifar10-dc-fc-vs-fc-2-shards-per-node} CIFAR10}
\end{subfigure}
\caption{\label{fig:convergence-speed-dc-vs-fc-2-shards-per-node} Convergence Speed of D-Cliques constructed with Greedy Swap Compared to Fully-Connected on 100 Nodes (2 shards/node). The bold line is the average accuracy over all nodes; the thinner upper and lower lines are the maximum and minimum accuracy over all nodes.}
\end{figure}
\subsection{Clique Averaging and Momentum are Necessary}
Figure~\ref{fig:d-clique-mnist-clique-avg} shows that Clique Averaging (Alg.~\ref{Algorithm:Clique-Unbiased-D-PSGD})
reduces the variance of models across nodes and accelerates
convergence on MNIST. Note that Clique Averaging induces a small
additional cost, as gradients
and models need to be sent in two separate rounds of messages.
Nonetheless, compared to fully connecting all nodes, the total number
of messages is reduced by $\approx 80\%$.
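The decoupling performed by Clique Averaging can be sketched in a few lines. This is an illustrative simplification, not the paper's exact algorithm: it assumes scalar or vector-like models and uniform mixing weights, whereas the actual algorithm may weight neighbors differently.

```python
def clique_unbiased_dpsgd_round(models, grads, cliques, neighbors, lr=0.1):
    """One simulated round of D-SGD with Clique Averaging (sketch).

    models:    dict node -> parameters (float or array-like)
    grads:     dict node -> local mini-batch gradient
    cliques:   dict node -> list of clique members (incl. the node itself)
    neighbors: dict node -> list of all neighbors (incl. the node itself)
    """
    # Phase 1: average gradients over the clique only. Since the union of
    # clique datasets is approximately balanced, this debiases the update
    # with respect to local label skew.
    g = {i: sum(grads[j] for j in cliques[i]) / len(cliques[i]) for i in models}

    # Local SGD step with the clique-averaged gradient.
    half_step = {i: models[i] - lr * g[i] for i in models}

    # Phase 2: model averaging over *all* neighbors, including inter-clique
    # edges (uniform weights here, an assumption for brevity).
    return {i: sum(half_step[j] for j in neighbors[i]) / len(neighbors[i])
            for i in models}
```

The two phases correspond to the two separate rounds of messages mentioned above: gradients are exchanged within the clique first, then models are exchanged with all neighbors.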
% From directory 'results-v2':
% MNIST
% python $TOOLS/analyze/filter.py all --dataset:name mnist --topology:name d-cliques/greedy-swap --nodes:name 2-shards-uneq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% python $TOOLS/analyze/diff.py --rundirs all/2021-09-29-03:53:42-CEST-labostrex119 all/2021-09-28-23:18:49-CEST-labostrex119 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 89 --ymax 92.5 --yaxis test-accuracy --labels 'd-cliques w/o c-avg.' 'd-cliques w/ c-avg.' --save-figure ../mlsys2022style/figures/convergence-speed-mnist-dc-no-c-avg-vs-c-avg-2-shards-per-node.png --linestyles 'solid' 'dashed' --font-size 18
% python $TOOLS/analyze/diff.py --rundirs all/2021-09-29-03:53:42-CEST-labostrex119 all/2021-09-28-23:18:49-CEST-labostrex119 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 89 --ymax 92.5 --yaxis test-accuracy --labels 'd-cliques (fc) w/o c-avg.' 'd-cliques (fc) w/ c-avg.' --save-figure ../mlsys2022style/figures/convergence-speed-mnist-dc-no-c-avg-vs-c-avg-2-shards-per-node.png --linestyles 'solid' 'dashed' --font-size 18 --linewidth 2.5
\begin{figure}[htbp]
\centering
\includegraphics[width=0.23\textwidth]{figures/convergence-speed-mnist-dc-no-c-avg-vs-c-avg-2-shards-per-node}
\caption{\label{fig:d-clique-mnist-clique-avg} Effect of Clique Averaging on MNIST. Y axis starts at 89.}
\end{figure}
Figure~\ref{fig:cifar10-c-avg-momentum} shows the interaction between
Clique Averaging and Momentum on CIFAR10. Without Clique Averaging,
the use of momentum is actually detrimental. With Clique Averaging, the
situation reverses and momentum is again beneficial. The combination
of both has the fastest convergence speed and the lowest variance among all
four possibilities.
% CIFAR10
% python $TOOLS/analyze/filter.py all --dataset:name cifar10 --topology:name d-cliques/greedy-swap --nodes:name 2-shards-eq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% w/o Clique Averaging
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-03-23:37:42-CEST-labostrex117 all/2021-10-04-03:13:46-CEST-labostrex117 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 100 --yaxis test-accuracy --labels 'd-cliques (fc) w/o momentum' 'd-cliques (fc) w/ momentum' --save-figure ../mlsys2022style/figures/convergence-speed-cifar10-wo-c-avg-no-mom-vs-mom-2-shards-per-node.png --linestyles 'solid' 'dashed' --font-size 18 --linewidth 3
% w/ Clique Averaging
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-03-16:10:34-CEST-labostrex117 all/2021-10-03-19:53:21-CEST-labostrex117 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 100 --yaxis test-accuracy --labels 'd-cliques (fc) w/o momentum' 'd-cliques (fc) w/ momentum' --save-figure ../mlsys2022style/figures/convergence-speed-cifar10-w-c-avg-no-mom-vs-mom-2-shards-per-node.png --linestyles 'solid' 'dashed' --font-size 18 --linewidth 3 --legend 'lower right'
\begin{figure}[htbp]
\centering
......@@ -154,23 +172,20 @@ and models need to be sent in two separate rounds of messages. Nonetheless, comp
\caption{\label{fig:cifar10-c-avg-momentum} Effect of Clique Averaging and Momentum on CIFAR10 with LeNet.}
\end{figure}
\subsection{D-Cliques Converge Faster than Random Graphs}
\autoref{fig:d-cliques-comparison-to-non-clustered-topologies} shows that D-Cliques, even
without Clique Averaging or Momentum, converge faster and with lower variance than a
random graph with a similar number of edges (10) per node; a careful
design of the topology is therefore indeed necessary.
% From directory 'results-v2':
% MNIST
% python $TOOLS/analyze/filter.py all --dataset:name mnist --topology:name random-graph d-cliques/greedy-swap greedy-neighbourhood-swap --nodes:name 2-shards-uneq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% python $TOOLS/analyze/diff.py --rundirs all/2021-09-29-03:53:42-CEST-labostrex119 all/2021-09-29-22:17:08-CEST-labostrex118 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 80 --ymax 92.5 --yaxis test-accuracy --labels 'd-cliques (fc) w/o cliq-avg' 'random 10' --save-figure ../mlsys2022style/figures/convergence-mnist-random-vs-d-cliques-2-shards.png --linestyles 'solid' 'dashed' --font-size 18 --linewidth 3
% CIFAR10
% python $TOOLS/analyze/filter.py all --dataset:name cifar10 --topology:name d-cliques/greedy-swap random-graph --nodes:nb-nodes 100 --algorithm:learning-momentum 0.9 | python $TOOLS/analyze/diff.py
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-03-23:37:42-CEST-labostrex117 all/2021-10-05-18:38:30-CEST-labostrex115 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 100 --yaxis test-accuracy --labels 'd-cliques (fc) w/o cliq-avg w/o mom.' 'random 10 w/o mom.' --save-figure ../mlsys2022style/figures/convergence-cifar10-random-vs-d-cliques-2-shards.png --linestyles 'solid' 'dashed' --font-size 18 --linewidth 3
\begin{figure}[htbp]
\centering
......@@ -188,23 +203,47 @@ with the centralized setting.
\caption{\label{fig:convergence-random-vs-d-cliques-2-shards} Comparison to Random Graph with 10 edges per node \textit{without} Clique Averaging or Momentum (see main text for justification).}
\end{figure}
In comparison to a random graph, however, D-Cliques provide additional benefits: they ensure
a diverse representation of all classes in the immediate neighborhood of all nodes; they enable
Clique Averaging to debias gradients; and they provide a high level of clustering, i.e., neighbors
of a node tend to be neighbors of each other, which tends to lower variance.
To distinguish the effect of the first two from the last, we compare D-Cliques to other variations
of random graphs: (1) with the additional constraint that all classes should be represented in the immediate neighborhood of all nodes
(i.e. 'all classes repr.'), and (2) in combination with unbiased gradients computed using
the average of the gradients of all neighbors for all nodes. Satisfying the first constraint while obtaining
the same skew as D-Cliques built with Greedy Swap was challenging with the current partitioning scheme, so
we performed the experiments in a more heterogeneous setting in which each node has examples of only 1 class.
In this setting, it is easy to construct cliques and a random graph such that the neighborhood of each node in both
cases has a skew of 0.
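The 'all classes repr.' constraint on a random graph can be verified with a simple check. This is an illustrative helper, not part of the experimental tooling:

```python
def all_classes_represented(neighbors, node_classes, num_classes=10):
    """Check that every class appears in the immediate neighborhood of every
    node (the 'all classes repr.' constraint; names are illustrative).

    neighbors:    dict node -> iterable of neighbor ids (node itself included)
    node_classes: dict node -> set of classes present in the node's local data
    """
    for node, nbrs in neighbors.items():
        covered = set()
        for j in nbrs:
            covered |= node_classes[j]
        if len(covered) < num_classes:
            return False  # some class is missing around this node
    return True
```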
\begin{figure}[htbp]
\centering
\begin{subfigure}[b]{0.23\textwidth}
\centering
% From directory results/mnist:
% python ../../../../Software/non-iid-topology-simulator/tools/v1/plot_convergence.py fully-connected-cliques/all/2021-03-10-10:19:44-CET no-init-no-clique-avg/fully-connected-cliques/all/2021-03-12-11:12:49-CET random-10/all/2021-07-23-11:59:56-CEST random-10-diverse/all/2021-03-17-20:28:35-CET --labels 'd-clique (fcc)' 'd-clique (fcc) no clique avg.' '10 random edges' '10 random edges (all classes represented)' --add-min-max --legend 'lower right' --ymin 80 --ymax 92.5 --yaxis test-accuracy --save-figure ../../mlsys2022style/figures/d-cliques-mnist-linear-comparison-to-non-clustered-topologies.png --font-size 13 --linestyles 'solid' 'dashed' 'dotted' 'dashdot' --linewidth 3
\includegraphics[width=\textwidth]{figures/d-cliques-mnist-linear-comparison-to-non-clustered-topologies}
\caption{MNIST}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.23\textwidth}
\centering
% To regenerate the figure, from directory results/cifar10
% python ../../../../Software/non-iid-topology-simulator/tools/v1/plot_convergence.py no-init/fully-connected-cliques/all/2021-03-13-18:32:55-CET no-init-no-clique-avg/fully-connected-cliques/all/2021-03-13-18:34:35-CET random-10/all/2021-07-23-14:33:48-CEST random-10-diverse/all/2021-03-17-20:30:41-CET random-10-diverse-unbiased-gradient/all/2021-03-17-20:31:14-CET --labels 'd-clique (fcc) clique avg.' 'd-clique (fcc) no clique avg.' '10 random edges' '10 random edges (all classes repr.)' '10 random (all classes repr.) with unbiased grad.' --add-min-max --legend 'upper left' --yaxis test-accuracy --save-figure ../../mlsys2022style/figures/d-cliques-cifar10-linear-comparison-to-non-clustered-topologies.png --ymax 119 --font-size 13 --linestyles 'solid' 'dashed' 'dotted' 'dashdot' 'solid' --markers '' '' '' '' 'o' --linewidth 3
\includegraphics[width=\textwidth]{figures/d-cliques-cifar10-linear-comparison-to-non-clustered-topologies}
\caption{CIFAR10}
\end{subfigure}
\caption{\label{fig:d-cliques-comparison-to-non-clustered-topologies} Comparison to Random Graph with 10 edges per node, \textit{with} Clique Averaging (and Momentum) as well as analogous Neighbourhood Averaging for Random Graph \textit{in a more stringent partitioning of 1 class/node}\textsuperscript{*}.}
\footnotesize\textsuperscript{*}\textit{These results were obtained with a previous version of the simulator but should be consistent with the latest. They will be updated for the final version of the paper.}
\end{figure}
\autoref{fig:d-cliques-comparison-to-non-clustered-topologies} shows the results for MNIST and CIFAR10. In the case of MNIST,
D-Cliques converge faster than all other options. In the case of CIFAR10, the clustering appears to be critical
for good convergence speed: even a random graph with diverse neighborhoods and unbiased gradients
converges significantly slower.
%We demonstrate the advantages of D-Cliques over alternative sparse topologies
%that have a similar number of edges. First, we consider topologies in which
%the neighbors of each node are selected at random (hence without any clique
......@@ -275,92 +314,15 @@ with the centralized setting.
%data with sparse topologies requires a very careful design, as we have
%proposed with D-Cliques.
\subsection{Cliques built with Greedy Swap Converge Faster than Random Cliques}
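For intuition, the following is a rough sketch of a greedy swap heuristic, not a reproduction of Alg.~\ref{Algorithm:greedy-swap}: the skew measure used here (L1 distance between a clique's label distribution and the global distribution) and the random pair selection are both assumptions.

```python
import random
from collections import Counter

def skew(clique, labels_of, global_dist):
    """L1 distance between the clique's label distribution and the global
    distribution (one plausible skew measure; an assumption, see lead-in)."""
    counts = Counter()
    for node in clique:
        counts.update(labels_of[node])
    total = sum(counts.values())
    return sum(abs(counts[c] / total - p) for c, p in global_dist.items())

def greedy_swap(cliques, labels_of, global_dist, steps=1000, seed=1):
    """Repeatedly pick one node in each of two random cliques and keep the
    swap only if it lowers the summed skew of the two cliques (sketch)."""
    rng = random.Random(seed)
    for _ in range(steps):
        a, b = rng.sample(range(len(cliques)), 2)
        i, j = rng.randrange(len(cliques[a])), rng.randrange(len(cliques[b]))
        before = (skew(cliques[a], labels_of, global_dist)
                  + skew(cliques[b], labels_of, global_dist))
        cliques[a][i], cliques[b][j] = cliques[b][j], cliques[a][i]
        after = (skew(cliques[a], labels_of, global_dist)
                 + skew(cliques[b], labels_of, global_dist))
        if after > before:  # revert swaps that do not help
            cliques[a][i], cliques[b][j] = cliques[b][j], cliques[a][i]
    return cliques
```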
% From directory 'results-v2':
% MNIST
% python $TOOLS/analyze/filter.py all --dataset:name mnist --topology:name d-cliques/random-cliques d-cliques/greedy-swap --nodes:name 2-shards-uneq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% python $TOOLS/analyze/diff.py --rundirs all/2021-09-29-22:12:59-CEST-labostrex114 all/2021-09-28-23:18:49-CEST-labostrex119 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 80 --ymax 92.5 --yaxis test-accuracy --labels 'd-cliques random' 'd-cliques greedy-swap' --save-figure ../mlsys2022style/figures/convergence-speed-mnist-dc-random-vs-dc-gs-2-shards-per-node.png --linestyles 'solid' 'dashed' --font-size 18
% CIFAR10
% python $TOOLS/analyze/filter.py all --dataset:name cifar10 --topology:name d-cliques/random-cliques d-cliques/greedy-swap --nodes:name 2-shards-eq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-04-21:18:33-CEST-labostrex117 all/2021-10-03-19:53:21-CEST-labostrex117 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 100 --yaxis test-accuracy --labels 'd-cliques random' 'd-cliques greedy-swap' --save-figure ../mlsys2022style/figures/convergence-speed-cifar10-dc-random-vs-dc-gs-2-shards-per-node.png --linestyles 'solid' 'dashed' --font-size 18
\begin{figure}[htbp]
\centering
\begin{subfigure}[b]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/convergence-speed-mnist-dc-random-vs-dc-gs-2-shards-per-node}
\caption{\label{fig:convergence-speed-mnist-dc-random-vs-dc-gs-2-shards-per-node} MNIST}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/convergence-speed-cifar10-dc-random-vs-dc-gs-2-shards-per-node}
\caption{\label{fig:convergence-speed-cifar10-dc-random-vs-dc-gs-2-shards-per-node} CIFAR10}
\end{subfigure}
\caption{\label{fig:convergence-speed-dc-random-vs-dc-gs-2-shards-per-node} Convergence Speed of D-Cliques constructed Randomly vs Greedy Swap on 100 Nodes (2 shards/node).}
\end{figure}
% From directory 'results-v2':
% MNIST
% python $TOOLS/analyze/filter.py all --dataset:name mnist --topology:name d-cliques/greedy-swap --nodes:name 2-shards-uneq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% w/o Clique Gradient
% python $TOOLS/analyze/diff.py --rundirs all/2021-09-29-03:53:42-CEST-labostrex119 all/2021-10-01-21:44:14-CEST-labostrex113 all/2021-10-02-06:53:40-CEST-labostrex113 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 89 --ymax 92.5 --yaxis test-accuracy --labels 'full intra-connectivity' '-1 edge/clique' '-5 edges/clique' --save-figure ../mlsys2022style/figures/d-cliques-mnist-wo-clique-avg-impact-of-edge-removal.png --linestyles 'solid' 'dashed' 'dotted' --font-size 18
% w/ Clique Gradient
% python $TOOLS/analyze/diff.py --rundirs all/2021-09-28-23:18:49-CEST-labostrex119 all/2021-10-01-17:08:42-CEST-labostrex113 all/2021-10-02-02:17:43-CEST-labostrex113 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 89 --ymax 92.5 --yaxis test-accuracy --labels 'full intra-connectivity' '-1 edge/clique' '-5 edges/clique' --save-figure ../mlsys2022style/figures/d-cliques-mnist-w-clique-avg-impact-of-edge-removal.png --linestyles 'solid' 'dashed' 'dotted' --font-size 18
\begin{figure}[htbp]
\centering
\begin{subfigure}[htbp]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/d-cliques-mnist-wo-clique-avg-impact-of-edge-removal}
\caption{\label{fig:d-cliques-mnist-wo-clique-avg-impact-of-edge-removal} Without Clique Averaging }
\end{subfigure}
\hfill
\begin{subfigure}[htbp]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/d-cliques-mnist-w-clique-avg-impact-of-edge-removal}
\caption{\label{fig:d-cliques-mnist-w-clique-avg-impact-of-edge-removal} With Clique Averaging}
\end{subfigure}
\caption{\label{fig:d-cliques-mnist-intra-connectivity} MNIST: Impact of Intra-clique Connectivity Failures. Y axis starts at 89.}
\end{figure}
% CIFAR10
% python $TOOLS/analyze/filter.py all --dataset:name cifar10 --topology:name d-cliques/greedy-swap --nodes:name 2-shards-eq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% w/o Clique Gradient
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-04-03:13:46-CEST-labostrex117 all/2021-10-06-17:58:49-CEST-labostrex112 all/2021-10-06-17:45:22-CEST-labostrex115 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 80 --yaxis test-accuracy --labels 'full intra-connectivity' '-1 edge/clique' '-5 edges/clique' --save-figure ../mlsys2022style/figures/d-cliques-cifar10-wo-clique-avg-impact-of-edge-removal.png --linestyles 'solid' 'dashed' 'dotted' --font-size 18
% w/ Clique Gradient
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-03-19:53:21-CEST-labostrex117 all/2021-10-06-12:46:49-CEST-labostrex112 all/2021-10-06-12:49:51-CEST-labostrex115 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 80 --yaxis test-accuracy --labels 'full intra-connectivity' '-1 edge/clique' '-5 edges/clique' --save-figure ../mlsys2022style/figures/d-cliques-cifar10-w-clique-avg-impact-of-edge-removal.png --linestyles 'solid' 'dashed' 'dotted' --font-size 18
\begin{figure}[htbp]
\centering
\begin{subfigure}[htbp]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/d-cliques-cifar10-wo-clique-avg-impact-of-edge-removal}
\caption{\label{fig:d-cliques-cifar10-wo-clique-avg-impact-of-edge-removal} Without Clique Averaging }
\end{subfigure}
\hfill
\begin{subfigure}[htbp]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/d-cliques-cifar10-w-clique-avg-impact-of-edge-removal}
\caption{\label{fig:d-cliques-cifar10-w-clique-avg-impact-of-edge-removal} With Clique Averaging}
\end{subfigure}
\caption{\label{fig:d-cliques-cifar10-intra-connectivity} CIFAR: Impact of Intra-clique Connectivity Failures (with Momentum).}
\end{figure}
\subsection{D-Cliques Scale with Sparser Inter-Clique Topologies}
In this last series of experiments, we evaluate the effect of choosing sparser
inter-clique topologies on the convergence speed for a larger network of 1000
nodes. We compare the scalability and convergence speed of the several
D-Cliques variants introduced in Section~\ref{section:interclique-topologies}.
\autoref{fig:d-cliques-scaling-mnist-1000} and \autoref{fig:d-cliques-scaling-cifar10-1000}
show the convergence speed of all sparse inter-clique topologies of Section~\ref{section:interclique-topologies},
on MNIST and CIFAR10 respectively, compared to the ideal baseline of a
single IID node performing the same number of updates per epoch (representing
the fastest convergence speed achievable if the topology had no impact). Among the linear schemes, the ring
topology converges but is much slower than our fractal scheme. Among the super-linear schemes, the small-world
......@@ -373,10 +335,9 @@ fully-connected topology still offers
significant benefits with 1000 nodes, as it represents a 98\% reduction in the
number of edges compared to fully connecting individual nodes (18.9 edges on
average instead of 999) and a 96\% reduction in the number of messages (37.8
messages per round per node on average instead of 999).
%We refer to Appendix~\ref{app:scaling} for additional results comparing the convergence speed across different number of nodes.
Overall, these results show that D-Cliques can nicely scale with the number of nodes.
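The edge and message reductions quoted above follow from simple counting. The
sketch below (assuming, as in our setup, 1000 nodes grouped into fully-connected
cliques of size 10 with one inter-clique edge per clique pair; function names
are illustrative) reproduces the 18.9 edges and 37.8 messages per node:

```python
from math import comb

def dcliques_stats(n_nodes=1000, clique_size=10):
    """Edge/message counts for D-Cliques with fully-connected cliques
    and a fully-connected inter-clique topology (one edge per clique pair)."""
    n_cliques = n_nodes // clique_size
    intra = n_cliques * comb(clique_size, 2)    # edges inside cliques
    inter = comb(n_cliques, 2)                  # edges between clique pairs
    avg_degree = 2 * (intra + inter) / n_nodes  # edges incident to each node
    messages = 2 * avg_degree                   # one send + one receive per edge per round
    return avg_degree, messages

deg, msgs = dcliques_stats()
print(deg, msgs)                      # 18.9 edges, 37.8 messages per node
print(round(100 * (1 - deg / 999)),   # 98 (% fewer edges than fully connecting nodes)
      round(100 * (1 - msgs / 999)))  # 96 (% fewer messages)
```

The same counting shows why the scheme scales: intra-clique edges grow linearly
with the number of nodes, and inter-clique edges only with the square of the
(10x smaller) number of cliques.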
% From directory 'results-v2':
% MNIST
%\caption{\label{fig:d-cliques-cifar10-convolutional} D-Cliques Convergence Speed with 1000 nodes, non-IID, Constant Updates per Epoch, with Different Inter-Clique Topologies.}
%\end{figure}
\subsection{D-Cliques Can Tolerate Some Intra-Connectivity Failures}
We measured the impact of randomly removing 1 and 5 intra-clique edges per
clique to assess how critical full connectivity is within cliques.
\autoref{fig:d-cliques-mnist-intra-connectivity} shows that for MNIST, when
Clique Averaging is not used, removing edges slightly decreases the convergence
speed and increases the variance between nodes. With Clique Averaging, however,
even removing 5 edges per clique has very little effect on
the convergence speed.
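The failure model used in these experiments is straightforward to reproduce. As
a minimal sketch (the function name and seed are illustrative, not taken from
our tooling), the following removes $k$ random edges inside each clique, as done
here with $k \in \{1, 5\}$:

```python
import random
from itertools import combinations

def remove_intra_edges(cliques, k, seed=1):
    """Drop k random edges inside each clique, simulating intra-clique
    connectivity failures; returns the surviving intra-clique edges."""
    rng = random.Random(seed)
    kept = set()
    for clique in cliques:
        edges = list(combinations(sorted(clique), 2))
        rng.shuffle(edges)
        kept.update(edges[k:])  # keep all but k edges in this clique
    return kept

# 100 nodes in 10 fully-connected cliques of size 10.
cliques = [list(range(i, i + 10)) for i in range(0, 100, 10)]
print(len(remove_intra_edges(cliques, k=0)))  # 450 = 10 * C(10, 2)
print(len(remove_intra_edges(cliques, k=5)))  # 400: five fewer edges per clique
```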
% From directory 'results-v2':
% MNIST
% python $TOOLS/analyze/filter.py all --dataset:name mnist --topology:name d-cliques/greedy-swap --nodes:name 2-shards-uneq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% w/o Clique Gradient
% python $TOOLS/analyze/diff.py --rundirs all/2021-09-29-03:53:42-CEST-labostrex119 all/2021-10-01-21:44:14-CEST-labostrex113 all/2021-10-02-06:53:40-CEST-labostrex113 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 89 --ymax 92.5 --yaxis test-accuracy --labels 'full intra-connectivity' '-1 edge/clique' '-5 edges/clique' --save-figure ../mlsys2022style/figures/d-cliques-mnist-wo-clique-avg-impact-of-edge-removal.png --linestyles 'solid' 'dashed' 'dotted' --font-size 18 --linewidth 3
% w/ Clique Gradient
% python $TOOLS/analyze/diff.py --rundirs all/2021-09-28-23:18:49-CEST-labostrex119 all/2021-10-01-17:08:42-CEST-labostrex113 all/2021-10-02-02:17:43-CEST-labostrex113 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 89 --ymax 92.5 --yaxis test-accuracy --labels 'full intra-connectivity' '-1 edge/clique' '-5 edges/clique' --save-figure ../mlsys2022style/figures/d-cliques-mnist-w-clique-avg-impact-of-edge-removal.png --linestyles 'solid' 'dashed' 'dotted' --font-size 18 --linewidth 3
\begin{figure}[htbp]
\centering
\begin{subfigure}[htbp]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/d-cliques-mnist-wo-clique-avg-impact-of-edge-removal}
\caption{\label{fig:d-cliques-mnist-wo-clique-avg-impact-of-edge-removal} Without Clique Averaging }
\end{subfigure}
\hfill
\begin{subfigure}[htbp]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/d-cliques-mnist-w-clique-avg-impact-of-edge-removal}
\caption{\label{fig:d-cliques-mnist-w-clique-avg-impact-of-edge-removal} With Clique Averaging}
\end{subfigure}
\caption{\label{fig:d-cliques-mnist-intra-connectivity} MNIST: Impact of Intra-clique Connectivity Failures. Y axis starts at 89.}
\end{figure}
\autoref{fig:d-cliques-cifar10-intra-connectivity} shows that for CIFAR10 the
impact is stronger. We show the results with momentum, both with and without
Clique Averaging, as momentum is critical for obtaining the best convergence
speed. Without Clique Averaging, removing edges has a small effect on
convergence speed and variance, but convergence is too slow to be practical.
With Clique Averaging, removing a single edge per clique has a small effect,
but the impact becomes significant when removing 5 edges per clique. D-Cliques
can therefore tolerate some connectivity failures between clique members,
but the number of tolerable failures depends on the dataset and model being trained.
% CIFAR10
% python $TOOLS/analyze/filter.py all --dataset:name cifar10 --topology:name d-cliques/greedy-swap --nodes:name 2-shards-eq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% w/o Clique Gradient
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-04-03:13:46-CEST-labostrex117 all/2021-10-06-17:58:49-CEST-labostrex112 all/2021-10-06-17:45:22-CEST-labostrex115 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 80 --yaxis test-accuracy --labels 'full intra-connectivity' '-1 edge/clique' '-5 edges/clique' --save-figure ../mlsys2022style/figures/d-cliques-cifar10-wo-clique-avg-impact-of-edge-removal.png --linestyles 'solid' 'dashed' 'dotted' --font-size 18 --linewidth 3
% w/ Clique Gradient
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-03-19:53:21-CEST-labostrex117 all/2021-10-06-12:46:49-CEST-labostrex112 all/2021-10-06-12:49:51-CEST-labostrex115 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 80 --yaxis test-accuracy --labels 'full intra-connectivity' '-1 edge/clique' '-5 edges/clique' --save-figure ../mlsys2022style/figures/d-cliques-cifar10-w-clique-avg-impact-of-edge-removal.png --linestyles 'solid' 'dashed' 'dotted' --font-size 18 --linewidth 3
\begin{figure}[htbp]
\centering
\begin{subfigure}[htbp]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/d-cliques-cifar10-wo-clique-avg-impact-of-edge-removal}
\caption{\label{fig:d-cliques-cifar10-wo-clique-avg-impact-of-edge-removal} Without Clique Averaging }
\end{subfigure}
\hfill
\begin{subfigure}[htbp]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/d-cliques-cifar10-w-clique-avg-impact-of-edge-removal}
\caption{\label{fig:d-cliques-cifar10-w-clique-avg-impact-of-edge-removal} With Clique Averaging}
\end{subfigure}
\caption{\label{fig:d-cliques-cifar10-intra-connectivity} CIFAR: Impact of Intra-clique Connectivity Failures (with Momentum).}
\end{figure}
\subsection{Greedy Swap Improves Random Cliques at an Affordable Cost}
\label{section:greedy-swap-vs-random-cliques}
In the next two subsections, we compare cliques built with Greedy Swap
(Alg.~\ref{Algorithm:greedy-swap}) to random cliques, in terms of their
quality (skew), the cost of their construction, and their convergence speed.
\subsubsection{Cliques with Low Skew can be Constructed Efficiently with Greedy Swap}
\label{section:cost-cliques}
We compared the final average skew of 10 cliques, created either randomly or
with Greedy Swap, over 100 experiments of 1000 steps each.
\autoref{fig:final-skew-distribution} shows, in the form of a histogram, that
Greedy Swap generates cliques of significantly lower skew, close to 0 in the majority of cases for both MNIST and CIFAR10.
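To make the skew measure and the swapping mechanism concrete, the following
Python sketch (an illustration in the spirit of Greedy Swap, not a verbatim
transcription of Algorithm~\ref{Algorithm:greedy-swap}; all names are ours)
measures a clique's skew as the L1 distance between its class distribution and
the global one, then greedily applies the pairwise node swap that most reduces
it:

```python
import random
from collections import Counter

def skew(clique, labels, global_dist):
    """L1 distance between a clique's class distribution and the global one."""
    counts = Counter(labels[n] for n in clique)
    return sum(abs(counts[c] / len(clique) - p) for c, p in global_dist.items())

def greedy_swap_step(cliques, labels, global_dist, rng):
    """Pick two cliques at random and apply the single node swap between
    them that most reduces their combined skew (no swap if none helps)."""
    i, j = rng.sample(range(len(cliques)), 2)
    a, b = cliques[i], cliques[j]
    best = skew(a, labels, global_dist) + skew(b, labels, global_dist)
    best_swap = None
    for x in range(len(a)):
        for y in range(len(b)):
            a[x], b[y] = b[y], a[x]  # tentatively swap nodes a[x] and b[y]
            s = skew(a, labels, global_dist) + skew(b, labels, global_dist)
            if s < best:
                best, best_swap = s, (x, y)
            a[x], b[y] = b[y], a[x]  # undo the tentative swap
    if best_swap is not None:
        x, y = best_swap
        a[x], b[y] = b[y], a[x]

# Toy run: 100 nodes, 2 balanced classes, 10 random cliques of size 10.
rng = random.Random(1)
labels = [n % 2 for n in range(100)]
nodes = list(range(100))
rng.shuffle(nodes)
cliques = [nodes[i:i + 10] for i in range(0, 100, 10)]
gdist = {0: 0.5, 1: 0.5}
before = sum(skew(c, labels, gdist) for c in cliques)
for _ in range(200):
    greedy_swap_step(cliques, labels, gdist, rng)
after = sum(skew(c, labels, gdist) for c in cliques)
print(after <= before)  # True: a step only swaps when it lowers skew
```

Because a step modifies only the sampled clique pair and only when the swap
lowers their combined skew, the total skew is non-increasing over time, which
matches the monotone decrease observed in our experiments.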
% MNIST
% python $TOOLS/plot/skew/final-distribution.py --rundirs skews-mnist/* --save-figure ../mlsys2022style/figures/final-skew-distribution-mnist.png --labels 'Greedy Swap' 'Random Cliques' --linewidth 2.5 --font-size 18 --linestyles 'solid' 'dashed'
% CIFAR10
\begin{figure}[htbp]
\centering
\begin{subfigure}[b]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/final-skew-distribution-mnist}
\caption{MNIST}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/final-skew-distribution-cifar10}
\caption{CIFAR10}
\end{subfigure}
\caption{\label{fig:final-skew-distribution} Final Quality of Cliques (Skew) with a Maximum Size of 10 over 100 Experiments.}
\end{figure}
\autoref{fig:skew-convergence-speed-2-shards} shows that such a low skew can be
achieved in less than 400 steps for both MNIST and CIFAR10, which in practice
takes less than 6 seconds in Python 3.7 on a MacBook Pro 2020 for a network of
100 nodes and cliques of size 10. Greedy Swap is therefore fast and efficient.
The figure also illustrates that an unbalanced number of examples between
classes makes the construction of cliques with low skew harder and slower.
%python $TOOLS/analyze/filter.py skews --topology:name d-cliques/greedy-swap | python $TOOLS/plot/skew/convergence.py --max-steps 400 --labels 'MNIST (unbalanced classes)' 'CIFAR10 (balanced classes)' --linewidth 2.5 --save-figure ../mlsys2022style/figures/skew-convergence-speed-2-shards.png
\begin{figure}[htbp]
\centering
\includegraphics[width=0.3\textwidth]{figures/skew-convergence-speed-2-shards}
\caption{\label{fig:skew-convergence-speed-2-shards} Speed of Skew Decrease during Clique Construction. The bold line is the average over 100 experiments and 10 cliques per experiment; thin lines are the minimum and maximum over all experiments. In wall-clock time, 1000 steps take less than 6 seconds in Python 3.7 on a MacBook Pro 2020.}
\end{figure}
\subsubsection{Cliques built with Greedy Swap Converge Faster than Random Cliques}
\autoref{fig:convergence-speed-dc-random-vs-dc-gs-2-shards-per-node} compares
the convergence speed of cliques optimized with Greedy Swap for 1000 steps to
that of cliques built randomly (equivalent to Greedy Swap with 0 steps). For
both MNIST and CIFAR10, convergence speed increases significantly and the
variance between nodes decreases dramatically. Lowering the skew of cliques
is therefore critical for convergence speed.
% From directory 'results-v2':
% MNIST
% python $TOOLS/analyze/filter.py all --dataset:name mnist --topology:name d-cliques/random-cliques d-cliques/greedy-swap --nodes:name 2-shards-uneq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% python $TOOLS/analyze/diff.py --rundirs all/2021-09-29-22:12:59-CEST-labostrex114 all/2021-09-28-23:18:49-CEST-labostrex119 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 80 --ymax 92.5 --yaxis test-accuracy --labels 'd-cliques random' 'd-cliques greedy-swap' --save-figure ../mlsys2022style/figures/convergence-speed-mnist-dc-random-vs-dc-gs-2-shards-per-node.png --linestyles 'solid' 'dashed' --font-size 18 --linewidth 3
% CIFAR10
% python $TOOLS/analyze/filter.py all --dataset:name cifar10 --topology:name d-cliques/random-cliques d-cliques/greedy-swap --nodes:name 2-shards-eq-classes --meta:seed 1 --nodes:nb-nodes 100 | python $TOOLS/analyze/diff.py
% python $TOOLS/analyze/diff.py --rundirs all/2021-10-04-21:18:33-CEST-labostrex117 all/2021-10-03-19:53:21-CEST-labostrex117 --pass-through | python $TOOLS/plot/convergence.py --add-min-max --ymin 0 --ymax 100 --yaxis test-accuracy --labels 'd-cliques random' 'd-cliques greedy-swap' --save-figure ../mlsys2022style/figures/convergence-speed-cifar10-dc-random-vs-dc-gs-2-shards-per-node.png --linestyles 'solid' 'dashed' --font-size 18 --linewidth 3
\begin{figure}[htbp]
\centering
\begin{subfigure}[b]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/convergence-speed-mnist-dc-random-vs-dc-gs-2-shards-per-node}
\caption{\label{fig:convergence-speed-mnist-dc-random-vs-dc-gs-2-shards-per-node} MNIST}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.23\textwidth}
\centering
\includegraphics[width=\textwidth]{figures/convergence-speed-cifar10-dc-random-vs-dc-gs-2-shards-per-node}
\caption{\label{fig:convergence-speed-cifar10-dc-random-vs-dc-gs-2-shards-per-node} CIFAR10}
\end{subfigure}
\caption{\label{fig:convergence-speed-dc-random-vs-dc-gs-2-shards-per-node} Convergence Speed of D-Cliques constructed Randomly vs Greedy Swap on 100 Nodes (2 shards/node).}
\end{figure}