@@ -196,7 +196,7 @@ requires 98\% less edges ($18.9$ vs $999$ edges per participant on average),
thereby yielding a 96\% reduction in the total number of required messages
(37.8 messages per round per node on average instead of 999), to obtain a convergence speed similar to that of a fully-connected topology. Furthermore, an additional 22\% improvement
% (14.5 edges per node on average instead of 18.9)
is possible when using a small-world inter-clique topology, with further potential gains at larger scales because of its linear-logarithmic scaling.
is possible when using a small-world inter-clique topology, with further potential gains at larger scales because of its quasilinear scaling ($O(n \log(n))$) in $n$, the number of nodes.
The rest of this paper is organized as follows. We first present the problem
statement and our methodology (Section~\ref{section:problem}). The D-Cliques
...
...
@@ -615,7 +615,7 @@ nodes.
\subsection{Comparing D-Cliques to Other Sparse Topologies}
We demonstrate the advantages of D-cliques over alternative sparse topologies
We demonstrate the advantages of D-Cliques over alternative sparse topologies
that have a similar number of edges. First, we consider topologies in which
the neighbors of each node are selected at random (hence without any clique
structure).
...
...
@@ -719,7 +719,7 @@ So far, we have used a fully-connected inter-clique topology for D-Cliques,
which has the advantage of bounding the
average shortest path to $2$ between any pair of nodes. This choice requires $
\frac{n}{c}(\frac{n}{c}-1)$ inter-clique edges, which scales quadratically
in the number of nodes. This can become significant at larger scales when $n$ is
in the number of nodes $n$ for a given clique size $c$. This can become significant at larger scales when $n$ is
large compared to $c$.
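As an illustrative check, assume $n=1000$ nodes and a clique size of $c=10$ (an assumed setting, chosen because it reproduces the $18.9$ edges per participant quoted earlier against $999$ for a fully-connected network). Reading the expression above as counting each inter-clique edge once per endpoint, we get
\[
\frac{n}{c}\left(\frac{n}{c}-1\right) = 100 \times 99 = 9900,
\]
that is, $9900/n = 9.9$ inter-clique edges per node on average. Adding the $c-1 = 9$ intra-clique edges of each node gives $18.9$ edges per node. Under the same assumptions, doubling to $n=2000$ gives $200 \times 199 = 39800$, or $19.9$ inter-clique edges per node, which illustrates the quadratic growth of the total.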
In this last series of experiments, we evaluate the effect of choosing sparser
...
...
@@ -752,8 +752,8 @@ cliques that are exponentially bigger the further they are on the ring (see
Algorithm~\ref{Algorithm:Smallworld} in the appendix for
details on the construction). This ensures a good connectivity with other
cliques that are close on the ring, while still keeping the average shortest
path small. This scheme uses $2(ns)\log(\frac{n}{c})$ inter-clique edges and
therefore grows in the order of $O(n+\log(n))$ with the number of nodes.
path small. This scheme uses $\frac{n}{c} \cdot 2(ns)\log(\frac{n}{c})$ inter-clique edges and
therefore grows in the order of $O(n\log(n))$ with the number of nodes.
Figure~\ref{fig:d-cliques-cifar10-convolutional} shows the convergence
speed of all the above schemes on MNIST and CIFAR10, compared to the ideal
...
...
@@ -889,8 +889,8 @@ averaging gradients only with other nodes within the clique. Clique Averaging
can in turn be used to implement unbiased momentum to recover the convergence
speed usually only possible with IID mini-batches. Through our experiments, we
showed that the clique structure of D-Cliques is critical in obtaining these
results and that a small-world inter-clique topology with only $O(n
+ log (n))$edges achieves the best compromise between
results and that a small-world inter-clique topology with only $O(n\log(n))$
edges achieves the best compromise between
convergence speed and scalability with the number of nodes.
D-Cliques thus appears to be very promising to reduce bandwidth
...
...
@@ -984,7 +984,7 @@ small-world inter-clique topology as described in Section~\ref{section:intercliq
linear number of inter-clique edges by first arranging cliques on a ring. It then adds a logarithmic number of ``finger'' edges to other cliques on the ring chosen such that there is a constant number of edges added per set, on sets that are exponentially bigger the further away on the ring. ``Finger'' edges are added symmetrically on both sides of the ring to the cliques in each set that are closest to a given set.
\begin{algorithm}[h]
\caption{$\textit{smallworld}(DC)$: adds $O(\# N +\log(\# N))$ edges}
\caption{$\textit{smallworld}(DC)$: adds $O(\# N \log(\# N))$ edges}
\label{Algorithm:Smallworld}
\begin{algorithmic}[1]
\State\textbf{Require:} set of cliques $DC$ (set of sets of nodes)