\caption{\label{fig:d-cliques-figure} D-Cliques with $n=100$, $M=10$ and a
fully connected inter-clique topology on a problem with 1 class/node.}
\end{figure}
\todo{AB: if time, could add fig of another inter-clique topology (ring,
fractal or small-world)}
To ensure a global consensus and convergence, we introduce
\textit{inter-clique connections} between a small number of node pairs that
are part of different cliques, thereby implementing the \texttt{inter}
procedure called at the end of Algorithm~\ref{Algorithm:greedy-swap}.
We aim to ensure that the degree of each node remains low and balanced so as
to make the network topology well-suited to decentralized federated learning.
We consider several choices of inter-clique topology, which offer
different scalings for the number of required edges and the average distance
between nodes in the resulting graph.
The \textit{ring} has (almost) the fewest possible number of edges for the
graph to be connected: in this case, each clique is connected to exactly two
other cliques, its neighbors on the ring, by a single edge each. This topology
requires only $O(\frac{n}{M})$ inter-clique edges but suffers from an $O(n)$
average distance between nodes.
The
\textit{fractal} topology
provides a logarithmic bound on the average distance. In this
hierarchical scheme, cliques are arranged in larger groups of $M$ cliques that
are connected
internally with one edge per
pair of cliques, but with only one edge between pairs of larger groups. The
topology is built recursively such that $M$ groups will themselves form a
larger group at the next level up. This results in at most $M$ edges per node
if edges are evenly distributed: i.e., each group within the same level adds
at most $M-1$ edges to other groups, leaving one node per group with $M-1$
edges that can receive an additional edge to connect with other groups at the next level.
Since nodes have at most $M$ edges, the total number of inter-clique edges
is at most $nM$, which is linear in $n$ for a fixed clique size $M$.
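
The $nM$ bound is loose, as a level-by-level count suggests. The following is
only a sketch under simplifying assumptions that are ours: edges are counted as
undirected, $K = n/M$ denotes the number of cliques (a symbol introduced here
for illustration only), and $K = M^h$ for some integer $h \geq 1$ so that the
hierarchy is complete. Level $\ell$ then contains $K/M^{\ell}$ groups, each
adding one edge per pair of its $M$ sub-groups, i.e., $\frac{M(M-1)}{2}$ edges,
which gives
\[
E_{\mathrm{fractal}}
= \frac{M(M-1)}{2} \sum_{\ell=1}^{h} \frac{K}{M^{\ell}}
= \frac{M(M-1)}{2} \cdot \frac{K - 1}{M-1}
= \frac{M(K-1)}{2}
< \frac{n}{2}.
\]
For instance, with $M=3$ and $K=9$, the first level adds $3 \times 3 = 9$ edges
and the second level adds $3$, for a total of $12 = \frac{3 \times 8}{2}$.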
We can also design an inter-clique topology in which the number of edges
scales in a log-linear fashion by following a
small-world-like topology~\cite{watts2000small} applied on top of a
ring~\cite{stoica2003chord}. In this scheme, cliques are first arranged in a
ring. Then each clique adds symmetric edges, both clockwise and
...
...
cliques that are exponentially bigger the further they are on the ring (see
Algorithm~\ref{Algorithm:Smallworld} in the appendix for
details on the construction). This ensures a good connectivity with other
cliques that are close on the ring, while still keeping the average
distance small. This scheme uses $O(m\frac{n}{M}\log(\frac{n}{M}))$
inter-clique edges, i.e., log-linear in $n$.
Finally, we can consider a \textit{fully connected} inter-clique topology,
with at most one inter-clique edge per node such that each clique has exactly
one edge to every other clique (see Figure~\ref{fig:d-cliques-figure}). This
has the advantage of bounding the distance between any pair of nodes by $3$,
but requires $O(\frac{n^2}{M^2})$ inter-clique edges, i.e., quadratic in $n$.
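
As a rough illustration of the two extremes (a sketch under our own
assumptions: edges are counted as undirected, and $K = n/M$ denotes the number
of cliques, a symbol introduced here for this example only), the ring and the
fully connected inter-clique topologies require, for $K \geq 3$,
\[
E_{\mathrm{ring}} = K,
\qquad
E_{\mathrm{full}} = \frac{K(K-1)}{2}.
\]
In the setting of Figure~\ref{fig:d-cliques-figure} ($n=100$, $M=10$, hence
$K=10$), this gives $E_{\mathrm{ring}} = 10$ and $E_{\mathrm{full}} = 45$.
Doubling the number of cliques doubles $E_{\mathrm{ring}}$ but roughly
quadruples $E_{\mathrm{full}}$.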
% In the following, we introduce
% up to one
% inter-clique
% connection per node such that each clique has exactly one
% edge with all other cliques, see Figure~\ref{fig:d-cliques-figure} for the
% corresponding D-Cliques network in the case of $n=100$ nodes and $L=10$
% classes. We will explore sparser inter-clique topologies in
% Section~\ref{section:interclique-topologies}.
% So far, we have used a fully-connected inter-clique topology for D-Cliques,
% which has the advantage of bounding the
% \textit{path length}\footnote{The \textit{path length} is the number of edges on the path with the shortest number of edges between two nodes.} to $3$ between any pair of nodes. This choice requires $
% \frac{n}{c}(\frac{n}{c} - 1)$ inter-clique edges, which scales quadratically
% in the number of nodes $n$ for a given clique size $c$\footnote{We consider \textit{directed} edges in the analysis: the number of undirected edges is half and does not affect asymptotic behavior.}. This can become significant at larger scales when $n$ is
% large compared to $c$.
% We first measure the convergence speed of inter-cliques topologies whose number of edges scales linearly with the number of nodes. Among those, the \textit{ring} has the (almost) fewest possible number of edges: it
% uses $\frac{2n}{c}$ inter-clique edges but its average path length between nodes
% also scales linearly.
% We also consider another topology, which we call \textit{fractal}, that provides a
% logarithmic
% bound on the average path length. In this hierarchical scheme,
% cliques are assembled in larger groups of $c$ cliques that are connected internally with one edge per
% pair of cliques, but with only one edge between pairs of larger groups. The
% topology is built recursively such that $c$ groups will themselves form a
% larger group at the next level up. This results in at most $c$ edges per node
% if edges are evenly distributed: i.e., each group within the same level adds
% at most $c-1$ edges to other groups, leaving one node per group with $c-1$
% edges that can receive an additional edge to connect with other groups at the next level.
% Since nodes have at most $c$ edges, $n$ nodes have at most $nc$ edges, therefore
% the number of edges in this fractal scheme indeed scales linearly in the number of nodes.
% Second, we look at another scheme
% in which the number of edges scales in a near, but not quite, linear fashion.
% We propose to connect cliques according to a
% small-world-like topology~\cite{watts2000small} applied on top of a
% ring~\cite{stoica2003chord}. In this scheme, cliques are first arranged in a
% ring. Then each clique adds symmetric edges, both clockwise and
% counter-clockwise on the ring, with the $m$ closest cliques in sets of
% cliques that are exponentially bigger the further they are on the ring (see
% Algorithm~\ref{Algorithm:Smallworld} in the appendix for
% details on the construction). This ensures a good connectivity with other
% cliques that are close on the ring, while still keeping the average
% path length small. This scheme uses $\frac{n}{c}*2(m)\log(\frac{n}{c})$ inter-clique edges and
% therefore grows in the order of $O(n\log(n))$ with the number of nodes.
\subsection{Optimizing with Clique Averaging and Momentum}