Commit c03ff0ec authored by aurelien.bellet

done with inter-clique part

parent 31901755
......@@ -102,8 +102,9 @@ of the absolute differences of $p_C(y)$ and $p(y)$:
To efficiently construct a set of cliques with small skew, we propose
Greedy-Swap (Algorithm~\ref{Algorithm:D-Clique-Construction}). The parameter
$M$ gives the maximum size of cliques and makes it possible to control the intra-clique
communication costs. We start by initializing cliques at random. Then, for
$M$ is the maximum clique size and controls the
number of intra-clique edges. We start by initializing cliques at
random. Then, for
a certain number of steps $K$, we randomly pick two cliques and swap two of
their nodes so as to decrease the sum of skews of the two cliques. The swap is
chosen randomly among the ones which decrease the skew, hence
......@@ -114,6 +115,7 @@ sake of
simplicity, we assume that D-Cliques are constructed from the global
knowledge of these distributions, which can easily be obtained by
decentralized averaging in a pre-processing step.
\todo{AB: argue that we can somehow do this in decentralized way?}
\begin{algorithm}[t]
\caption{D-Cliques Construction via Greedy Swap}
......@@ -122,10 +124,9 @@ decentralized averaging in a pre-processing step.
\STATE \textbf{Require:} maximum clique size $M$, max steps $K$, set
of all nodes $N = \{ 1, 2, \dots, n \}$,
% \STATE $\textit{skew}(S)$: skew of subset $S \subseteq N$ compared to the global distribution (Eq.~\ref{eq:skew}),
% \STATE $\textit{intra}(DC)$: edges within cliques $C \in DC$,
% \STATE $\textit{inter}(DC)$: edges between $C_1,C_2 \in DC$ (Sec.~\ref{section:interclique-topologies}),
% \STATE $\textit{weights}(E)$: set weights to edges in $E$ (Eq.~\ref{eq:metro}).
% \STATE ~~
procedure $\texttt{inter}(\cdot)$ to create inter-clique connections
(see Sec.~\ref{section:interclique-topologies}) % \STATE $
% \textit{weights}(E)$: set weights to edges in $E$ (Eq.~\ref{eq:metro}).
\STATE $DC \leftarrow []$ %\COMMENT{Empty list}
\WHILE {$N \neq \emptyset$}
\STATE $C \leftarrow$ sample $M$ nodes from $N$ at random
......@@ -148,8 +149,10 @@ decentralized averaging in a pre-processing step.
\STATE $C_1 \leftarrow C_1 \setminus\{i\}\cup\{j\}; C_2 \leftarrow C_2 \setminus\{j\}\cup\{i\}$
\ENDIF
\ENDFOR
\STATE $G \leftarrow$ graph composed of the cliques in $DC$
\RETURN $G$
\STATE $E\leftarrow \{(i,j) : C\in DC, i,j\in C, i\neq j\}$
% \STATE $G \leftarrow$ graph composed of the cliques in $DC$
\RETURN topology $G=(\{1,\dots,n\},E \cup
\texttt{inter}(DC))$
\end{algorithmic}
\end{algorithm}
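The Greedy-Swap procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-node class-count matrix `counts`, the `seed` argument, and the helper names `skew`, `greedy_swap` and `intra_edges` are hypothetical, and every node is assumed to hold at least one sample. The skew of a clique is taken to be the sum over classes of $|p_C(y) - p(y)|$, as in the text.

```python
import random
from itertools import combinations


def skew(clique, counts, global_dist):
    """Sum over classes y of |p_C(y) - p(y)| for the nodes in `clique`."""
    num_classes = len(global_dist)
    totals = [sum(counts[i][y] for i in clique) for y in range(num_classes)]
    size = sum(totals)  # assumed > 0: every node holds at least one sample
    return sum(abs(totals[y] / size - global_dist[y]) for y in range(num_classes))


def greedy_swap(counts, M, K, seed=0):
    """Return a list of cliques (sets of node ids), each of size at most M."""
    rng = random.Random(seed)
    n, num_classes = len(counts), len(counts[0])
    grand_total = sum(sum(c) for c in counts)
    global_dist = [sum(c[y] for c in counts) / grand_total for y in range(num_classes)]

    # Random initialization: draw groups of M nodes until none are left.
    nodes = list(range(n))
    rng.shuffle(nodes)
    cliques = [set(nodes[k:k + M]) for k in range(0, n, M)]

    for _ in range(K):
        if len(cliques) < 2:
            break
        c1, c2 = rng.sample(cliques, 2)
        s = skew(c1, counts, global_dist) + skew(c2, counts, global_dist)
        # Collect every swap (i in c1, j in c2) that strictly decreases the summed skew.
        swaps = []
        for i in c1:
            for j in c2:
                s_new = (skew((c1 - {i}) | {j}, counts, global_dist)
                         + skew((c2 - {j}) | {i}, counts, global_dist))
                if s_new < s:
                    swaps.append((i, j))
        if swaps:
            # Pick one improving swap at random, not necessarily the best one.
            i, j = rng.choice(swaps)
            c1.remove(i)
            c1.add(j)
            c2.remove(j)
            c2.add(i)
    return cliques


def intra_edges(cliques):
    """Undirected intra-clique edges: one pair per two distinct nodes of a clique."""
    return [(i, j) for c in cliques for i, j in combinations(sorted(c), 2)]
```

Each step evaluates up to $M^2$ candidate swaps, so the cost of the sketch is governed by $K$ and $M$ rather than by $n$. Inter-clique edges are deliberately left out here, since they depend on the chosen inter-clique topology.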
......@@ -202,48 +205,44 @@ choices for this inter-clique topology in the next section.
\begin{figure}[t]
\centering
\includegraphics[width=0.20\textwidth]{../figures/fully-connected-cliques}
\caption{\label{fig:d-cliques-figure} D-Cliques (fully-connected
cliques) example with 1 class/node.}
\caption{\label{fig:d-cliques-figure} D-Cliques with $n=100$, $M=10$ and a
fully connected inter-clique topology on a problem with 1 class/node.}
\end{figure}
\todo{AB: if time, could add fig of another inter-clique topology (ring,
fractal or small-world)}
Second, to ensure a global consensus and convergence,
\textit{inter-clique connections}
are introduced by connecting a small number of node pairs that are
part of different cliques. In the following, we introduce up to one inter-clique
connection per node such that each clique has exactly one
edge with all other cliques, see Figure~\ref{fig:d-cliques-figure} for the
corresponding D-Cliques network in the case of $n=100$ nodes and $L=10$
classes. We will explore sparser inter-clique topologies in
Section~\ref{section:interclique-topologies}.
So far, we have used a fully-connected inter-clique topology for D-Cliques,
which has the advantage of bounding the
\textit{path length}\footnote{The \textit{path length} is the number of edges on the path with the shortest number of edges between two nodes.} to $3$ between any pair of nodes. This choice requires $
\frac{n}{c}(\frac{n}{c} - 1)$ inter-clique edges, which scales quadratically
in the number of nodes $n$ for a given clique size $c$\footnote{We consider \textit{directed} edges in the analysis: the number of undirected edges is half and does not affect asymptotic behavior.}. This can become significant at larger scales when $n$ is
large compared to $c$.
We first measure the convergence speed of inter-cliques topologies whose number of edges scales linearly with the number of nodes. Among those, the \textit{ring} has the (almost) fewest possible number of edges: it
uses $\frac{2n}{c}$ inter-clique edges but its average path length between nodes
also scales linearly.
We also consider another topology, which we call \textit{fractal}, that provides a
logarithmic
bound on the average path length. In this hierarchical scheme,
cliques are assembled in larger groups of $c$ cliques that are connected internally with one edge per
To ensure a global consensus and convergence, we introduce
\textit{inter-clique connections} between a small number of node pairs that
are part of different cliques, thereby implementing the \texttt{inter}
procedure called at the end of Algorithm~\ref{Algorithm:greedy-swap}.
We aim to ensure that the degree of each node remains low and balanced so as
to make the network topology well-suited to decentralized federated learning.
We consider several choices of inter-clique topology, which offer
different scalings for the number of required edges and the average distance
between nodes in the resulting graph.
The \textit{ring} has (almost) the fewest possible number of edges for the
graph to be connected: in this case, each clique is connected to its two
neighboring cliques on the ring by a single edge each. This topology requires only $O(\frac{n}{M})$
inter-clique edges but suffers from an $O(n)$ average distance between nodes.
The
\textit{fractal} topology
provides a logarithmic bound on the average distance. In this
hierarchical scheme, cliques are arranged in larger groups of $M$ cliques that
are connected
internally with one edge per
pair of cliques, but with only one edge between pairs of larger groups. The
topology is built recursively such that $c$ groups will themselves form a
larger group at the next level up. This results in at most $c$ edges per node
topology is built recursively such that $M$ groups will themselves form a
larger group at the next level up. This results in at most $M$ edges per node
if edges are evenly distributed: i.e., each group within the same level adds
at most $c-1$ edges to other groups, leaving one node per group with $c-1$
at most $M-1$ edges to other groups, leaving one node per group with $M-1$
edges that can receive an additional edge to connect with other groups at the next level.
Since nodes have at most $c$ edges, $n$ nodes have at most $nc$ edges, therefore
the number of edges in this fractal scheme indeed scales linearly in the number of nodes.
Since nodes have at most $M$ edges, the total number of inter-clique edges
is at most $nM$, i.e., linear in $n$ for a fixed $M$.
Second, we look at another scheme
in which the number of edges scales in a near, but not quite, linear fashion.
We propose to connect cliques according to a
We can also design an inter-clique topology in which the number of edges
scales in a log-linear fashion by following a
small-world-like topology~\cite{watts2000small} applied on top of a
ring~\cite{stoica2003chord}. In this scheme, cliques are first arranged in a
ring. Then each clique adds symmetric edges, both clockwise and
......@@ -252,13 +251,60 @@ cliques that are exponentially bigger the further they are on the ring (see
Algorithm~\ref{Algorithm:Smallworld} in the appendix for
details on the construction). This ensures a good connectivity with other
cliques that are close on the ring, while still keeping the average
path length small. This scheme uses $\frac{n}{c}*2(m)\log(\frac{n}{c})$ inter-clique edges and
therefore grows in the order of $O(n\log(n))$ with the number of nodes.
Overall, D-Cliques ensures that the degree
of each node in the network remains low and balanced, making the topology
well-suited to
decentralized federated learning.
distance small. This scheme uses $O(m\frac{n}{M}\log(\frac{n}{M}))$
inter-clique edges, i.e., log-linear in $n$.
Finally, we can consider a \emph{fully connected} inter-clique topology,
with at most one inter-clique edge per node, such that each clique has exactly
one edge to every other clique (see Figure~\ref{fig:d-cliques-figure}). This has the advantage of
bounding the distance between any pair of nodes by $3$, but it requires
$O(\frac{n^2}{M^2})$ inter-clique edges, i.e., quadratic in $n$.
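To make the edge counts of the ring and fully connected topologies concrete, the following Python sketch builds both on top of a given list of cliques. It is an illustration only: the function names `ring_pairs`, `full_pairs` and `inter` are hypothetical, and the rule of attaching each new edge to the node with the fewest inter-clique edges so far is an assumption made here to keep degrees balanced, not something specified in the text. Edges are counted as undirected.

```python
from itertools import combinations


def ring_pairs(m):
    """Pairs of clique indices forming a ring over m cliques."""
    if m < 2:
        return []
    # The set removes the duplicate pair that appears when m == 2.
    return sorted({(min(k, (k + 1) % m), max(k, (k + 1) % m)) for k in range(m)})


def full_pairs(m):
    """Pairs of clique indices for a fully connected inter-clique topology."""
    return list(combinations(range(m), 2))


def inter(cliques, pairs):
    """Add one edge per pair of cliques, using the least-loaded node on each side."""
    degree = {u: 0 for c in cliques for u in c}  # inter-clique degree only
    edges = []
    for a, b in pairs:
        u = min(cliques[a], key=lambda x: degree[x])
        v = min(cliques[b], key=lambda x: degree[x])
        degree[u] += 1
        degree[v] += 1
        edges.append((u, v))
    return edges, degree


# n = 100 nodes in cliques of size M = 10, as in the figure caption.
cliques = [list(range(10 * k, 10 * k + 10)) for k in range(10)]
ring_edges, ring_degree = inter(cliques, ring_pairs(len(cliques)))
full_edges, full_degree = inter(cliques, full_pairs(len(cliques)))
```

For this setting the ring uses 10 undirected inter-clique edges and the fully connected topology uses $\frac{10 \cdot 9}{2} = 45$, with no node carrying more than one inter-clique edge in either case.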
% In the following, we introduce
% up to one
% inter-clique
% connection per node such that each clique has exactly one
% edge with all other cliques, see Figure~\ref{fig:d-cliques-figure} for the
% corresponding D-Cliques network in the case of $n=100$ nodes and $L=10$
% classes. We will explore sparser inter-clique topologies in
% Section~\ref{section:interclique-topologies}.
% So far, we have used a fully-connected inter-clique topology for D-Cliques,
% which has the advantage of bounding the
% \textit{path length}\footnote{The \textit{path length} is the number of edges on the path with the shortest number of edges between two nodes.} to $3$ between any pair of nodes. This choice requires $
% \frac{n}{c}(\frac{n}{c} - 1)$ inter-clique edges, which scales quadratically
% in the number of nodes $n$ for a given clique size $c$\footnote{We consider \textit{directed} edges in the analysis: the number of undirected edges is half and does not affect asymptotic behavior.}. This can become significant at larger scales when $n$ is
% large compared to $c$.
% We first measure the convergence speed of inter-cliques topologies whose number of edges scales linearly with the number of nodes. Among those, the \textit{ring} has the (almost) fewest possible number of edges: it
% uses $\frac{2n}{c}$ inter-clique edges but its average path length between nodes
% also scales linearly.
% We also consider another topology, which we call \textit{fractal}, that provides a
% logarithmic
% bound on the average path length. In this hierarchical scheme,
% cliques are assembled in larger groups of $c$ cliques that are connected internally with one edge per
% pair of cliques, but with only one edge between pairs of larger groups. The
% topology is built recursively such that $c$ groups will themselves form a
% larger group at the next level up. This results in at most $c$ edges per node
% if edges are evenly distributed: i.e., each group within the same level adds
% at most $c-1$ edges to other groups, leaving one node per group with $c-1$
% edges that can receive an additional edge to connect with other groups at the next level.
% Since nodes have at most $c$ edges, $n$ nodes have at most $nc$ edges, therefore
% the number of edges in this fractal scheme indeed scales linearly in the number of nodes.
% Second, we look at another scheme
% in which the number of edges scales in a near, but not quite, linear fashion.
% We propose to connect cliques according to a
% small-world-like topology~\cite{watts2000small} applied on top of a
% ring~\cite{stoica2003chord}. In this scheme, cliques are first arranged in a
% ring. Then each clique adds symmetric edges, both clockwise and
% counter-clockwise on the ring, with the $m$ closest cliques in sets of
% cliques that are exponentially bigger the further they are on the ring (see
% Algorithm~\ref{Algorithm:Smallworld} in the appendix for
% details on the construction). This ensures a good connectivity with other
% cliques that are close on the ring, while still keeping the average
% path length small. This scheme uses $\frac{n}{c}*2(m)\log(\frac{n}{c})$ inter-clique edges and
% therefore grows in the order of $O(n\log(n))$ with the number of nodes.
\subsection{Optimizing with Clique Averaging and Momentum}
\label{section:clique-averaging-momentum}