Commit 31901755 authored by aurelien.bellet
simplify algo, now starting on inter-clique section

parent a7468dee
@@ -100,59 +100,56 @@ of the absolute differences of $p_C(y)$ and $p(y)$:
% \end{split}
% \end{equation}
To efficiently construct a set of cliques with small skew, we propose
Greedy-Swap (Algorithm~\ref{Algorithm:greedy-swap}). The parameter
$M$ sets the maximum clique size and thereby bounds the intra-clique
communication costs. We start by initializing cliques at random. Then, for
$K$ steps, we randomly pick two cliques and swap two of
their nodes so as to decrease the sum of the skews of the two cliques. The swap is
chosen at random among those that decrease this sum, hence
the algorithm can be seen as a form of randomized greedy algorithm.
We note that this algorithm only requires
knowledge of the label distribution $p_i(y)$ at each node $i$. For the
sake of
simplicity, we assume that D-Cliques are constructed from the global
knowledge of these distributions, which can easily be obtained by
decentralized averaging in a pre-processing step.
\begin{algorithm}[t]
\caption{D-Cliques Construction via Greedy Swap}
\label{Algorithm:greedy-swap}
\begin{algorithmic}[1]
\STATE \textbf{Require:} maximum clique size $M$, max steps $K$, set
of all nodes $N = \{ 1, 2, \dots, n \}$
% \STATE $\textit{skew}(S)$: skew of subset $S \subseteq N$ compared to the global distribution (Eq.~\ref{eq:skew}),
% \STATE $\textit{intra}(DC)$: edges within cliques $C \in DC$,
% \STATE $\textit{inter}(DC)$: edges between $C_1,C_2 \in DC$ (Sec.~\ref{section:interclique-topologies}),
% \STATE $\textit{weights}(E)$: set weights to edges in $E$ (Eq.~\ref{eq:metro}).
% \STATE ~~
\STATE $DC \leftarrow []$ %\COMMENT{Empty list}
\WHILE {$N \neq \emptyset$}
\STATE $C \leftarrow$ sample $\min(M, |N|)$ nodes from $N$ at random
\STATE $N \leftarrow N \setminus C$; $DC.\text{append}(C)$
\ENDWHILE
\FOR{$k \in \{1, \dots, K\}$}
\STATE $C_1,C_2 \leftarrow$ random sample of 2 elements from $DC$
\STATE $s \leftarrow \textit{skew}(C_1) + \textit{skew}(C_2)$
\STATE $\textit{swaps} \leftarrow []$
\FOR{$i \in C_1, j \in C_2$}
\STATE $s' \leftarrow \textit{skew}(C_1\setminus\{i\}\cup\{j\})
+ \textit{skew}(C_2 \setminus\{j\}\cup\{i\})$
\IF {$s' < s$}
\STATE \textit{swaps}.append($(i, j)$)
\ENDIF
\ENDFOR
\IF {len(\textit{swaps}) $> 0$}
\STATE $(i,j) \leftarrow$ random element from $\textit{swaps}$
\STATE $C_1 \leftarrow C_1 \setminus\{i\}\cup\{j\}; C_2 \leftarrow C_2 \setminus\{j\}\cup\{i\}$
\ENDIF
\ENDFOR
\STATE $G \leftarrow$ graph composed of the cliques in $DC$
\RETURN $G$
\end{algorithmic}
\end{algorithm}
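The Greedy-Swap procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the skew is taken to be the sum over labels of $|p_C(y) - p(y)|$, each node is described by a vector of per-label example counts, and the names `skew`, `global_distribution`, `greedy_swap`, and `counts` are hypothetical. The sketch returns the list of cliques rather than a graph object.

```python
import random


def global_distribution(counts):
    """Global label distribution p(y) from per-node label counts."""
    num_labels = len(next(iter(counts.values())))
    totals = [sum(c[y] for c in counts.values()) for y in range(num_labels)]
    total = sum(totals)
    return [t / total for t in totals]


def skew(clique, counts, p):
    """Assumed skew: sum over labels of |p_C(y) - p(y)|."""
    totals = [sum(counts[i][y] for i in clique) for y in range(len(p))]
    size = sum(totals)
    return sum(abs(t / size - q) for t, q in zip(totals, p))


def greedy_swap(counts, max_clique_size, max_steps, rng=None):
    """Random initial cliques, then up to max_steps skew-decreasing swaps."""
    rng = rng or random.Random()
    p = global_distribution(counts)

    # Random initialization: shuffle nodes and cut into chunks of size <= M.
    nodes = list(counts)
    rng.shuffle(nodes)
    cliques = [
        set(nodes[k:k + max_clique_size])
        for k in range(0, len(nodes), max_clique_size)
    ]
    if len(cliques) < 2:
        return cliques

    for _ in range(max_steps):
        c1, c2 = rng.sample(cliques, 2)
        s = skew(c1, counts, p) + skew(c2, counts, p)
        # Collect every pair (i, j) whose exchange lowers the summed skew.
        swaps = [
            (i, j)
            for i in c1
            for j in c2
            if skew((c1 - {i}) | {j}, counts, p)
            + skew((c2 - {j}) | {i}, counts, p) < s
        ]
        if swaps:
            i, j = rng.choice(swaps)
            # Exchange i and j in place, so clique sizes stay constant.
            c1.remove(i)
            c1.add(j)
            c2.remove(j)
            c2.add(i)
    return cliques
```

Because a swap is applied only when it strictly lowers the summed skew of the two chosen cliques, and all other cliques are untouched, the total skew over all cliques never increases from one step to the next.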
@@ -191,17 +188,26 @@ decentralized averaging in a pre-processing step.
% \end{algorithmic}
% \end{algorithm}
The key idea of D-Cliques is that because the clique-level label distribution
$p_C(y)$
is representative of the global distribution $p(y)$,
the local models of nodes across cliques remain rather close. Therefore, a
sparse inter-clique topology can be used, significantly reducing the total
number of edges without slowing down the convergence. We discuss
choices for this inter-clique topology in the next section.
\subsection{Adding Sparse Inter-Clique Connections}
\label{section:interclique-topologies}
\begin{figure}[t]
\centering
\includegraphics[width=0.20\textwidth]{../figures/fully-connected-cliques}
\caption{\label{fig:d-cliques-figure} D-Cliques (fully-connected
cliques) example with 1 class/node.}
\end{figure}
\todo{AB: if time, could add fig of another inter-clique topology (ring,
fractal or small-world)}
Second, to ensure a global consensus and convergence,
\textit{inter-clique connections}
are introduced by connecting a small number of node pairs that are
@@ -249,6 +255,11 @@ cliques that are close on the ring, while still keeping the average
path length small. This scheme uses $\frac{n}{c}*2(m)\log(\frac{n}{c})$ inter-clique edges and
therefore grows in the order of $O(n\log(n))$ with the number of nodes.
Overall, D-Cliques ensures that the degree
of each node in the network remains low and balanced, making the topology
well-suited to
decentralized federated learning.
\subsection{Optimizing with Clique Averaging and Momentum}
\label{section:clique-averaging-momentum}