simplify algo, now starting on inter-clique section

Commit 31901755 authored 3 years ago by aurelien.bellet
--- a/mlsys2022style/d-cliques.tex
+++ b/mlsys2022style/d-cliques.tex
@@ -100,59 +100,56 @@ of the absolute differences of $p_C(y)$ and $p(y)$:
 % \end{split}
 % \end{equation}
-\begin{figure}[t]
-    \centering
-    \includegraphics[width=0.20\textwidth]{../figures/fully-connected-cliques}
-    \caption{\label{fig:d-cliques-figure} D-Cliques (fully-connected
-    cliques) example with 1 class/node.}
-\end{figure}
 To efficiently construct a set of cliques with small skew, we propose
-Greedy-Swap (Algorithm~\ref{Algorithm:D-Clique-Construction}).
+Greedy-Swap (Algorithm~\ref{Algorithm:greedy-swap}). The parameter
-We start by initializing cliques at random, using at most $M$
+$M$ sets the maximum clique size and thereby bounds the intra-clique
-nodes to limit the intra-clique communication costs, then we 
+communication costs. We start by initializing cliques at random. Then, for
-swap nodes between pairs of cliques chosen at random such that the swap
+a certain number of steps $K$, we randomly pick two cliques and swap two of
-decreases the skew of that pair but keeps
+their nodes so as to decrease the sum of skews of the two cliques. The swap is
-the size of the cliques constant (see Algorithm~\ref{Algorithm:D-Clique-Construction}).  
+chosen at random among those that decrease this sum, hence
-Only swaps that decrease the skew are performed, hence this algorithm can be
+this algorithm can be seen as a form of randomized greedy algorithm.
-seen as a form of randomized greedy algorithm. We note that this algorithm only requires
+We note that this algorithm only requires
-the knowledge of the label distribution at each node. For the sake of
+the knowledge of the label distribution $p_i(y)$ at each node $i$. For the
+sake of
 simplicity, we assume that D-Cliques are constructed from the global
 knowledge of these distributions, which can easily be obtained by
 decentralized averaging in a pre-processing step. 
-\begin{algorithm}[h]
+\begin{algorithm}[t]
   \caption{D-Cliques Construction via Greedy Swap}
   \label{Algorithm:greedy-swap}
   \begin{algorithmic}[1]
-        \STATE \textbf{Require:} Clique size $M$, Max steps $K$,
+        \STATE \textbf{Require:} maximum clique size $M$, max steps $K$, set
-        \STATE Set of all nodes $N = \{ 1, 2, \dots, n \}$,
+        of all nodes $N = \{ 1, 2, \dots, n \}$,
-        \STATE $\textit{skew}(S)$: skew of subset $S \subseteq N$ compared to the global distribution (Eq.~\ref{eq:skew}), 
+        % \STATE $\textit{skew}(S)$: skew of subset $S \subseteq N$ compared to the global distribution (Eq.~\ref{eq:skew}), 
-        \STATE $\textit{intra}(DC)$: edges within cliques $C \in DC$,
+        % \STATE $\textit{intra}(DC)$: edges within cliques $C \in DC$,
-        \STATE $\textit{inter}(DC)$: edges between $C_1,C_2 \in DC$ (Sec.~\ref{section:interclique-topologies}),
+        % \STATE $\textit{inter}(DC)$: edges between $C_1,C_2 \in DC$ (Sec.~\ref{section:interclique-topologies}),
-         \STATE $\textit{weights}(E)$: set weights to edges in $E$ (Eq.~\ref{eq:metro}).
+         % \STATE $\textit{weights}(E)$: set weights to edges in $E$ (Eq.~\ref{eq:metro}).
-         \STATE ~~
+         % \STATE ~~
-         \STATE $DC \leftarrow []$ \COMMENT{Empty list}
+         \STATE $DC \leftarrow []$ %\COMMENT{Empty list}
         \WHILE {$N \neq \emptyset$}
         \STATE $C \leftarrow$ sample $\min(M, |N|)$ nodes from $N$ at random
-         \STATE $N \leftarrow N \setminus C$; $DC.append(C)$
+         \STATE $N \leftarrow N \setminus C$; $DC.\text{append}(C)$
         \ENDWHILE
         \FOR{$k \in \{1, \dots, K\}$}
-        \STATE $C_1,C_2 \leftarrow$ sample 2 from $DC$ at random
+        \STATE $C_1,C_2 \leftarrow$ random sample of 2 elements from $DC$
+          \STATE $s \leftarrow \textit{skew}(C_1) + \textit{skew}(C_2)$
        \STATE $\textit{swaps} \leftarrow []$
-        \FOR{$n_1 \in C_1, n_2 \in C_2$}
+        \FOR{$i \in C_1, j \in C_2$}
-          \STATE $s \leftarrow skew(C_1) + skew(C_2)$
+          \STATE $s' \leftarrow \textit{skew}(C_1\setminus\{i\}\cup\{j\})
-          \STATE $s' \leftarrow \textit{skew}(C_1-n_1+n_2) + \textit{skew}(C_2 -n_2+n_1)$
+          + \textit{skew}(C_2 \setminus\{j\}\cup\{i\})$\hspace*{-.05cm}
          \IF {$s' < s$}
            \STATE \textit{swaps}.append($(i, j)$)
          \ENDIF
        \ENDFOR
-        \IF {\#\textit{swaps} $> 0$}
+        \IF {len(\textit{swaps}) $> 0$}
-          \STATE $(n_1,n_2) \leftarrow$ sample 1 from $\textit{swaps}$ at random
+          \STATE $(i,j) \leftarrow$ random element from $
-          \STATE $C_1 \leftarrow C_1 - n_1 + n_2; C_2 \leftarrow C_2 - n_2 + n1$
+          \textit{swaps}$ 
+          \STATE $C_1 \leftarrow C_1 \setminus\{i\}\cup\{j\}; C_2 \leftarrow C_2 \setminus\{j\}\cup\{i\}$
        \ENDIF
         \ENDFOR
-        \RETURN $(weights(\textit{intra}(DC) \cup \textit{inter}(DC)), DC)$
+         \STATE $G \leftarrow$ graph composed of the cliques in $DC$
+        \RETURN $G$
   \end{algorithmic}
 \end{algorithm}
@@ -191,17 +188,26 @@ decentralized averaging in a pre-processing step.
 %    \end{algorithmic}
 % \end{algorithm}
-The key idea of D-Cliques is that because the clique-level distribution $D_C$
+The key idea of D-Cliques is that because the clique-level label distribution
- is representative of the global distribution $D$,
+$p_C(y)$
+ is representative of the global distribution $p(y)$,
 the local models of nodes across cliques remain rather close. Therefore, a
 sparse inter-clique topology can be used, significantly reducing the total
-number of edges without slowing down the convergence. Furthermore, the degree
+number of edges without slowing down the convergence. We discuss
-of each node in the network remains low and even, making the D-Cliques
+choices for this inter-clique topology in the next section.
-topology very well-suited to decentralized federated learning. 
 \subsection{Adding Sparse Inter-Clique Connections}
 \label{section:interclique-topologies}
+\begin{figure}[t]
+    \centering
+    \includegraphics[width=0.20\textwidth]{../figures/fully-connected-cliques}
+    \caption{\label{fig:d-cliques-figure} D-Cliques (fully-connected
+    cliques) example with 1 class/node.}
+\end{figure}
+\todo{AB: if time, could add fig of another inter-clique topology (ring,
+fractal or small-world)}
 Second, to ensure a global consensus and convergence, 
 \textit{inter-clique connections}
 are introduced by connecting a small number of node pairs that are
@@ -249,6 +255,11 @@ cliques that are close on the ring, while still keeping the average
 path length small. This scheme uses $\frac{n}{c}*2(m)\log(\frac{n}{c})$ inter-clique edges and
 therefore grows in the order of $O(n\log(n))$ with the number of nodes.
+Overall, D-Cliques ensures that the degree
+of each node in the network remains low and balanced, making the topology
+well-suited to
+decentralized federated learning. 
 \subsection{Optimizing with Clique Averaging and Momentum}
 \label{section:clique-averaging-momentum}
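
The Greedy-Swap pseudocode in the first hunk can be checked against a small executable sketch. The Python below is only an illustration under stated assumptions, not the authors' implementation: the names `label_dist`, `skew`, `greedy_swap` and the input `counts` (a dict mapping each node to its per-label sample counts) are hypothetical, and the skew is assumed to be the sum of absolute differences between $p_C(y)$ and $p(y)$, as the hunk header suggests. It swaps node $i \in C_1$ with node $j \in C_2$ in both the candidate evaluation and the final update, so clique sizes stay constant.

```python
import random
from collections import Counter


def label_dist(nodes, counts, labels):
    """Label distribution over the pooled local data of `nodes`."""
    total = Counter()
    for i in nodes:
        total.update(counts[i])  # adds per-label sample counts
    n = sum(total.values())
    return [total[l] / n for l in labels]


def skew(clique, counts, labels, global_dist):
    """Assumed skew: sum over labels of |p_C(y) - p(y)|."""
    p_c = label_dist(clique, counts, labels)
    return sum(abs(a - b) for a, b in zip(p_c, global_dist))


def greedy_swap(counts, M, K, seed=0):
    """Random cliques of at most M nodes, then K randomized greedy swap steps."""
    rng = random.Random(seed)
    nodes = list(counts)
    labels = sorted({l for c in counts.values() for l in c})
    global_dist = label_dist(nodes, counts, labels)

    # Random initialization; the last clique may hold fewer than M nodes.
    rng.shuffle(nodes)
    cliques = [set(nodes[k:k + M]) for k in range(0, len(nodes), M)]
    if len(cliques) < 2:
        return cliques  # nothing to swap

    def sk(c):
        return skew(c, counts, labels, global_dist)

    for _ in range(K):
        c1, c2 = rng.sample(cliques, 2)
        s = sk(c1) + sk(c2)
        # Collect every (i, j) swap that strictly decreases the summed skew.
        swaps = []
        for i in sorted(c1):
            for j in sorted(c2):
                s_new = sk((c1 - {i}) | {j}) + sk((c2 - {j}) | {i})
                if s_new < s:
                    swaps.append((i, j))
        if swaps:
            i, j = rng.choice(swaps)  # randomized greedy choice
            c1.remove(i)
            c1.add(j)
            c2.remove(j)
            c2.add(i)
    return cliques
```

For instance, with 20 nodes holding one class each (`counts = {i: {i % 10: 100} for i in range(20)}`) and `M=10`, the two resulting cliques each end up with one node per class. Each step evaluates all $|C_1| \cdot |C_2|$ candidate swaps, which is affordable because $M$ is small.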