@@ -832,8 +832,8 @@ cliques that are exponentially bigger the further they are on the ring (see
Algorithm~\ref{Algorithm:Smallworld} in the appendix for
details on the construction). This ensures a good connectivity with other
cliques that are close on the ring, while still keeping the average shortest
path small. This scheme uses $2(ns)log(\frac{n}{c})$ inter-clique edges and
therefore grows in the order of $O(n + log(n))$ with the number of nodes.
path small. This scheme uses $2(ns)\log(\frac{n}{c})$ inter-clique edges and
therefore grows in the order of $O(n +\log(n))$ with the number of nodes.
Figure~\ref{fig:d-cliques-cifar10-convolutional} shows the convergence
speed of all the above schemes on MNIST and CIFAR10, compared to the ideal
...
...
@@ -1023,19 +1023,26 @@ approximately recover the global distribution.
\newpage
\appendix
\section{Algorithms}
\section{Detailed Algorithms}
We present a more detailed and precise explanation the two main algorithms of the paper, for D-Clique Construction (Algorithm~\ref{Algorithm:D-Clique-Construction}) and to establish a Smallworld interconnection between cliques (Algorithm~\ref{Algorithm:Smallworld}).
We present a more detailed and precise explanation of the two main algorithms
of the paper, for D-Clique Construction
(Algorithm~\ref{Algorithm:D-Clique-Construction}) and to establish a small-world
interconnection between cliques (Algorithm~\ref{Algorithm:Smallworld}).
Algorithm~\ref{Algorithm:D-Clique-Construction} shows the overall approach for constructing a D-Cliques topology in the non-IID case.\footnote{An IID version of D-Cliques, in which each node has an equal number of examples of all classes, can be implemented by picking $\#L$ nodes per clique at random.} It expects the following inputs: $L$, the set of all classes present in the global distribution $D =\bigcup_{i \in N} D_i$; $N$, the set of all nodes; a function $classes(S)$, which given a subset $S$ of nodes in $N$ returns the set of classes in their joint local distributions ($D_S =\bigcup_{i \in S} D_i$); a function $intraconnect(DC)$, which given $DC$, a set of cliques (set of set of nodes), creates a set of edges ($\{(i,j), \dots\}$) connecting all nodes within each clique to one another; a function $interconnect(DC)$, which given a set of cliques, creates a set of edges ($\{(i,j), \dots\}$) connecting nodes belonging to different cliques; and a function $weigths(E)$, which given a set of edges, returns the weighted matrix $W_{ij}$. Algorithm~\ref{Algorithm:D-Clique-Construction} returns both $W_{ij}$, for use in D-SGD (Algorithm~\ref{Algorithm:D-PSGD} and~\ref{Algorithm:Clique-Unbiased-D-PSGD}), and $DC$, for use with Clique Averaging (Algorithm~\ref{Algorithm:Clique-Unbiased-D-PSGD}).
Algorithm~\ref{Algorithm:D-Clique-Construction} shows the overall approach
for constructing a D-Cliques topology in the non-IID case.\footnote{An IID
version of D-Cliques, in which each node has an equal number of examples of
all classes, can be implemented by picking $\#L$ nodes per clique at random.}
It expects the following inputs: $L$, the set of all classes present in the global distribution $D =\bigcup_{i \in N} D_i$; $N$, the set of all nodes; a function $\textit{classes}(S)$, which given a subset $S$ of nodes in $N$ returns the set of classes in their joint local distributions ($D_S =\bigcup_{i \in S} D_i$); a function $\textit{intraconnect}(DC)$, which given $DC$, a set of cliques (set of sets of nodes), creates a set of edges ($\{\{i,j\}, \dots\}$) connecting all nodes within each clique to one another; a function $\textit{interconnect}(DC)$, which given a set of cliques, creates a set of edges ($\{\{i,j\}, \dots\}$) connecting nodes belonging to different cliques; and a function $\textit{weights}(E)$, which given a set of edges, returns the weighted matrix $W_{ij}$. Algorithm~\ref{Algorithm:D-Clique-Construction} returns both $W_{ij}$, for use in D-SGD (Algorithms~\ref{Algorithm:D-PSGD} and~\ref{Algorithm:Clique-Unbiased-D-PSGD}), and $DC$, for use with Clique Averaging (Algorithm~\ref{Algorithm:Clique-Unbiased-D-PSGD}).
\begin{algorithm}[h]
\caption{D-Clique Construction}
\label{Algorithm:D-Clique-Construction}
\begin{algorithmic}[1]
\State\textbf{Require} set of classes globally present $L$,
\State\textbf{Require:} set of classes globally present $L$,
\State~~ set of all nodes $N =\{1, 2, \dots, n \}$,
\State~~ fn $\textit{classes}(S)$ that returns the classes present in a subset of nodes $S$,
\State~~ fn $\textit{intraconnect}(DC)$ that returns edges intraconnecting cliques of $DC$,
...
...
@@ -1047,10 +1054,10 @@ approximately recover the global distribution.
\State$n \leftarrow\text{pick}~1~\text{from}~\{ m \in R \mid \textit{classes}(\{m\})\nsubseteq\textit{classes}(C)\}$
\State$R \leftarrow R \setminus\{ n \}$;
\State$R \leftarrow R \setminus\{ n \}$
\State$C \leftarrow C \cup\{ n \}$
\If{$\textit{classes}(C)= L$}
\State$DC \leftarrow DC \cup\{ C \}$;
\State$DC \leftarrow DC \cup\{ C \}$
\State$C \leftarrow\emptyset$
\EndIf
\EndWhile
...
...
@@ -1058,33 +1065,42 @@ approximately recover the global distribution.
\end{algorithmic}
\end{algorithm}
The implementation builds a single clique by adding nodes with different classes until all classes of the global distribution are represented. All cliques are built one at a time until all nodes are parts of cliques. Because all classes are represented on an equal number of nodes, all cliques will have nodes of all classes. And because nodes have examples of a single class, we are guaranteed a valid assignment is possible in a greedy manner. After cliques are created, edges are added and weights are assigned to edges, using the corresponding input functions.
The implementation builds a single clique by adding nodes with different
classes until all classes of the global distribution are represented. Cliques
are built one at a time in this way until every node is part of a clique.
Because all classes are represented on an equal number of nodes, all cliques
will have nodes of all classes. Furthermore, since each node has examples
of a single class, a valid assignment is guaranteed to be found in a greedy manner. After the cliques are created, edges are added and weights are assigned to them using the corresponding input functions.
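As a worked example under assumed numbers (chosen for illustration, not taken from the experiments), consider $\#L = 10$ classes and $\#N = 100$ nodes, each holding examples of a single class, with every class present on $10$ nodes. Assuming that \textit{intraconnect} fully connects each clique, the greedy construction then yields
\[
\frac{\#N}{\#L} = \frac{100}{10} = 10 \text{ cliques of } 10 \text{ nodes each},
\qquad
10 \cdot \binom{10}{2} = 10 \cdot 45 = 450 \text{ intra-clique edges}.
\]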
\subsection{Smallworld Interclique Topology}
\subsection{Small-world Inter-clique Topology}
Algorithm~\ref{Algorithm:Smallworld} shows the construction of the smallworld interclique topology. It adds a linear number of interclique edges by first arranging cliques on a ring. It then adds a logarithmic number of "finger" edges to other cliques on the ring chosen such that there is a constant number of edges added per set, on sets that are exponentially bigger the further away on the ring. "Finger" edges are added symmetrically on both sides of the ring to the cliques in each set that are closest to a given set.
Algorithm~\ref{Algorithm:Smallworld} instantiates the function
\textit{interconnect} with a
small-world inter-clique topology as described in Section~\ref{section:interclique-topologies}. It adds a
linear number of inter-clique edges by first arranging cliques on a ring. It then adds a logarithmic number of ``finger'' edges to other cliques on the ring: the other cliques are grouped into sets that grow exponentially with their distance on the ring, and a constant number of edges is added per set. ``Finger'' edges are added symmetrically on both sides of the ring, to the cliques in each set that are closest to the originating clique.
\begin{algorithm}[h]
\caption{$\textit{smallworld}(DC)$: adds $O(\# N + log(\# N))$ edges}
\caption{$\textit{smallworld}(DC)$: adds $O(\# N +\log(\# N))$ edges}
\label{Algorithm:Smallworld}
\begin{algorithmic}[1]
\State\textbf{Require} set of cliques $DC$ (set of set of nodes)
\State\textbf{Require:} set of cliques $DC$ (set of sets of nodes)
\State ~~size of neighborhood $ns$ (default 2)
\State ~~function $\textit{least\_edges}(S, E)$ that returns one of the nodes in $S$ with the least number of edges in $E$
\State$E \leftarrow\emptyset$\Comment{Set of Edges}
\State$L \leftarrow[ C~\text{for}~C \in DC ]$\Comment{Arrange cliques in a list}
\For{$i \in\{1,\dots,\#DC\}$}\Comment{For every clique}
\State\Comment{For sets of cliques exponentially further away from $i$}
\State\Comment{Add interclique connections in both directions}
\State\Comment{Add inter-clique connections in both directions}
\State$n \leftarrow\textit{least\_edges}(L_i, E)$
\State$m \leftarrow\textit{least\_edges}(L_{(i+\textit{offset}+k)\%\#DC}, E)$\Comment{clockwise in ring}
\State$E \leftarrow E \cup\{(n,m), (m,n)\}$
\State$E \leftarrow E \cup\{\{n,m\}\}$
\State$n \leftarrow\textit{least\_edges}(L_i, E)$
\State$m \leftarrow\textit{least\_edges}(L_{(i-\textit{offset}-k)\%\#DC} , E)$\Comment{counter-clockwise in ring}
\State$E \leftarrow E \cup\{(n,m), (m,n)\}$
\State$E \leftarrow E \cup\{\{n,m\}\}$
\EndFor
\EndFor
\EndFor
...
...
@@ -1092,9 +1108,15 @@ Algorithm~\ref{Algorithm:Smallworld} shows the construction of the smallworld in
\end{algorithmic}
\end{algorithm}
The algorithm expects a set of cliques $DC$, previously computed by Algorithm~\ref{Algorithm:D-Clique-Construction}; a size of neighbourhood $ns$, which is the number of finger edges to add per set of cliques, and a function \textit{least\_edges}, which given a set of nodes $S$ and an existing set of edges $E =($\{(i,j), \dots\}$)$, returns one of the nodes in $E$ with the least number of edges. It returns a set of edges $($\{(i,j), \dots\}$)$ with all edges added by the smallworld topology.
Algorithm~\ref{Algorithm:Smallworld} expects a set of cliques $DC$, previously computed by
Algorithm~\ref{Algorithm:D-Clique-Construction}; a size of neighborhood $ns$,
which is the number of finger edges to add per set of cliques; and a function
\textit{least\_edges}, which given a set of nodes $S$ and an existing set of
edges $E =\{\{i,j\}, \dots\}$, returns one of the nodes in $S$ with the least number of edges in $E$. It returns a new set of edges $\{\{i,j\}, \dots\}$ containing all edges added by the small-world topology.
The implementation first arranges the cliques of $DC$ on a list, which represents the ring. Traversing the list with increasing indexes is equivalent to traversing the ring in the clockwise direction, and inversely. Then, for every clique $i$ on the ring from which we are computing the distance to others, a number of edges are added. All other cliques are implicitly arranged in mutually exclusive sets, with size and at offset exponentially bigger (doubling at every step). Then for every of these sets, $ns$ edges are added, both in the clockwise and counter-clockwise directions, always on the nodes with the least number of edges in each clique. The ring edges are implicitly added to the cliques at offset $1$ in both directions.
The implementation first arranges the cliques of $DC$ in a list, which
represents the ring. Traversing the list with increasing indices is equivalent
to traversing the ring in the clockwise direction, and traversing it with decreasing indices is equivalent to traversing the ring in the counter-clockwise direction. Then, for every clique $i$ on the ring, taken as the origin from which distances to the other cliques are computed, a number of edges are added. All other cliques are implicitly arranged in mutually exclusive sets whose size and offset grow exponentially (doubling at every step). For each of these sets, $ns$ edges are added in both the clockwise and counter-clockwise directions, always on the nodes with the least number of edges in each clique. The ring edges are implicitly added to the cliques at offset $1$ in both directions.
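As an illustration only, assume $\#DC = 16$ cliques, $ns = 2$, that \textit{offset} ranges over the powers of two $1, 2, 4, 8$, and that $k \in \{0, \dots, ns-1\}$; these ranges are assumptions made for this example rather than values stated above. The clockwise targets $(i + \textit{offset} + k) \bmod \#DC$ of clique $i$ would then be
\[
\begin{array}{c|cc}
\textit{offset} & k = 0 & k = 1 \\ \hline
1 & i+1 & i+2 \\
2 & i+2 & i+3 \\
4 & i+4 & i+5 \\
8 & i+8 & i+9
\end{array}
\]
with indices taken modulo $16$, and the counter-clockwise targets would mirror these as $(i - \textit{offset} - k) \bmod \#DC$. Under this reading, the entry at $\textit{offset} = 1$, $k = 0$ is the ring edge, and clique $i+2$ is targeted twice because the set at offset $1$ contains a single clique while $ns = 2$.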