Project: SaCS / Distributed Machine Learning / D-Cliques
Commit 31901755, authored 3 years ago by aurelien.bellet
simplify algo, now starting to inter-cliquesection
parent a7468dee
1 changed file: mlsys2022style/d-cliques.tex (50 additions, 39 deletions)
@@ -100,59 +100,56 @@ of the absolute differences of $p_C(y)$ and $p(y)$:
 %     \end{split}
 % \end{equation}
-\begin{figure}[t]
-    \centering
-    \includegraphics[width=0.20\textwidth]{../figures/fully-connected-cliques}
-    \caption{\label{fig:d-cliques-figure} D-Cliques (fully-connected
-    cliques) example with 1 class/node.}
-\end{figure}
 To efficiently construct a set of cliques with small skew, we propose
-Greedy-Swap (Algorithm~\ref{Algorithm:D-Clique-Construction}).
-We start by initializing cliques at random, using at most $M$
-nodes to limit the intra-clique communication costs, then we
-swap nodes between pairs of cliques chosen at random such that the swap
-decreases the skew of that pair but keeps
-the size of the cliques constant (see Algorithm~\ref{Algorithm:D-Clique-Construction}).
-Only swaps that decrease the skew are performed, hence this algorithm can be
-seen as a form of randomized greedy algorithm. We note that this algorithm only requires
-the knowledge of the label distribution at each node. For the sake of
+Greedy-Swap (Algorithm~\ref{Algorithm:D-Clique-Construction}). The parameter
+$M$ gives the maximum size of cliques and allows to control the intra-clique
+communication costs. We start by initializing cliques at random. Then, for
+a certain number of steps $K$, we randomly pick two cliques and swap two of
+their nodes so as to decrease the sum of skews of the two cliques. The swap is
+chosen randomly among the ones which decrease the skew, hence
+this algorithm can be seen as a form of randomized greedy algorithm.
+We note that this algorithm only requires
+the knowledge of the label distribution $p_i(y)$ at each node $i$. For the
+sake of
 simplicity, we assume that D-Cliques are constructed from the global
 knowledge of these distributions, which can easily be obtained by
 decentralized averaging in a pre-processing step.
-\begin{algorithm}[h]
+\begin{algorithm}[t]
 \caption{D-Cliques Construction via Greedy Swap}
 \label{Algorithm:greedy-swap}
 \begin{algorithmic}[1]
-    \STATE \textbf{Require:} Clique size $M$, Max steps $K$,
-    \STATE Set of all nodes $N = \{ 1, 2, \dots, n \}$,
-    \STATE $\textit{skew}(S)$: skew of subset $S \subseteq N$ compared to the global distribution (Eq.~\ref{eq:skew}),
-    \STATE $\textit{intra}(DC)$: edges within cliques $C \in DC$,
-    \STATE $\textit{inter}(DC)$: edges between $C_1,C_2 \in DC$ (Sec.~\ref{section:interclique-topologies}),
-    \STATE $\textit{weights}(E)$: set weights to edges in $E$ (Eq.~\ref{eq:metro}).
-    \STATE ~~
-    \STATE $DC \leftarrow []$ \COMMENT{Empty list}
+    \STATE \textbf{Require:} maximum clique size $M$, max steps $K$, set
+    of all nodes $N = \{ 1, 2, \dots, n \}$,
+    % \STATE $\textit{skew}(S)$: skew of subset $S \subseteq N$ compared to the global distribution (Eq.~\ref{eq:skew}),
+    % \STATE $\textit{intra}(DC)$: edges within cliques $C \in DC$,
+    % \STATE $\textit{inter}(DC)$: edges between $C_1,C_2 \in DC$ (Sec.~\ref{section:interclique-topologies}),
+    % \STATE $\textit{weights}(E)$: set weights to edges in $E$ (Eq.~\ref{eq:metro}).
+    % \STATE ~~
+    \STATE $DC \leftarrow []$ % \COMMENT{Empty list}
     \WHILE {$N \neq \emptyset$}
         \STATE $C \leftarrow$ sample $M$ nodes from $N$ at random
-        \STATE $N \leftarrow N \setminus C$; $DC.append(C)$
+        \STATE $N \leftarrow N \setminus C$; $DC.\text{append}(C)$
     \ENDWHILE
     \FOR{$k \in \{1, \dots, K\}$}
-        \STATE $C_1,C_2 \leftarrow$ sample 2 from $DC$ at random
+        \STATE $C_1,C_2 \leftarrow$ random sample of 2 elements from $DC$
+        \STATE $s \leftarrow \textit{skew}(C_1) + skew(C_2)$
         \STATE $\textit{swaps} \leftarrow []$
-        \FOR{$n_1 \in C_1, n_2 \in C_2$}
-            \STATE $s \leftarrow skew(C_1) + skew(C_2)$
-            \STATE $s' \leftarrow \textit{skew}(C_1-n_1+n_2) + \textit{skew}(C_2-n_2+n_1)$
+        \FOR{$i \in C_1, j \in C_2$}
+            \STATE $s' \leftarrow \textit{skew}(C_1\setminus\{i\}\cup\{j\})
+            + \textit{skew}(C_2\setminus\{i\}\cup\{j\})$ \hspace*{-.05cm}
             \IF{$s' < s$}
                 \STATE \textit{swaps}.append($(n_1, n_2)$)
             \ENDIF
         \ENDFOR
-        \IF{\#\textit{swaps} $> 0$}
-            \STATE $(n_1,n_2) \leftarrow$ sample 1 from $\textit{swaps}$ at random
-            \STATE $C_1 \leftarrow C_1 - n_1 + n_2; C_2 \leftarrow C_2 - n_2 + n1$
+        \IF{len(\textit{swaps}) $> 0$}
+            \STATE $(n_1,n_2) \leftarrow$ random element from $
+            \textit{swaps}$
+            \STATE $C_1 \leftarrow C_1 \setminus\{j\}\cup\{i\}; C_2 \leftarrow C_2 \setminus\{j\}\cup\{i\}$
         \ENDIF
     \ENDFOR
-    \RETURN $(weights(\textit{intra}(DC) \cup \textit{inter}(DC)), DC)$
+    \STATE $G \leftarrow$ graph composed of the cliques in $DC$
+    \RETURN $G$
 \end{algorithmic}
 \end{algorithm}
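Review note: the Greedy-Swap procedure in this hunk can be sketched in Python to check its behavior. This is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes each node `i` is described by a hypothetical dictionary `counts[i]` mapping labels to example counts, that the skew of a clique is the sum over labels of $|p_C(y) - p(y)|$, and that a swap exchanges node $i \in C_1$ with node $j \in C_2$. The names `label_dist`, `skew`, and `greedy_swap` are made up for this example.

```python
import random
from collections import Counter


def label_dist(nodes, counts):
    """Label distribution p(y) over the pooled data of `nodes`."""
    total = Counter()
    for i in nodes:
        total.update(counts[i])
    n = sum(total.values())
    return {y: c / n for y, c in total.items()}


def skew(clique, counts, p_global):
    """Assumed skew: sum over labels y of |p_C(y) - p(y)|."""
    p_c = label_dist(clique, counts)
    return sum(abs(p_c.get(y, 0.0) - p) for y, p in p_global.items())


def greedy_swap(counts, M, K, seed=0):
    """Random cliques of at most M nodes, then K randomized greedy swap steps.

    Assumes the nodes form at least two cliques.
    """
    rng = random.Random(seed)
    p_global = label_dist(counts, counts)
    nodes = sorted(counts)
    rng.shuffle(nodes)
    # Chunking a shuffled list samples up to M nodes at a time without replacement.
    cliques = [set(nodes[k:k + M]) for k in range(0, len(nodes), M)]
    for _ in range(K):
        c1, c2 = rng.sample(cliques, 2)
        s = skew(c1, counts, p_global) + skew(c2, counts, p_global)
        swaps = []
        for i in sorted(c1):
            for j in sorted(c2):
                s_new = (skew((c1 - {i}) | {j}, counts, p_global)
                         + skew((c2 - {j}) | {i}, counts, p_global))
                if s_new < s:  # keep only swaps that reduce the pair's skew
                    swaps.append((i, j))
        if swaps:
            i, j = rng.choice(swaps)
            # Exchange i and j in place, so clique sizes stay constant.
            c1.remove(i)
            c1.add(j)
            c2.remove(j)
            c2.add(i)
    return cliques


if __name__ == "__main__":
    # 16 nodes with one class each (4 classes), cliques of size 4.
    counts = {i: {i % 4: 1} for i in range(16)}
    for clique in greedy_swap(counts, M=4, K=200, seed=1):
        print(sorted(clique))
```

Because every accepted swap strictly lowers the summed skew of the selected pair and leaves the other cliques untouched, the total skew over all cliques never increases from one step to the next.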
@@ -191,17 +188,26 @@ decentralized averaging in a pre-processing step.
 %     \end{algorithmic}
 % \end{algorithm}
-The key idea of D-Cliques is that because the clique-level distribution $D_C$
-is representative of the global distribution $D$,
+The key idea of D-Cliques is that because the clique-level label distribution
+$p_C(y)$ is representative of the global distribution $p(y)$,
 the local models of nodes across cliques remain rather close. Therefore, a
 sparse inter-clique topology can be used, significantly reducing the total
-number of edges without slowing down the convergence. Furthermore, the degree
-of each node in the network remains low and even, making the D-Cliques
-topology very well-suited to decentralized federated learning.
+number of edges without slowing down the convergence. We discuss
+choices for this inter-clique topology in the next section.
 \subsection{Adding Sparse Inter-Clique Connections}
 \label{section:interclique-topologies}
+\begin{figure}[t]
+    \centering
+    \includegraphics[width=0.20\textwidth]{../figures/fully-connected-cliques}
+    \caption{\label{fig:d-cliques-figure} D-Cliques (fully-connected
+    cliques) example with 1 class/node.}
+\end{figure}
+\todo{AB: if time, could add fig of another inter-clique topology (ring,
+fractal or small-world)}
 Second, to ensure a global consensus and convergence, \textit{inter-clique connections}
 are introduced by connecting a small number of node pairs that are
@@ -249,6 +255,11 @@ cliques that are close on the ring, while still keeping the average
 path length small. This scheme uses $\frac{n}{c}*2(m)\log(\frac{n}{c})$ inter-clique edges and
 therefore grows in the order of $O(n\log(n))$ with the number of nodes.
+Overall, D-Cliques ensures that the degree
+of each node in the network remains low and balanced, making the topology
+well-suited to
+decentralized federated learning.
 \subsection{Optimizing with Clique Averaging and Momentum}
 \label{section:clique-averaging-momentum}