Commit 43cd9379, authored 3 years ago by aurelien.bellet
AMK fixes
parent 2be9a4e9
1 changed file: main.tex (+8 additions, −6 deletions)
@@ -348,7 +348,7 @@ CIFAR10~\cite{krizhevsky2009learning}, which both have $c=10$ classes.
 For MNIST, we use 45k and 10k examples from the original 60k
 training set for training and validation respectively. The remaining 5k
 training examples were randomly removed to ensure all 10 classes are balanced
-while ensuring the dataset is evenly divisible across 100 and 1000 nodes.
+while ensuring that the dataset is evenly divisible across 100 and 1000 nodes.
 We use all 10k examples of
 the test set to measure prediction accuracy. For CIFAR10, classes are evenly
 balanced: we use 45k/50k images of the original training set for training,
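The divisibility constraint in this hunk is easy to check. A minimal Python sanity check, using only the constants quoted above (not code from this repository):

    # Check that the 45k-example MNIST training split described above is
    # class-balanced and evenly divisible across both network sizes.
    train, val, total, classes = 45_000, 10_000, 60_000, 10
    assert total - (train + val) == 5_000      # the 5k removed examples
    assert train % classes == 0                # 4,500 examples per class
    for nodes in (100, 1000):
        assert train % nodes == 0              # 450 or 45 examples per node
        print(nodes, "nodes ->", train // nodes, "examples per node")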
@@ -381,7 +381,8 @@ been sampled by a node, i.e. an \textit{epoch}. This is equivalent to the classi
 To further make results comparable across different number of nodes, we lower
 the batch size proportionally to the number of nodes added, and inversely,
 e.g. on MNIST, 128 with 100 nodes vs. 13 with 1000 nodes. This
-ensures the same number of model updates and averaging per epoch, which is
+ensures that the number of model updates and averaging per epoch remains the
+same, which is
 important to have a fair comparison.\footnote{Updating and averaging models
 after every example can eliminate the impact of local class bias. However, the
 resulting communication overhead is impractical.}
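The arithmetic behind this scaling can be made explicit. A short sketch using only the numbers quoted in the hunk (the variable names are ours, not the repository's):

    # With the batch size lowered in proportion to the node count, the number
    # of local updates (and averaging steps) per epoch stays roughly constant.
    train_examples = 45_000  # MNIST training examples, as above
    for nodes, batch in ((100, 128), (1000, 13)):
        per_node = train_examples // nodes
        print(f"{nodes} nodes: {per_node} examples/node, batch {batch} "
              f"-> {per_node / batch:.1f} updates/epoch")
    # 100 nodes: 450 examples/node, batch 128 -> 3.5 updates/epoch
    # 1000 nodes: 45 examples/node, batch 13 -> 3.5 updates/epoch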
@@ -543,9 +544,10 @@ introduced by inter-clique edges. We address this issue in the next section.
 \section{Optimizing with Clique Averaging and Momentum}
 \label{section:clique-averaging-momentum}
-In this sectio, we present Clique Averaging, a simple modification of D-SGD
-which removes the bias caused by the inter-cliques edges of
-D-Cliques, and show how this can be used to successfully implement momentum
+In this section, we present Clique Averaging, a feature that we add to
+D-SGD in order to remove the bias caused by the inter-cliques edges of
+D-Cliques. We then show how this can be used to successfully implement
+momentum
 for non-IID data.
 %AMK: check
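For readers skimming the diff, the technique named here can be sketched in a few lines of Python. This is a hedged reading of Clique Averaging as described in the surrounding text (average gradients within the clique only, then mix parameters over all edges); the function name, params, grads, cliques, and the mixing matrix W are all assumptions, not the repository's API:

    import numpy as np

    def clique_averaging_round(params, grads, cliques, W, lr):
        """One D-SGD round with Clique Averaging (sketch)."""
        n = len(params)
        # Phase 1: gradient step using the *intra-clique* gradient average,
        # so inter-clique edges cannot bias the update direction.
        half = []
        for i in range(n):
            g = np.mean([grads[j] for j in cliques[i]], axis=0)
            half.append(params[i] - lr * g)
        # Phase 2: plain D-SGD parameter mixing over ALL edges, intra- and
        # inter-clique alike (W[i][j] > 0 iff i, j are neighbors or i == j).
        return [sum(W[i][j] * half[j] for j in range(n) if W[i][j] > 0)
                for i in range(n)]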
@@ -802,7 +804,7 @@ average shortest path to $2$ between any pair of nodes. This choice requires $
 in the number of nodes. This can become significant at larger scales when $n$ is
 large compared to $c$.
-In this last series of experiment, we evaluate the effect of choosing sparser
+In this last series of experiments, we evaluate the effect of choosing sparser
 inter-clique topologies on the convergence speed for a larger network of 1000
 nodes. We compare the scalability and convergence speed of several
 D-Cliques variants, which all use $O(nc)$ edges
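A back-of-the-envelope edge count illustrates the trade-off this hunk refers to (our own rough numbers under the stated assumptions, not figures from the paper):

    # With cliques of size c there are n/c cliques, so fully connecting every
    # pair of cliques costs ~(n/c)(n/c - 1)/2 inter-clique edges, i.e.
    # quadratic in n, while the variants compared here keep an O(n*c) budget.
    c = 10
    for n in (100, 1_000, 10_000):
        k = n // c                            # number of cliques
        pairwise = k * (k - 1) // 2           # one edge per pair of cliques
        print(f"n={n}: pairwise clique connection ~{pairwise} edges, "
              f"O(nc) budget ~{n * c}")
    # The quadratic term overtakes n*c once n grows large relative to c.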