Commit 6daebf4a authored by aurelien.bellet

define points as x,y instead of s

parent ce2a41ab
@@ -177,9 +177,10 @@ averaging step as in the original version.
 rate $\gamma$, mixing weights $W$, mini-batch size $m$, number of
 steps $K$
 \FOR{$k = 1,\ldots, K$}
-\STATE $s_i^{(k)} \gets \text{mini-batch sample of size $m$ drawn
+\STATE $S_i^{(k)} \gets \text{mini-batch of $m$ samples drawn
 from~} D_i$
-\STATE $g_i^{(k)} \gets \frac{1}{|\textit{Clique}(i)|}\sum_{j \in \textit{Clique(i)}} \nabla F(\theta_j^{(k-1)}; s_j^{(k)})$
+\STATE $g_i^{(k)} \gets \frac{1}{|\textit{Clique}(i)|}\sum_{j \in
+\textit{Clique(i)}} \nabla F(\theta_j^{(k-1)}; S_j^{(k)})$
 \STATE $\theta_i^{(k-\frac{1}{2})} \gets \theta_i^{(k-1)} - \gamma g_i^{(k)}$
 \STATE $\theta_i^{(k)} \gets \sum_{j \in N} W_{ji}^{(k)} \theta_j^{(k-\frac{1}{2})}$
 \ENDFOR
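For reference, the step modified in the hunk above has each node $i$ average the mini-batch gradients computed by all nodes in $\textit{Clique}(i)$ at their own parameters, before the usual local update and mixing step. A minimal NumPy sketch of one such round, assuming flat parameter vectors; grad_F and sample_minibatch are hypothetical helper names, not from the paper:

    import numpy as np

    def clique_dsgd_round(thetas, W, cliques, datasets, grad_F,
                          sample_minibatch, gamma, m):
        # thetas: (n, d) array, row i holds node i's current model theta_i^(k-1)
        # W: (n, n) mixing matrix, W[j, i] weights node j's model at node i
        # cliques[i]: node indices in Clique(i), assumed to contain i itself
        n = thetas.shape[0]
        grads = np.empty_like(thetas)
        for j in range(n):
            X, y = sample_minibatch(datasets[j], m)       # S_j^(k) ~ D_j
            grads[j] = grad_F(thetas[j], X, y)            # grad F(theta_j^(k-1); S_j^(k))
        half = np.empty_like(thetas)
        for i in range(n):
            g_i = grads[list(cliques[i])].mean(axis=0)    # average over Clique(i)
            half[i] = thetas[i] - gamma * g_i             # theta_i^(k-1/2)
        return W.T @ half                                 # theta_i^(k) = sum_j W[j,i] theta_j^(k-1/2)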
@@ -5,7 +5,12 @@
 \label{section:problem}
 We consider a set $N = \{1, \dots, n \}$ of $n$ nodes seeking to
-collaboratively solve a classification task with $L$ classes. Each node has access to a local dataset that
+collaboratively solve a classification task with $c$ classes. We denote a
+labeled data point by a tuple $(x,y)$ where $x$ represents the data point
+(e.g., a feature vector) and $y\in\{1,\dots,c\}$ its label.
+Each
+node has
+access to a local dataset that
 follows its own local distribution $D_i$. The goal is to find the parameters
 $\theta$ of a global model that performs well on the union of the local
 distributions by
@@ -13,13 +18,16 @@ collaboratively solve a classification task with $L$ classes. Each node has acce
 the average training loss:
 \begin{equation}
 \min_{\theta} \frac{1}{n}\sum_{i=1}^{n} \mathds{E}_
-{s_i \sim D_i} [F_i(\theta;s_i)],
+{(x_i,y_i) \sim D_i} [F_i(\theta;x_i,y_i)],
 \label{eq:dist-optimization-problem}
 \end{equation}
-where $s_i$ is a data example drawn from $D_i$ and $F_i$ is the loss function
-on node $i$. Therefore, $\mathds{E}_{s_i \sim D_i} F_i(\theta;s_i)$ denotes
+where $(x_i,y_i)$ is a data point drawn from $D_i$ and $F_i$ is the loss
+function
+on node $i$. Therefore, $\mathds{E}_{(x_i,y_i) \sim D_i} F_i(\theta;x_i,y_i)$
+denotes
 the
-expected loss of model $x$ on a random example $s_i$ drawn from $D_i$.
+expected loss of model $\theta$ over the local data distribution
+$D_i$.
 
 To collaboratively solve Problem \eqref{eq:dist-optimization-problem}, each
 node can exchange messages with its neighbors in an undirected network graph
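In practice each node only holds a finite sample from $D_i$, so the expectations in the objective above are approximated by empirical averages over the local datasets. A sketch of this standard empirical counterpart (not stated in the hunk; $D_i$ is reused, by slight abuse of notation, for node $i$'s finite dataset):

    \[
      \min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n} \frac{1}{|D_i|}
      \sum_{(x,y) \in D_i} F_i(\theta; x, y)
    \]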
@@ -31,7 +39,7 @@ between nodes $i$ and $j$.
 In this work, we use the popular Decentralized Stochastic
 Gradient Descent algorithm, aka D-SGD~\cite{lian2017d-psgd}. As
 shown in Algorithm~\ref{Algorithm:D-PSGD},
-a single iteration of D-SGD at node $i$ consists of sampling a mini-batch
+a single iteration of D-SGD at node $i$ consists in sampling a mini-batch
 from its local distribution
 $D_i$, updating its local model $\theta_i$ by taking a stochastic gradient
 descent
@@ -71,9 +79,10 @@ topology $G$, namely:\todo{AB: if we need space we can remove this equation}
 learning rate $\gamma$, mixing weights $W$, mini-batch size $m$,
 number of steps $K$
 \FOR{$k = 1,\ldots, K$}
-\STATE $s_i^{(k)} \gets \text{mini-batch sample of size $m$ drawn
+\STATE $S_i^{(k)} \gets \text{mini-batch of $m$ samples drawn
 from~} D_i$
-\STATE $\theta_i^{(k-\frac{1}{2})} \gets \theta_i^{(k-1)} - \gamma \nabla F(\theta_i^{(k-1)}; s_i^{(k)})$
+\STATE $\theta_i^{(k-\frac{1}{2})} \gets \theta_i^{(k-1)} - \gamma
+\nabla F(\theta_i^{(k-1)}; S_i^{(k)})$
 \STATE $\theta_i^{(k)} \gets \sum_{j \in N} W_{ji}^{(k)} \theta_j^{(k-\frac{1}{2})}$
 \ENDFOR
 \end{algorithmic}
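The D-SGD iteration shown in this last hunk (local mini-batch gradient step followed by neighbourhood averaging with the mixing weights $W$) can be sketched in NumPy as follows; as above, grad_F and sample_minibatch are hypothetical helpers and parameters are assumed to be flat vectors:

    import numpy as np

    def dsgd_round(thetas, W, datasets, grad_F, sample_minibatch, gamma, m):
        # One synchronous round of D-SGD over all n nodes.
        # thetas: (n, d) array of current models theta_i^(k-1)
        # W: (n, n) mixing matrix, W[j, i] weights node j's model at node i
        n = thetas.shape[0]
        half = np.empty_like(thetas)
        for i in range(n):
            X, y = sample_minibatch(datasets[i], m)                  # S_i^(k) ~ D_i
            half[i] = thetas[i] - gamma * grad_F(thetas[i], X, y)   # local SGD step
        return W.T @ half                                            # mixing step

In D-SGD the mixing matrix $W$ is typically doubly stochastic and supported on the edges of the graph $G$, so each node only averages with its direct neighbours.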