Commit 9e0a8bc4 authored by Erick Lavoie's avatar Erick Lavoie

Fixed typo

parent 4438c0d7
@@ -353,7 +353,7 @@ We solve this problem by adding Clique Averaging to D-PSGD (Algorithm~\ref{Algor
\caption{\label{fig:d-clique-mnist-clique-avg} Effect of Clique Averaging on MNIST. Y-axis starts at 89.}
\end{figure}
As illustrated in Figure~\ref{fig:d-clique-mnist-clique-avg}, Clique Averaging significantly reduces variance between nodes and accelerates convergence: the convergence speed is now essentially identical to that obtained by fully connecting all nodes. The tradeoff is a higher messaging cost, double that of training without Clique Averaging, and increased latency, as a single training step now requires two rounds of messages. Nonetheless, compared to fully connecting all nodes, the total number of messages is reduced by $\approx 80\%$. MNIST and a linear model are relatively simple, so the next section shows how to support a harder dataset and a higher-capacity model.
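The $\approx 80\%$ figure can be sanity-checked with a back-of-envelope message count. The sketch below assumes 100 nodes split into 10 fully-connected cliques of 10 nodes, with the cliques joined in a ring; these topology parameters are illustrative assumptions, not taken from the text above.

```python
# Back-of-envelope message counts per training round (illustrative sketch;
# the 100-node topology of 10 cliques joined in a ring is an assumed example).

def fully_connected_messages(n):
    # Every node sends its model to every other node.
    return n * (n - 1)

def d_cliques_messages(n_cliques, clique_size, inter_edges, clique_averaging=True):
    n = n_cliques * clique_size
    intra = n * (clique_size - 1)   # directed messages within cliques
    inter = 2 * inter_edges         # directed messages across clique boundaries
    base = intra + inter
    # Clique Averaging adds a second round of messages, doubling the cost.
    return 2 * base if clique_averaging else base

full = fully_connected_messages(100)             # 9900 messages per round
dc = d_cliques_messages(10, 10, inter_edges=10)  # ring of 10 cliques -> 1840
print(f"reduction: {1 - dc / full:.0%}")         # prints "reduction: 81%"
```

Even with the doubled traffic of Clique Averaging, the sparse topology sends roughly a fifth of the messages of the fully-connected graph, consistent with the $\approx 80\%$ reduction stated above.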
\section{Implementing Momentum with Clique Averaging}
\label{section:momentum}
......