From 9e0a8bc41b0396d2fb7ede835a7af27e717cf5e8 Mon Sep 17 00:00:00 2001
From: Erick Lavoie <erick.lavoie@epfl.ch>
Date: Mon, 29 Mar 2021 15:01:13 +0200
Subject: [PATCH] Fixed typo

---
 main.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/main.tex b/main.tex
index 5fe6a94..1f0cc1a 100644
--- a/main.tex
+++ b/main.tex
@@ -353,7 +353,7 @@ We solve this problem by adding Clique Averaging to D-PSGD (Algorithm~\ref{Algor
 \caption{\label{fig:d-clique-mnist-clique-avg} Effect of Clique Averaging on MNIST. Y-axis starts at 89.}
 \end{figure}
 
-As illustrated in Figure~\ref{fig:d-clique-mnist-clique-avg}, this significantly reduces variance between nodes and accelerates convergence speed. The convergence speed is now essentially identical to that obtained when fully connecting all nodes. The tradeoff is a higher messaging cost, double to that without clique averaging, and increased latency of a single training step by requiring two rounds of messages. Nonetheless, compared to fully connecting all nodes, the total number of messages is reduced by $\approx 80\%$. MNIST and a Linear model are relatively simple, so the next section shows to work with a harder dataset and a higher capacity model.
+As illustrated in Figure~\ref{fig:d-clique-mnist-clique-avg}, this significantly reduces variance between nodes and accelerates convergence speed. The convergence speed is now essentially identical to that obtained when fully connecting all nodes. The tradeoff is a higher messaging cost, double to that without clique averaging, and increased latency of a single training step by requiring two rounds of messages. Nonetheless, compared to fully connecting all nodes, the total number of messages is reduced by $\approx 80\%$. MNIST and a Linear model are relatively simple, so the next section shows how to support a harder dataset and a higher capacity model.
 
 \section{Implementing Momentum with Clique Averaging}
 \label{section:momentum}
--
GitLab