diff --git a/main.tex b/main.tex
index 5fe6a94f4d84d453ab2afc208deb6c06a1041535..1f0cc1a1ae2147100d851745a471955fb1cd151d 100644
--- a/main.tex
+++ b/main.tex
@@ -353,7 +353,7 @@ We solve this problem by adding Clique Averaging to D-PSGD (Algorithm~\ref{Algor
 \caption{\label{fig:d-clique-mnist-clique-avg} Effect of Clique Averaging on MNIST. Y-axis starts at 89.}
 \end{figure}
 
-As illustrated in Figure~\ref{fig:d-clique-mnist-clique-avg}, this significantly reduces variance between nodes and accelerates convergence speed. The convergence speed is now essentially identical to that obtained when fully connecting all nodes. The tradeoff is a higher messaging cost, double to that without clique averaging, and increased latency of a single training step by requiring two rounds of messages. Nonetheless, compared to fully connecting all nodes, the total number of messages is reduced by $\approx 80\%$. MNIST and a Linear model are relatively simple, so the next section shows to work with a harder dataset and a higher capacity model.
+As illustrated in Figure~\ref{fig:d-clique-mnist-clique-avg}, this significantly reduces variance between nodes and accelerates convergence. The convergence speed is now essentially identical to that obtained when fully connecting all nodes. The tradeoff is a higher messaging cost, double that of D-PSGD without Clique Averaging, and increased latency of a single training step, which now requires two rounds of messages. Nonetheless, compared to fully connecting all nodes, the total number of messages is reduced by $\approx 80\%$. MNIST and a Linear model are relatively simple, so the next section shows how to support a harder dataset and a higher-capacity model.
 
 \section{Implementing Momentum with Clique Averaging}
 \label{section:momentum}