@@ -102,6 +102,8 @@ Let's compute the derivatives of all our models. Throughout the entire paper $n$
y &= \sigma(a^{(1)}w^{(2)} + b^{(2)})
\end{align}
+The superscript in parentheses denotes the layer index. For example, $a_i^{(l)}$ denotes the activation of the $l$-th layer on the $i$-th sample.
+
\subsubsection{Feed-Forward}
\begin{align}