|
@@ -102,6 +102,8 @@ Let's compute the derivatives of all our models. Throughout the entire paper $n$
|
|
|
y &= \sigma(a^{(1)}w^{(2)} + b^{(2)})
|
|
|
|
|
\end{align}
|
|
\end{align}
|
|
|
|
|
|
|
|
|
|
+The superscript in parentheses denotes the layer index. For example, $a_i^{(l)}$ denotes the activation of the $l$-th layer for the $i$-th sample.
|
|
|
|
|
+
|
|
|
\subsubsection{Feed-Forward}
|
|
|
|
|
|
|
|
|
|
\begin{align}
|
|
\begin{align}
|