Back Propagation in Convolution Layer
January 7, 2018
Consider a valid convolution [1] between an input feature map,
$X$ and a filter (synonymously kernel or weights), $W$ to produce
an output feature map, $Y$.
\begin{align}
\begin{bmatrix}
w_1 & w_2 \\
w_3 & w_4 \\
\end{bmatrix}
\circledast
\begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
x_5 & x_6 & x_7 & x_8 \\
x_9 & x_{10} & x_{11} & x_{12} \\
x_{13} & x_{14} & x_{15} & x_{16} \\
\end{bmatrix} &=
\begin{bmatrix}
y_1 & y_2 & y_3 \\
y_4 & y_5 & y_6 \\
y_7 & y_8 & y_9 \\
\end{bmatrix} \\
\\
W \circledast X &= Y \\
\end{align}
During the forward propagation, the outputs are given by,
\begin{align}
y_1 &= w_1x_1 + w_2x_2 + w_3x_5 + w_4x_6 \\
y_2 &= w_1x_2 + w_2x_3 + w_3x_6 + w_4x_7 \\
y_3 &= w_1x_3 + w_2x_4 + w_3x_7 + w_4x_8 \\
y_4 &= w_1x_5 + w_2x_6 + w_3x_9 + w_4x_{10} \\
y_5 &= w_1x_6 + w_2x_7 + w_3x_{10} + w_4x_{11} \\
y_6 &= w_1x_7 + w_2x_8 + w_3x_{11} + w_4x_{12} \\
y_7 &= w_1x_9 + w_2x_{10} + w_3x_{13} + w_4x_{14} \\
y_8 &= w_1x_{10} + w_2x_{11} + w_3x_{14} + w_4x_{15} \\
y_9 &= w_1x_{11} + w_2x_{12} + w_3x_{15} + w_4x_{16} \\
\end{align}
Let $L$ be the loss (synonymously error or cost) of the network we
want to minimize. During backward propagation, given $\frac{\partial
L}{\partial Y}$ we calculate $\frac{\partial L}{\partial W}$ i.e.
the weight gradient, $\frac{\partial L}{\partial X}$ i.e. the input
gradient. The weight gradient is used to adjust (learn) the values of
the weight matrix, while the input gradient is propagated backwards
through the network.
Weight gradient
Using the equation above,
\begin{align}
\frac{\partial L}{\partial W}
&=
\begin{bmatrix}
\frac{\partial L}{\partial w_1} & \frac{\partial L}{\partial w_2} \\
\frac{\partial L}{\partial w_3} & \frac{\partial L}{\partial w_4} \\
\end{bmatrix} \\
&=
\begin{bmatrix}
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_1} +
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_1} +
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_1} +
\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_1} +
\frac{\partial L}{\partial y_5}\frac{\partial y_5}{\partial w_1} +
\frac{\partial L}{\partial y_6}\frac{\partial y_6}{\partial w_1} +
\frac{\partial L}{\partial y_7}\frac{\partial y_7}{\partial w_1} +
\frac{\partial L}{\partial y_8}\frac{\partial y_8}{\partial w_1} +
\frac{\partial L}{\partial y_9}\frac{\partial y_9}{\partial w_1})
&
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_2} +
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_2} +
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_2} +
\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_2} +
\frac{\partial L}{\partial y_5}\frac{\partial y_5}{\partial w_2} +
\frac{\partial L}{\partial y_6}\frac{\partial y_6}{\partial w_2} +
\frac{\partial L}{\partial y_7}\frac{\partial y_7}{\partial w_2} +
\frac{\partial L}{\partial y_8}\frac{\partial y_8}{\partial w_2} +
\frac{\partial L}{\partial y_9}\frac{\partial y_9}{\partial w_2}) \\
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_3} +
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_3} +
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_3} +
\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_3} +
\frac{\partial L}{\partial y_5}\frac{\partial y_5}{\partial w_3} +
\frac{\partial L}{\partial y_6}\frac{\partial y_6}{\partial w_3} +
\frac{\partial L}{\partial y_7}\frac{\partial y_7}{\partial w_3} +
\frac{\partial L}{\partial y_8}\frac{\partial y_8}{\partial w_3} +
\frac{\partial L}{\partial y_9}\frac{\partial y_9}{\partial w_3})
&
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_4} +
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_4} +
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_4} +
\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_4} +
\frac{\partial L}{\partial y_5}\frac{\partial y_5}{\partial w_4} +
\frac{\partial L}{\partial y_6}\frac{\partial y_6}{\partial w_4} +
\frac{\partial L}{\partial y_7}\frac{\partial y_7}{\partial w_4} +
\frac{\partial L}{\partial y_8}\frac{\partial y_8}{\partial w_4} +
\frac{\partial L}{\partial y_9}\frac{\partial y_9}{\partial w_4}) \\
\end{bmatrix} \\
&=
\begin{bmatrix}
(\frac{\partial L}{\partial y_1} x_1 +
\frac{\partial L}{\partial y_2} x_2 +
\frac{\partial L}{\partial y_3} x_3 +
\frac{\partial L}{\partial y_4} x_5 +
\frac{\partial L}{\partial y_5} x_6 +
\frac{\partial L}{\partial y_6} x_7 +
\frac{\partial L}{\partial y_7} x_9 +
\frac{\partial L}{\partial y_8} x_{10} +
\frac{\partial L}{\partial y_9} x_{11})
&
(\frac{\partial L}{\partial y_1} x_2 +
\frac{\partial L}{\partial y_2} x_3 +
\frac{\partial L}{\partial y_3} x_4 +
\frac{\partial L}{\partial y_4} x_6 +
\frac{\partial L}{\partial y_5} x_7 +
\frac{\partial L}{\partial y_6} x_8 +
\frac{\partial L}{\partial y_7} x_{10} +
\frac{\partial L}{\partial y_8} x_{11} +
\frac{\partial L}{\partial y_9} x_{12}) \\
(\frac{\partial L}{\partial y_1} x_5 +
\frac{\partial L}{\partial y_2} x_6 +
\frac{\partial L}{\partial y_3} x_7 +
\frac{\partial L}{\partial y_4} x_9 +
\frac{\partial L}{\partial y_5} x_{10} +
\frac{\partial L}{\partial y_6} x_{11} +
\frac{\partial L}{\partial y_7} x_{13} +
\frac{\partial L}{\partial y_8} x_{14} +
\frac{\partial L}{\partial y_9} x_{15})
&
(\frac{\partial L}{\partial y_1} x_6 +
\frac{\partial L}{\partial y_2} x_7 +
\frac{\partial L}{\partial y_3} x_8 +
\frac{\partial L}{\partial y_4} x_{10} +
\frac{\partial L}{\partial y_5} x_{11} +
\frac{\partial L}{\partial y_6} x_{12} +
\frac{\partial L}{\partial y_7} x_{14} +
\frac{\partial L}{\partial y_8} x_{15} +
\frac{\partial L}{\partial y_9} x_{16}) \\
\end{bmatrix} \\
&=
\begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
x_5 & x_6 & x_7 & x_8 \\
x_9 & x_{10} & x_{11} & x_{12} \\
x_{13} & x_{14} & x_{15} & x_{16} \\
\end{bmatrix}
\circledast
\begin{bmatrix}
\frac{\partial L}{\partial y_1} & \frac{\partial L}{\partial y_2} & \frac{\partial L}{\partial y_3} \\
\frac{\partial L}{\partial y_4} & \frac{\partial L}{\partial y_5} & \frac{\partial L}{\partial y_6} \\
\frac{\partial L}{\partial y_7} & \frac{\partial L}{\partial y_8} & \frac{\partial L}{\partial y_9} \\
\end{bmatrix} \\
\end{align}
Therefore, $\frac{\partial L}{\partial W}$ is equal to a valid convolution between $X$ and the output gradient, $\frac{\partial L}{\partial Y}$ i.e.
\begin{align}
\frac{\partial L}{\partial W} = X \circledast \frac{\partial L}{\partial Y}
\end{align}
Input gradient
Using the equation above,
\begin{align}
\frac{\partial L}{\partial X}
&=
\begin{bmatrix}
\frac{\partial L}{\partial x_1} & \frac{\partial L}{\partial x_2} & \frac{\partial L}{\partial x_3} & \frac{\partial L}{\partial x_4} \\
\frac{\partial L}{\partial x_5} & \frac{\partial L}{\partial x_6} & \frac{\partial L}{\partial x_7} & \frac{\partial L}{\partial x_8} \\
\frac{\partial L}{\partial x_9} & \frac{\partial L}{\partial x_{10}} & \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} \\
\frac{\partial L}{\partial x_{13}} & \frac{\partial L}{\partial x_{14}} & \frac{\partial L}{\partial x_{15}} & \frac{\partial L}{\partial x_{16}} \\
\end{bmatrix} \\
&=
\begin{bmatrix}
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_1}) &
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_2} +
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_2}) &
(\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_3} +
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_3}) &
(\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_4}) \\
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_5} +
\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_5}) &
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_6} +
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_6} +
\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_6} +
\frac{\partial L}{\partial y_5}\frac{\partial y_5}{\partial x_6}) &
(\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_7} +
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_7} +
\frac{\partial L}{\partial y_5}\frac{\partial y_5}{\partial x_7} +
\frac{\partial L}{\partial y_6}\frac{\partial y_6}{\partial x_7}) &
(\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_8} +
\frac{\partial L}{\partial y_6}\frac{\partial y_6}{\partial x_8}) \\
(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_9} +
\frac{\partial L}{\partial y_7}\frac{\partial y_7}{\partial x_9}) &
(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{10}} +
\frac{\partial L}{\partial y_5}\frac{\partial y_5}{\partial x_{10}} +
\frac{\partial L}{\partial y_7}\frac{\partial y_7}{\partial x_{10}} +
\frac{\partial L}{\partial y_8}\frac{\partial y_8}{\partial x_{10}}) &
(\frac{\partial L}{\partial y_5}\frac{\partial y_5}{\partial x_{11}} +
\frac{\partial L}{\partial y_6}\frac{\partial y_6}{\partial x_{11}} +
\frac{\partial L}{\partial y_8}\frac{\partial y_8}{\partial x_{11}} +
\frac{\partial L}{\partial y_9}\frac{\partial y_9}{\partial x_{11}}) &
(\frac{\partial L}{\partial y_6}\frac{\partial y_6}{\partial x_{12}} +
\frac{\partial L}{\partial y_9}\frac{\partial y_9}{\partial x_{12}}) \\
(\frac{\partial L}{\partial y_7}\frac{\partial y_7}{\partial x_{13}}) &
(\frac{\partial L}{\partial y_7}\frac{\partial y_7}{\partial x_{14}} +
\frac{\partial L}{\partial y_8}\frac{\partial y_8}{\partial x_{14}}) &
(\frac{\partial L}{\partial y_8}\frac{\partial y_8}{\partial x_{15}} +
\frac{\partial L}{\partial y_9}\frac{\partial y_9}{\partial x_{15}}) &
(\frac{\partial L}{\partial y_9}\frac{\partial y_9}{\partial x_{16}}) \\
\end{bmatrix} \\
\\
&=
\begin{bmatrix}
(\frac{\partial L}{\partial y_1} w_1) &
(\frac{\partial L}{\partial y_1} w_2 +
\frac{\partial L}{\partial y_2} w_1) &
(\frac{\partial L}{\partial y_2} w_2 +
\frac{\partial L}{\partial y_3} w_1) &
(\frac{\partial L}{\partial y_3} w_2) \\
(\frac{\partial L}{\partial y_1} w_3 +
\frac{\partial L}{\partial y_4} w_1) &
(\frac{\partial L}{\partial y_1} w_4 +
\frac{\partial L}{\partial y_2} w_3 +
\frac{\partial L}{\partial y_4} w_2 +
\frac{\partial L}{\partial y_5} w_1) &
(\frac{\partial L}{\partial y_2} w_4 +
\frac{\partial L}{\partial y_3} w_3 +
\frac{\partial L}{\partial y_5} w_2 +
\frac{\partial L}{\partial y_6} w_1) &
(\frac{\partial L}{\partial y_3} w_4 +
\frac{\partial L}{\partial y_6} w_2) \\
(\frac{\partial L}{\partial y_4} w_3 +
\frac{\partial L}{\partial y_7} w_1) &
(\frac{\partial L}{\partial y_4} w_4 +
\frac{\partial L}{\partial y_5} w_3 +
\frac{\partial L}{\partial y_7} w_2 +
\frac{\partial L}{\partial y_8} w_1) &
(\frac{\partial L}{\partial y_5} w_4 +
\frac{\partial L}{\partial y_6} w_3 +
\frac{\partial L}{\partial y_8} w_2 +
\frac{\partial L}{\partial y_9} w_1) &
(\frac{\partial L}{\partial y_6} w_4 +
\frac{\partial L}{\partial y_9} w_2) \\
(\frac{\partial L}{\partial y_7} w_3) &
(\frac{\partial L}{\partial y_7} w_4 +
\frac{\partial L}{\partial y_8} w_3) &
(\frac{\partial L}{\partial y_8} w_4 +
\frac{\partial L}{\partial y_9} w_3) &
(\frac{\partial L}{\partial y_9} w_4) \\
\end{bmatrix}\\
&=
\begin{bmatrix}
w_4 & w_3 \\
w_2 & w_1 \\
\end{bmatrix}
\circledast
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & \frac{\partial L}{\partial y_1} & \frac{\partial L}{\partial y_2} & \frac{\partial L}{\partial y_3} & 0 \\
0 & \frac{\partial L}{\partial y_4} & \frac{\partial L}{\partial y_5} & \frac{\partial L}{\partial y_6} & 0 \\
0 & \frac{\partial L}{\partial y_7} & \frac{\partial L}{\partial y_8} & \frac{\partial L}{\partial y_9} & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\end{align}
Note that,
\begin{align}
\begin{bmatrix}
w_4 & w_3 \\
w_2 & w_1 \\
\end{bmatrix}
&=
\begin{bmatrix}
0 & 1 \\
1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
w_1 & w_2 \\
w_3 & w_4 \\
\end{bmatrix}
\begin{bmatrix}
0 & 1 \\
1 & 0 \\
\end{bmatrix} \\
&= J_2 W J_2 \\
\end{align}
where, $J_2$ is an exchange matrix [2]. The first will reverse
the rows of the matrix and the second will reverse the columns of the
matrix. The second matrix is $\frac{\partial L}{\partial
Y}$ appropriately padded with zeros to prepare it for a full
convolution [1]. Therefore, $\frac{\partial L}{\partial
X}$ is equal to a full convolution between the row and column
reversed $W$ and $\frac{\partial L}{\partial Y}$ i.e.
\begin{align}
\frac{\partial L}{\partial X} = (J_2 W J_2) \circledast \frac{\partial L}{\partial Y}
\end{align}
References
- Valid, Same and Full Convolution.
- Exchange matrix.