Notes for Prof. Hung-Yi Lee's ML Lecture 10: CNN

August 15, 2021

CNN: Convolutional Neural Networks

When using fully connected networks on images, the number of parameters can become very large. By applying constraints to the network based on properties of images, we can reduce the total number of parameters.

The fundamental concept of CNN is “to let filters convolve through the image”.

Input Images

Color images have 3 channels (RGB) for every pixel; black-and-white images have only 1 channel.

We have to resize the input image to the network's input size in advance.

Scaling and rotation of the training images affect the training results of a CNN, i.e. the trained CNN is not invariant to scaling and rotation. However, a proper image recognizer should produce the same output for scaled or rotated versions of an input image. Therefore, we have to generate more training data by scaling and rotating the training images, i.e. by performing data augmentation.
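
A minimal sketch of such augmentation, assuming torchvision is available; the transform choices and parameters below are illustrative, not taken from the lecture:

```python
# Sketch: data augmentation by random rotation and scaling (illustrative parameters).
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),          # random rotation within +/-15 degrees
    transforms.RandomResizedCrop(size=224,          # random scaling, then crop back to 224x224
                                 scale=(0.8, 1.0)),
    transforms.ToTensor(),                          # convert the PIL image to a CHW tensor
])

# augmented = augment(pil_image)  # apply to each training image (pil_image is a PIL.Image)
```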

Properties of Images & the CNN Techniques

When using these CNN techniques, make sure that your task meets the corresponding properties. For example, AlphaGo doesn't use pooling; NLP and speech recognition only convolve along a single dimension (the word sequence / the frequency axis). In NLP, the vectors along the other dimension are not spatially related to their neighbors (the word-vector dimensions); in speech recognition, shift invariance along the time dimension is handled by the recurrence of the network.
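
For the 1-D case, here is a hedged sketch of a convolution that slides only along the time / word-sequence axis; the channel counts and kernel size are arbitrary examples:

```python
# Sketch: 1-D convolution that slides only along the time / word-sequence axis.
import torch
import torch.nn as nn

conv1d = nn.Conv1d(in_channels=80,   # e.g. 80 frequency bins or embedding dimensions per step
                   out_channels=32,  # number of filters
                   kernel_size=3)    # each filter spans 3 neighboring time steps / words

x = torch.randn(1, 80, 100)          # (batch, channels, sequence length)
y = conv1d(x)                        # -> shape (1, 32, 98): convolved along the last axis only
print(y.shape)
```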

Critical Patterns in Small Areas: Filters

In images, the critical patterns for recognition usually appear in regions much smaller than the whole image, so we use filters to find the patterns. The area that a filter processes is its receptive field.

(Figure: filter)
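
A small sketch of what one filter does: a 3×3 kernel slides over a grayscale image and produces one output per receptive field (the kernel values here are just an example edge detector, not from the lecture):

```python
# Sketch: slide one 3x3 filter over a grayscale image (stride 1, no padding).
import numpy as np

image = np.random.rand(6, 6)                     # a toy 6x6 single-channel image
kernel = np.array([[-1, 0, 1],                   # an example vertical-edge filter
                   [-1, 0, 1],
                   [-1, 0, 1]])

out = np.zeros((4, 4))                           # (6 - 3 + 1) = 4 outputs per side
for i in range(4):
    for j in range(4):
        receptive_field = image[i:i+3, j:j+3]    # the 3x3 region this output neuron sees
        out[i, j] = np.sum(receptive_field * kernel)
```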

Critical Patterns at Different Regions: Weight Sharing

In images, the critical patterns may appear in different regions of the image, so the filters for the same pattern at different locations should share their weights.

(Figure: weight sharing)
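
To see how much weight sharing saves, we can compare the parameter count of a convolution layer with that of a fully connected layer on the same input; the sizes below are arbitrary examples:

```python
# Sketch: parameter count with weight sharing (conv) vs. without (fully connected).
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)   # shared 3x3x3 weights per filter
fc = nn.Linear(3 * 32 * 32, 16 * 30 * 30)                         # one weight per input-output pair

conv_params = sum(p.numel() for p in conv.parameters())           # 16*3*3*3 + 16 = 448
fc_params = sum(p.numel() for p in fc.parameters())               # roughly 44 million
print(conv_params, fc_params)
```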

Subsampling Doesn't Impact the Recognition: Pooling

Subsampling an image doesn't change the object in it, so we can apply pooling to further reduce the total number of parameters.

(Figure: subsampling)

An example of pooling:

(Figure: ReLU / pooling example)
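
A minimal sketch of 2×2 max pooling (the feature-map size is an arbitrary example):

```python
# Sketch: 2x2 max pooling halves each spatial side of a feature map.
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)    # take the max of each non-overlapping 2x2 block
fmap = torch.randn(1, 16, 28, 28)     # (batch, channels, height, width)
pooled = pool(fmap)                   # -> shape (1, 16, 14, 14)
print(pooled.shape)
```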

Recently, some networks don't use pooling because the computational cost is affordable with their computing power.

CNN Structure

(Figure: CNN structure)

Total Number of Neurons & Connections for a Convolution Layer

Stride: the pixel distance between neighboring receptive fields.

Padding: when a receptive field at the edge exceeds the image boundary, the exceeding part is filled according to the padding method. Usually we use zero padding, i.e. pad with 0's.

Feature map: the output map produced by applying a filter over the input / the previous layer's output.

(Figure: convolution)

Assume the input and output are square. The total number of weights of the convolution layer:

\[n_f (n_{rs})^2 n_{ch}\]

where $n_f$ is the total number of filters, $n_{rs}$ is the size of one side of the receptive field, $n_{ch}$ is the total number of channels of the input.

The total number of output neurons:

\[\left\lceil \frac{n_{in} - n_{rs} + 1}{n_s} \right\rceil^2 n_f\]

where $n_s$ is stride, $n_{in}$ is the size of a side of the input.
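
A quick numeric check of both formulas with made-up example sizes:

```python
# Sketch: plug example numbers into the two formulas above.
import math

n_f, n_rs, n_ch = 16, 3, 3     # 16 filters, 3x3 receptive field, RGB input
n_in, n_s = 32, 1              # 32x32 input, stride 1

n_weights = n_f * n_rs**2 * n_ch                   # 16 * 9 * 3 = 432
side = math.ceil((n_in - n_rs + 1) / n_s)          # ceil(30 / 1) = 30 outputs per side
n_output_neurons = side**2 * n_f                   # 30 * 30 * 16 = 14400

print(n_weights, n_output_neurons)                 # 432 14400
```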

Gradient Descent on Input

To see the pattern that a neuron in a CNN learns to recognize, we can perform gradient descent on the input image to maximize the activation of the neuron.
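
A hedged sketch of this idea: start from a random image and update it by gradient ascent to maximize one filter's mean activation. The tiny network below is only a stand-in for a trained CNN, and the filter index is arbitrary:

```python
# Sketch: gradient ascent on the input image to maximize one filter's activation.
import torch
import torch.nn as nn

# Stand-in for the convolutional part of a trained CNN (untrained here, for illustration only).
features = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                         nn.Conv2d(8, 16, 3), nn.ReLU())

x = torch.rand(1, 3, 32, 32, requires_grad=True)   # start from a random image
optimizer = torch.optim.Adam([x], lr=0.1)

for step in range(100):
    optimizer.zero_grad()
    activation = features(x)[0, 5]                 # activation map of filter #5 in the last layer
    loss = -activation.mean()                      # minimizing -activation == maximizing it
    loss.backward()
    optimizer.step()

# x now approximates the pattern that filter #5 responds to most strongly.
```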

From the digit recognition example, we can also see that some nonsensical inputs are recognized as digits, which shows that deep neural networks are easily fooled (see link).

A similar technique can also be used for Deep Dream and Deep Style.

Deep Dream

Let a CNN show what it sees in an image.

Steps:

1. Feed an image into the CNN and take the activations of a hidden layer.
2. Exaggerate the activations: make the positive values more positive and the negative values more negative.
3. Modify the image by gradient descent so that the layer produces the exaggerated activations, i.e. the CNN amplifies whatever it already sees in the image.
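
A hedged sketch of the exaggeration step: instead of targeting a single neuron, push the whole layer's activations further away from zero by updating the input image. The small network and random image below are stand-ins for the trained CNN and the photo:

```python
# Sketch: Deep-Dream-style update that exaggerates a layer's activations.
import torch
import torch.nn as nn

features = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())   # stand-in for the trained CNN's layers
x = torch.rand(1, 3, 64, 64, requires_grad=True)          # stand-in for the input photo

optimizer = torch.optim.Adam([x], lr=0.05)
for step in range(50):
    optimizer.zero_grad()
    act = features(x)
    loss = -(act ** 2).mean()   # maximizing act^2 pushes activations further from zero
    loss.backward()
    optimizer.step()
```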

References

YouTube: ML Lecture 10: Convolutional Neural Network

Course website