Convolutional Neural Networks


This post summarizes the design principles of convolutional architectures and landmark object recognition results. The layout follows my own learning steps: it starts with the linear classifier, then moves to fully connected (FC) neural networks, then introduces convolutional architectures that take into account the special structure of image data, and finally touches on state-of-the-art designs that overcome different problems.

Weekly Summary

This week, I accomplished the following tasks:

  1. Fully understand convolutional neural networks

a. CS231n notes.

b. ImageNet Classification with Deep Convolutional Neural Networks.

c. Deep Residual Learning for Image Recognition.

d. Identity Mappings in Deep Residual Networks.

  2. Implement ResNet and train a classifier on CIFAR-10.

Linear Classifier

This part is summarized in my previous post Linear Classifier.

Neural Networks: An Introduction

This part is summarized in my previous post Neural Networks: An Introduction.

Convolutional Neural Networks


  • Convolutional Neural Networks are very similar to ordinary Neural Networks
    • Neurons have learnable weights and biases. Each neuron computes a dot product of its inputs and weights, optionally followed by a non-linearity.
    • The whole network expresses a single differentiable score function.
    • They still have a loss function (e.g. SVM/Softmax) on the last (fully-connected) layer, and all the tips/tricks we developed for learning regular Neural Networks still apply.
  • Difference
    • ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and vastly reduce the number of parameters in the network.
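
To make the parameter reduction concrete, here is a small arithmetic sketch. The sizes are my own illustrative assumptions (a 32x32x3 image, a 200x200x3 image, and a 5x5 filter), and the helper names `fc_params` and `conv_params` are hypothetical.

```python
# Parameter count: one fully-connected neuron vs. one convolutional filter.
# All sizes below are illustrative assumptions, not values from a specific model.

def fc_params(height, width, depth, num_neurons):
    """Each FC neuron connects to every input value and has one bias."""
    return (height * width * depth + 1) * num_neurons

def conv_params(filter_size, in_depth, num_filters):
    """Each filter has filter_size * filter_size * in_depth weights and one bias.
    The same weights are reused at every spatial position."""
    return (filter_size * filter_size * in_depth + 1) * num_filters

fc_small = fc_params(32, 32, 3, num_neurons=1)    # 32*32*3 + 1 = 3073
fc_large = fc_params(200, 200, 3, num_neurons=1)  # 200*200*3 + 1 = 120001
conv_one = conv_params(5, 3, num_filters=1)       # 5*5*3 + 1 = 76

print(fc_small, fc_large, conv_one)
```

The fully-connected count grows with the image size, while the filter count depends only on the filter size and the input depth.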

ConvNet Architecture Overview

Regular Neural Nets simply treat images as vanilla vectors and don’t scale well to full images. Convolutional Neural Networks instead consider 3D volumes of neurons. Convolutional Neural Networks take advantage of the fact that the input consists of images and they constrain the architecture in a more sensible way. In particular, unlike a regular Neural Network, the layers of a ConvNet have neurons arranged in 3 dimensions: width, height, depth. (Note that the word depth here refers to the third dimension of an activation volume, not to the depth of a full Neural Network, which can refer to the total number of layers in a network.) Here is a visualization taken from [1]:

Left: A regular 3-layer Neural Network. Right: A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a ConvNet transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

A ConvNet is made up of Layers. Every Layer has a simple API: It transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters.
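
The layer API described above can be sketched roughly as follows. This is a minimal illustration under my own assumptions: the class names (`ReLU`, `MaxPool2x2`, `PointwiseLinear`) and the (height, width, depth) array layout are hypothetical choices, not taken from any particular library.

```python
import numpy as np

class ReLU:
    """A layer without parameters: elementwise max(0, x)."""
    def forward(self, volume):
        return np.maximum(volume, 0)

class MaxPool2x2:
    """A layer without parameters: halves width and height, keeps depth."""
    def forward(self, volume):
        h, w, d = volume.shape  # assumes h and w are even
        blocks = volume.reshape(h // 2, 2, w // 2, 2, d)
        return blocks.max(axis=(1, 3))

class PointwiseLinear:
    """A layer with parameters: mixes channels at each pixel (a 1x1 convolution)."""
    def __init__(self, in_depth, out_depth, rng):
        self.weights = rng.standard_normal((in_depth, out_depth)) * 0.01
        self.bias = np.zeros(out_depth)

    def forward(self, volume):
        return volume @ self.weights + self.bias

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))  # input volume: width 32, height 32, depth 3
layers = [PointwiseLinear(3, 8, rng), ReLU(), MaxPool2x2()]

out = x
for layer in layers:
    out = layer.forward(out)  # every layer maps a 3D volume to a 3D volume

print(out.shape)  # (16, 16, 8)
```

Each layer only needs a `forward` method here; a trainable version would also need a backward pass, which is left out of this sketch.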

Major Improvements

Historical paper on Convolutional Neural Networks: ImageNet Classification with Deep Convolutional Neural Networks [4].

ResNet improvement: Identity Mappings in Deep Residual Networks [5].

Big improvement, much deeper (more than 100 layers) networks: Deep Residual Learning for Image Recognition [6].
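
The core idea of residual learning is that a block outputs its input plus a learned correction, y = x + F(x), so the identity mapping is easy to represent. Below is a minimal NumPy sketch of that idea. It is not the papers' actual architecture: plain matrix multiplications stand in for the convolution layers, and the names `residual_block`, `w1`, and `w2` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(x, w1, w2):
    """Compute relu(x + F(x)), where F is a two-layer transform that keeps the shape of x."""
    residual = relu(x @ w1) @ w2  # F(x): stand-in for two conv layers (assumption)
    return relu(x + residual)     # identity shortcut added before the final ReLU

rng = np.random.default_rng(0)
x = np.abs(rng.standard_normal((4, 16)))  # non-negative features, batch of 4

# With all-zero weights, F(x) = 0 and the block passes x through unchanged.
zeros = np.zeros((16, 16))
y_identity = residual_block(x, zeros, zeros)

# With small random weights, the output is x plus a small correction.
w1 = rng.standard_normal((16, 16)) * 0.01
w2 = rng.standard_normal((16, 16)) * 0.01
y = residual_block(x, w1, w2)
print(y.shape)
```

The zero-weight case shows why stacking many such blocks does not have to hurt: a block that learns nothing still passes its input along.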



The heart of a convnet is the convolution layer, which greatly improves the efficiency of neural networks for image inputs. Architecture-wise, a convnet is just a usual feed-forward net put on top of one or more convolution layers. So really, the convolution layer is a kind of feature extractor that learns useful features by taking the spatial structure of the image data into account.

Nowadays, building a convnet model is easy: install one of the popular deep learning libraries like TensorFlow, Torch, or Theano, and use the prebuilt modules to rapidly build the model. However, to understand convnets better, it’s essential to get our hands dirty. So, I will try implementing the conv layer from scratch using NumPy!
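
As a starting point, here is a minimal sketch of a naive conv layer forward pass in NumPy. It is not the notebook code: the function name `conv_forward_naive`, the (height, width, channels) layout, and the filter shape (F, F, C, K) are assumptions made for this example. Like most deep learning libraries, it computes a cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv_forward_naive(x, w, b, stride=1, pad=0):
    """Naive convolution forward pass.

    x: input volume, shape (H, W, C)
    w: filters, shape (F, F, C, K)
    b: biases, shape (K,)
    Returns an output volume of shape (H_out, W_out, K).
    """
    H, W, C = x.shape
    F, _, _, K = w.shape
    # Zero-pad the spatial dimensions only, not the channels.
    x_padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    H_out = (H + 2 * pad - F) // stride + 1
    W_out = (W + 2 * pad - F) // stride + 1
    out = np.zeros((H_out, W_out, K))
    for i in range(H_out):
        for j in range(W_out):
            top, left = i * stride, j * stride
            patch = x_padded[top:top + F, left:left + F, :]
            # Dot product of the patch with each of the K filters, plus bias.
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return out

x = np.ones((5, 5, 3))
w = np.ones((3, 3, 3, 2))
b = np.array([0.0, 1.0])
out = conv_forward_naive(x, w, b, stride=1, pad=1)
print(out.shape)  # (5, 5, 2)
```

The two Python loops make this slow, but they keep the sliding-window structure visible: every output position is a dot product between one local patch and the shared filter weights.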

See my Python notebooks.




[1] Convolutional Neural Networks: Architectures, Convolution / Pooling Layers

[2] Understanding and Visualizing Convolutional Neural Networks

[3] Transfer Learning and Fine-tuning Convolutional Neural Networks

[4] ImageNet Classification with Deep Convolutional Neural Networks

[5] Identity Mappings in Deep Residual Networks

[6] Deep Residual Learning for Image Recognition