This post summarizes design principles of convolutional architectures and landmark object-recognition results. The layout follows my own learning steps: it starts with the linear classifier, moves to fully connected (FC) neural networks, then introduces convolutional architectures that exploit the special structure of image data, and finally touches on state-of-the-art designs that overcome various problems.
- Weekly Summary
- Linear Classifier
- Neural Networks: An Introduction.
- Convolutional Neural Networks
- ConvNet Architecture Overview
- Major Improvements
This week, I accomplished the following tasks:
- Fully understand convolutional neural networks
a. CS231n notes.
- Implement ResNet and train a classifier on CIFAR-10.
Linear Classifier
This part is summarized in my previous post Linear Classifier.
Neural Networks: An Introduction.
This part is summarized in my previous post Neural Networks: An Introduction.
Convolutional Neural Networks
- Convolutional Neural Networks are very similar to ordinary Neural Networks
- Neurons have learnable weights and biases; each computes a dot product, optionally followed by a non-linearity.
- The whole network expresses a single differentiable score function.
- Have a loss function (e.g. SVM/Softmax) on the last (fully-connected) layer and all the tips/tricks we developed for learning regular Neural Networks still apply.
- ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and vastly reduce the amount of parameters in the network.
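To make the last point concrete, here is a minimal sketch of a Softmax cross-entropy loss applied to the scores of a final fully-connected layer. The scores and labels are made-up numbers purely for illustration:

```python
import numpy as np

# Hypothetical scores from the last fully-connected layer:
# 4 images, 3 classes (all numbers made up for illustration).
scores = np.array([[3.2, 5.1, -1.7],
                   [1.3, 4.9, 2.0],
                   [2.2, 2.5, -3.1],
                   [0.1, 0.2, 0.3]])
labels = np.array([0, 1, 1, 2])  # assumed ground-truth classes

# Softmax cross-entropy, with the usual max-subtraction for numeric stability.
shifted = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(len(labels)), labels]).mean()
print(round(loss, 4))
```

Because this loss is just a differentiable function of the final layer's outputs, all the usual gradient-based training machinery carries over unchanged.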
ConvNet Architecture Overview
Regular Neural Nets simply treat images as vanilla vectors and don’t scale well to full images. Convolutional Neural Networks instead take advantage of the fact that the input consists of images and constrain the architecture in a more sensible way. In particular, unlike a regular Neural Network, the layers of a ConvNet have neurons arranged in 3 dimensions: width, height, depth. (Note that the word depth here refers to the third dimension of an activation volume, not to the depth of a full Neural Network, which can refer to the total number of layers in a network.) Here is a visualization:
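The scaling problem is easy to quantify. Following the parameter counts used in the CS231n notes: a single fully-connected neuron on a CIFAR-10 image already needs 3072 weights, and the count blows up on larger images, while a conv neuron with a small local receptive field stays tiny regardless of image size:

```python
# Weights per neuron when the image is flattened for a fully-connected layer,
# versus connected locally in a conv layer (numbers follow the CS231n examples).
fc_cifar = 32 * 32 * 3     # one FC neuron on a 32x32x3 CIFAR-10 image
fc_large = 200 * 200 * 3   # one FC neuron on a modest 200x200x3 image
conv_any = 5 * 5 * 3       # one conv neuron with a 5x5x3 receptive field
print(fc_cifar, fc_large, conv_any)  # prints: 3072 120000 75
```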
A ConvNet is made up of Layers. Every Layer has a simple API: It transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters.
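That volume-in, volume-out API can be sketched in a few lines. Here I use a ReLU layer as the example, since it is a layer with no parameters (the class name and method are my own choices, not from any library):

```python
import numpy as np

class ReLULayer:
    """A layer in the 3D-volume-in, 3D-volume-out sense: an elementwise
    non-linearity with no parameters, so input and output shapes match."""
    def forward(self, volume):
        return np.maximum(0, volume)

x = np.random.randn(32, 32, 3)  # an input volume: width x height x depth
y = ReLULayer().forward(x)
print(y.shape)                  # the output is again a 3D volume: (32, 32, 3)
```

A conv or pooling layer would fit the same interface; only the differentiable function (and whether it carries parameters) changes.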
- Historical paper on Convolutional Neural Networks.
- Big improvement: much deeper (> 100-layer) networks.
The heart of the convnet is the convolution layer, which greatly improves the efficiency of neural networks on image inputs. Architecture-wise, a convnet is just a usual feed-forward net stacked on top of convolution layer(s). So really, the convolution layer is a kind of feature extractor that can effectively learn the optimal features by exploiting the spatial structure of the image data.
Nowadays, building a convnet model is easy: install one of the popular deep learning libraries like TensorFlow, Torch, or Theano, and use the prebuilt modules to rapidly assemble the model. However, to understand the convnet better, it’s essential to get our hands dirty. So, I will try implementing the conv layer from scratch using NumPy!
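As a preview, here is a sketch of what such a from-scratch forward pass might look like. The function name, argument layout (filters as `(F, HH, WW, C)`), and default stride/padding are my own choices for illustration:

```python
import numpy as np

def conv_forward(x, w, b, stride=1, pad=1):
    """Naive forward pass of a conv layer.
    x: input volume (H, W, C); w: filters (F, HH, WW, C); b: biases (F,).
    Returns an output volume of shape (H_out, W_out, F)."""
    H, W, C = x.shape
    F, HH, WW, _ = w.shape
    x_pad = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H_out = (H + 2 * pad - HH) // stride + 1
    W_out = (W + 2 * pad - WW) // stride + 1
    out = np.zeros((H_out, W_out, F))
    for i in range(H_out):          # slide over output rows
        for j in range(W_out):      # slide over output columns
            patch = x_pad[i*stride:i*stride+HH, j*stride:j*stride+WW, :]
            for f in range(F):      # one dot product per filter
                out[i, j, f] = np.sum(patch * w[f]) + b[f]
    return out

x = np.random.randn(32, 32, 3)
w = np.random.randn(8, 3, 3, 3)  # 8 filters of size 3x3x3
b = np.zeros(8)
out = conv_forward(x, w, b)
print(out.shape)  # (32, 32, 8): pad=1 preserves the spatial size
```

The triple loop is deliberately slow and readable; a real implementation would vectorize it (e.g. via im2col), but the sliding-window structure is exactly what the layer computes.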
See my python notebooks.