Classical Convolutional Neural Network: LeNet-5

LeNet-5 (1998)

LeNet-5 is a convolutional neural network proposed by Yann LeCun in 1998, originally designed to solve the problem of handwritten digit recognition.

LeNet-5 network structure

The network structure of LeNet-5 is shown in the figure below:

The LeNet-5 network consists of 7 layers (not counting the input). Each layer has multiple feature maps, each feature map extracts one kind of feature from the input through a convolution filter, and each feature map contains multiple neurons.

Input: a 32*32 grayscale image, i.e., the input is a 2-dimensional matrix of pixel values.

Layer1 convolutional layer: 6 convolution kernels of size 5*5, with a stride of 1, so the output is 28*28*6 (32 − 5 + 1 = 28). The number of trainable parameters is 5*5*6 + 6 = 156: each convolution kernel is 5*5, there are 6 kernels in total, and 6 bias terms are added at the end.
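As a quick sanity check of the shape and the parameter count, here is a minimal PyTorch sketch (PyTorch is used purely for illustration; it is not part of the original post):

```python
import torch
import torch.nn as nn

# Layer1: 6 kernels of size 5*5, stride 1, no padding
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1)

x = torch.randn(1, 1, 32, 32)                      # one 32*32 grayscale image
print(conv1(x).shape)                              # torch.Size([1, 6, 28, 28])
print(sum(p.numel() for p in conv1.parameters()))  # 156 = 5*5*6 + 6
```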

Layer2 pooling layer (subsampling layer): 2*2 pooling with a stride of 2, so the output is 14*14*6. The sampling method adds the 4 inputs in each window, multiplies the sum by a trainable coefficient, adds a trainable bias term, and passes the result through a sigmoid function (different from ordinary average pooling). The number of trainable parameters is 2*6 = 12 (one coefficient and one bias per feature map).
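Since this is not plain average pooling, a standard pooling module alone will not reproduce it. Below is a minimal sketch of such a subsampling layer; this is my own reconstruction of the mechanism, not code from the paper. Layer4 below uses the same mechanism with 16 maps:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Subsample(nn.Module):
    """LeNet-5 style subsampling: sum each 2*2 window, multiply by a
    per-map trainable coefficient, add a per-map trainable bias,
    then apply a sigmoid."""
    def __init__(self, channels):
        super().__init__()
        self.coeff = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        # The sum over a 2*2 window equals 4 * the average over that window
        s = 4 * F.avg_pool2d(x, kernel_size=2, stride=2)
        return torch.sigmoid(self.coeff * s + self.bias)

pool2 = Subsample(6)
x = torch.randn(1, 6, 28, 28)
print(pool2(x).shape)                              # torch.Size([1, 6, 14, 14])
print(sum(p.numel() for p in pool2.parameters()))  # 12 = 2*6
```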

Layer3 convolutional layer: 16 convolution kernels of size 5*5, with a stride of 1, so the output is 10*10*16. In this layer, each output feature map is connected to a subset (or all) of the 6 feature maps of Layer2: the first 6 feature maps each take 3 contiguous Layer2 feature maps as input, the next 6 each take 4 contiguous Layer2 feature maps as input, the next 3 each take 4 non-contiguous Layer2 feature maps as input, and the last one takes all 6 Layer2 feature maps as input. The following figure illustrates how the 16 feature maps of Layer3 are obtained from the 6 feature maps of Layer2:

In the figure, the leftmost 0-5 are the 6 feature maps of Layer2, and the top 0-15 are the 16 feature maps of Layer3. The first 6 feature maps (0-5) of Layer3 are each connected to 3 Layer2 feature maps, the next 6 to 4 contiguous Layer2 feature maps, the next 3 to 4 non-contiguous Layer2 feature maps, and the last one to all 6 Layer2 feature maps. The output feature map size is still 10*10.

Therefore, the number of trainable parameters is: 6*(3*5*5+1) + 6*(4*5*5+1) + 3*(4*5*5+1) + 1*(6*5*5+1) = 1516
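This wiring can be written down as an explicit connection table. The table below follows Table I of the original paper (reproduced here from my reading of it), and the 1516 count falls out as 5*5 weights per connected input map plus one bias per output map:

```python
# Which Layer2 maps feed each of the 16 Layer3 maps (Table I of the paper)
CONNECTIONS = [
    [0, 1, 2], [1, 2, 3], [2, 3, 4],
    [3, 4, 5], [0, 4, 5], [0, 1, 5],            # 6 maps with 3 contiguous inputs
    [0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5],
    [0, 3, 4, 5], [0, 1, 4, 5], [0, 1, 2, 5],   # 6 maps with 4 contiguous inputs
    [0, 1, 3, 4], [1, 2, 4, 5], [0, 2, 3, 5],   # 3 maps with 4 non-contiguous inputs
    [0, 1, 2, 3, 4, 5],                         # 1 map connected to all 6
]

# 5*5 weights per connected input map, plus 1 bias per output map
params = sum(len(inputs) * 5 * 5 + 1 for inputs in CONNECTIONS)
print(params)  # 1516
```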

Layer4 pooling layer (subsampling layer): 2*2 pooling with a stride of 2, so the output is 5*5*16. The sampling method is the same as in Layer2: add the 4 inputs in each window, multiply by a trainable coefficient, add a trainable bias term, and pass the result through a sigmoid function (different from ordinary average pooling). The number of trainable parameters is 2*16 = 32.

Layer5 convolutional layer (the input layer of the fully connected part): the input is the 16 feature maps of Layer4 (each 5*5), the convolution kernel size is 5*5, and the number of kernels is 120, so the output is 1*1*120, i.e., 120 convolution results. The number of trainable parameters is (5*5*16+1)*120 = 48120. This layer serves as the input to the fully connected layers.
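Because the input is exactly 5*5, this convolution behaves like a fully connected layer over the flattened 5*5*16 = 400 inputs. A quick check of the shape and the 48120 count (again an illustrative PyTorch sketch):

```python
import torch
import torch.nn as nn

conv5 = nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5)

x = torch.randn(1, 16, 5, 5)                       # the 16 5*5 maps from Layer4
print(conv5(x).shape)                              # torch.Size([1, 120, 1, 1])
print(sum(p.numel() for p in conv5.parameters()))  # 48120 = (5*5*16 + 1) * 120
```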

Layer6 fully connected layer: this layer has 84 neurons. Each neuron computes the dot product between the input vector and its weight vector, adds a bias, and passes the result through a sigmoid function. The number of trainable parameters is (120+1)*84 = 10164.
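The dot-product-plus-bias-plus-sigmoid description maps directly onto a linear layer followed by a sigmoid:

```python
import torch
import torch.nn as nn

fc6 = nn.Linear(120, 84)

x = torch.randn(1, 120)
y = torch.sigmoid(fc6(x))                        # dot product + bias, then sigmoid
print(y.shape)                                   # torch.Size([1, 84])
print(sum(p.numel() for p in fc6.parameters()))  # 10164 = (120 + 1) * 84
```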

Layer7 output layer (classification layer): for digit recognition, this layer has 10 nodes, representing the digits 0 to 9. The number of parameters in this layer is 84*10 = 840. This layer uses a radial basis function (RBF) connection scheme. Assuming that x is the output of the previous layer and y_i is the output of the i-th RBF unit, the RBF output is calculated as:

y_i = Σ_j (x_j − w_ij)²

The value of i in the formula ranges from 0 to 9, and the value of j ranges from 0 to 83 (84 − 1, because Layer6 has 84 nodes). The closer an RBF output y_i is to 0, the closer the input is to the stored pattern for digit i, so the recognized digit is the one with the smallest RBF output.
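A minimal sketch of this output computation (my own illustration; in the original paper the weights w_ij are fixed ±1 codes drawn from stylized digit bitmaps rather than learned):

```python
import torch

def rbf_output(x, w):
    """y_i = sum_j (x_j - w_ij)^2 for each class i.
    x: (batch, 84) activations from Layer6
    w: (10, 84) one 84-dimensional code per digit
    Returns (batch, 10); the smallest entry indicates the match."""
    return ((x.unsqueeze(1) - w.unsqueeze(0)) ** 2).sum(dim=2)

w = torch.randn(10, 84)   # illustrative; the paper uses fixed +/-1 bitmap codes
x = torch.randn(4, 84)
y = rbf_output(x, w)
print(y.shape)            # torch.Size([4, 10])
print(y.argmin(dim=1))    # predicted digit for each example
```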

The following figure shows the process of LeNet-5 identifying the number 3:
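Finally, putting all of the layers together, here is a compact end-to-end sketch that reuses the Subsample module defined earlier. It is my own reconstruction: for simplicity, Layer3 is implemented as a dense 6-to-16 convolution rather than the sparse connection table, so it has somewhat more parameters than the original 1516:

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)     # Layer1: 32*32 -> 28*28*6
        self.pool2 = Subsample(6)           # Layer2: -> 14*14*6 (module sketched above)
        self.conv3 = nn.Conv2d(6, 16, 5)    # Layer3: -> 10*10*16 (dense, not the sparse table)
        self.pool4 = Subsample(16)          # Layer4: -> 5*5*16
        self.conv5 = nn.Conv2d(16, 120, 5)  # Layer5: -> 1*1*120
        self.fc6 = nn.Linear(120, 84)       # Layer6
        self.rbf_w = nn.Parameter(torch.randn(10, 84))  # Layer7 digit codes

    def forward(self, x):
        x = torch.sigmoid(self.conv1(x))
        x = self.pool2(x)
        x = torch.sigmoid(self.conv3(x))
        x = self.pool4(x)
        x = torch.sigmoid(self.conv5(x)).flatten(1)         # (batch, 120)
        x = torch.sigmoid(self.fc6(x))                      # (batch, 84)
        return ((x.unsqueeze(1) - self.rbf_w) ** 2).sum(2)  # (batch, 10) RBF outputs

model = LeNet5()
scores = model(torch.randn(1, 1, 32, 32))
print(scores.argmin(dim=1))  # the predicted digit has the smallest RBF output
```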
