
Introduction to Convolutional Neural Networks

A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks most commonly used to analyze visual imagery.


A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other.


Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/


The pre-processing required by a ConvNet is much lower than that required by other classification algorithms. While in primitive methods the filters are hand-engineered, with enough training, ConvNets are able to learn these filters/characteristics themselves.


In the figure, we have an RGB image that has been separated into its three color planes — Red, Green, and Blue. There are a number of such color spaces in which images exist — Grayscale, RGB, HSV, CMYK, etc.



Source: https://towardsdatascience.com/understanding-images-with-skimage-python-b94d210afd23



Source: https://dev.to/sandeepbalachandran/machine-learning-going-furthur-with-cnn-part-2-41km


You can imagine how computationally intensive things would get once images reach dimensions such as 8K (7680×4320). The role of the ConvNet is to reduce the images into a form that is easier to process, without losing features that are critical for getting a good prediction. This is important when designing an architecture that is not only good at learning features but is also scalable to massive datasets.


A ConvNet is able to successfully capture the Spatial and Temporal dependencies in an image through the application of relevant filters. 


Convolution Layer


Convolution is the first layer used to extract features from an input image. Convolution preserves the relationship between pixels by learning image features from small squares of input data. It is a mathematical operation that takes two inputs: an image matrix and a filter (also called a kernel).


Consider a 5 x 5 pixel image and a 3 x 3 filter matrix, as shown below.



Source: https://mc.ai/convolution-operation-comprehensive-guide/


Convolving the 5 x 5 image matrix with the 3 x 3 filter matrix produces an output called a "Feature Map", as shown below.

Source: https://icecreamlabs.com/2018/08/19/3x3-convolution-filters%E2%80%8A-%E2%80%8Aa-popular-choice/
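As a sketch of the arithmetic, the convolution above can be computed in a few lines of NumPy. The 5 x 5 binary image and 3 x 3 filter values below are commonly used illustrative values, not necessarily the exact ones in the figure:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    sum the element-wise products at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# 5x5 binary image and 3x3 filter (illustrative values)
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)
```

Note that each of the nine output values is the sum of element-wise products over one 3 x 3 window of the image, which is why a 5 x 5 input and 3 x 3 filter yield a 3 x 3 feature map.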


Convolving an image with different filters can perform operations such as edge detection, blurring, and sharpening. The example below shows the convolved image after applying different types of filters (kernels).





Source: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
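To make this concrete, here is a small sketch applying three classic hand-engineered kernels (sharpen, box blur, and an edge detector) to a flat image patch; the specific kernel values are standard choices, and the flat test patch is an assumption chosen so the expected behavior is easy to see:

```python
import numpy as np

def apply_kernel(image, kernel):
    """Valid convolution (stride 1): slide the kernel and sum products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    return np.array([[np.sum(image[i:i+kh, j:j+kw] * kernel)
                      for j in range(ow)] for i in range(oh)])

# classic hand-engineered kernels
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])
blur    = np.ones((3, 3)) / 9.0            # box blur: neighborhood average
edge    = np.array([[-1, -1, -1],
                    [-1,  8, -1],
                    [-1, -1, -1]])         # responds only where intensity changes

flat = np.full((5, 5), 10.0)               # constant patch: no edges anywhere
print(apply_kernel(flat, edge))            # all zeros: nothing to detect
print(apply_kernel(flat, blur))            # all 10s: blurring preserves flat areas
```

A ConvNet learns kernels like these from data rather than having them specified by hand.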


Strides


Stride is the number of pixels by which the filter shifts over the input matrix. When the stride is 1, we move the filter 1 pixel at a time; when the stride is 2, we move the filter 2 pixels at a time, and so on. The figure below shows how convolution works with a stride of 2.


Source: https://deepai.org/machine-learning-glossary-and-terms/stride
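The effect of the stride on the output size can be sketched directly: for an n x n input and f x f filter with stride s (and no padding), the output is floor((n - f) / s) + 1 per side. The 7 x 7 input below is an arbitrary example size:

```python
import numpy as np

def strided_conv(image, kernel, stride):
    """Valid convolution with a configurable stride."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1   # floor((n - f) / s) + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride      # window jumps by `stride` pixels
            out[i, j] = np.sum(image[r:r+kh, c:c+kw] * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))
print(strided_conv(image, kernel, stride=1).shape)  # (5, 5)
print(strided_conv(image, kernel, stride=2).shape)  # (3, 3)
```

A larger stride therefore downsamples the output as a side effect of skipping positions.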


Padding


Sometimes the filter does not perfectly fit the input image. We have two options:

  • Pad the picture with zeros (zero-padding) so that it fits.

  • Drop the part of the image where the filter does not fit. This is called valid padding, which keeps only the valid part of the image.





Source: https://miro.medium.com/max/325/1*b77nZmPH15dE8g49BLW20A.png
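The two options can be compared in a short sketch: with zero-padding of (f - 1) / 2 on each side (for an odd filter size f and stride 1), the output keeps the input's spatial size, whereas valid convolution shrinks it. The 5 x 5 input here is an assumed example:

```python
import numpy as np

def pad_and_convolve(image, kernel, padding):
    """Zero-pad the image by `padding` pixels on each side, then
    perform a valid (stride 1) convolution on the padded image."""
    padded = np.pad(image, padding, mode="constant")  # zero-padding
    kh, kw = kernel.shape
    oh = padded.shape[0] - kh + 1
    ow = padded.shape[1] - kw + 1
    return np.array([[np.sum(padded[i:i+kh, j:j+kw] * kernel)
                      for j in range(ow)] for i in range(oh)])

image = np.ones((5, 5))
kernel = np.ones((3, 3))
print(pad_and_convolve(image, kernel, padding=0).shape)  # valid: (3, 3)
print(pad_and_convolve(image, kernel, padding=1).shape)  # "same": (5, 5)
```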


Pooling Layer


Pooling layers reduce the number of parameters when the images are too large. Spatial pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map while retaining the important information. Spatial pooling can be of different types:



  • Max Pooling

  • Average Pooling

  • Sum Pooling


Max pooling takes the largest element from each window of the rectified feature map. Average pooling instead takes the average of the elements in each window, and taking the sum of all elements in each window is called sum pooling.


Source: https://cs231n.github.io/assets/cnn/maxpool.jpeg
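All three pooling types above differ only in the reduction applied to each window, which the following sketch makes explicit. The 4 x 4 feature map values are the ones commonly used in illustrations of 2 x 2 max pooling, not necessarily those in the figure:

```python
import numpy as np

def pool(feature_map, size=2, stride=2, mode="max"):
    """Apply max, average, or sum pooling with the given window and stride."""
    reduce = {"max": np.max, "avg": np.mean, "sum": np.sum}[mode]
    oh = (feature_map.shape[0] - size) // stride + 1
    ow = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = reduce(feature_map[r:r+size, c:c+size])
    return out

# 4x4 feature map (illustrative values), pooled with a 2x2 window, stride 2
fmap = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
], dtype=float)
print(pool(fmap, mode="max"))  # [[6. 8.] [3. 4.]]
print(pool(fmap, mode="avg"))  # [[3.25 5.25] [2.   2.  ]]
```

Whichever reduction is used, the 4 x 4 map shrinks to 2 x 2, which is the parameter saving the section describes.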


Fully Connected Layer


The Fully Connected layer is a traditional Multi-Layer Perceptron that uses a softmax activation function in the output layer (other classifiers, such as SVM, can also be used, but we will stick to softmax in this post). The term "Fully Connected" implies that every neuron in the previous layer is connected to every neuron in the next layer.


The outputs from the convolutional and pooling layers represent high-level features of the input image. The purpose of the Fully Connected layer is to use these features to classify the input image into one of various classes based on the training dataset.


Putting it all together, the Convolution + Pooling layers act as feature extractors from the input image, while the Fully Connected layer acts as a classifier.






Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
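The whole pipeline can be sketched as a toy forward pass in NumPy: convolution, ReLU, max pooling, flattening, a fully connected layer, and softmax. The input size, single filter, random untrained weights, and 3-class output are all arbitrary choices for illustration, not a real trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):                          # valid convolution, stride 1
    kh, kw = k.shape
    return np.array([[np.sum(x[i:i+kh, j:j+kw] * k)
                      for j in range(x.shape[1] - kw + 1)]
                     for i in range(x.shape[0] - kh + 1)])

def relu(x):                               # rectification: clip negatives to 0
    return np.maximum(x, 0)

def max_pool(x, s=2):                      # non-overlapping s x s max pooling
    h, w = x.shape[0] // s, x.shape[1] // s
    return x[:h*s, :w*s].reshape(h, s, w, s).max(axis=(1, 3))

def softmax(z):                            # turn scores into class probabilities
    e = np.exp(z - z.max())
    return e / e.sum()

# toy forward pass: 8x8 input, one 3x3 filter, 3-class output
image  = rng.random((8, 8))
kernel = rng.standard_normal((3, 3))
W      = rng.standard_normal((3, 9))       # dense weights: 9 pooled features -> 3 classes
b      = np.zeros(3)

features = max_pool(relu(conv2d(image, kernel)))  # 8x8 -> 6x6 -> 3x3
probs    = softmax(W @ features.ravel() + b)
print(probs.shape, probs.sum())            # (3,) 1.0
```

In a real ConvNet, the kernel and the dense weights W and b would be learned by backpropagation rather than drawn at random, and there would be many filters and several stacked layers.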

