How is the convolution operation carried out when multiple channels are present at the input layer? (e.g. RGB)
After doing some reading on the architecture/implementation of a CNN, I understand that each neuron in a feature map references an NxM patch of pixels in the image, as defined by the kernel size. Each pixel is multiplied by the feature map's learned NxM weight set (the kernel/filter), the products are summed, and the result is passed through an activation function. For a simple greyscale image, I imagine the operation would adhere to something like the following pseudocode:
for i in range(0, image_width - kernel_width + 1):
    for j in range(0, image_height - kernel_height + 1):
        sum = 0.0
        for x in range(0, kernel_width):
            for y in range(0, kernel_height):
                sum += kernel[x][y] * image[i + x][j + y]
        feature_map[i][j] = act_func(sum)
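To check my understanding, here is a runnable version of that pseudocode using NumPy (the function and variable names are my own placeholders, and I've picked ReLU as an example activation, since the activation function wasn't specified):

```python
import numpy as np

def conv2d_grayscale(image, kernel, act_func=lambda z: np.maximum(z, 0.0)):
    """'Valid' convolution of a 2-D greyscale image with a single 2-D kernel."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    fmap = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(fmap.shape[0]):
        for j in range(fmap.shape[1]):
            # Element-wise product of the NxM patch with the NxM kernel, then sum.
            fmap[i, j] = act_func(np.sum(image[i:i + kh, j:j + kw] * kernel))
    return fmap

img = np.arange(9, dtype=float).reshape(3, 3)
print(conv2d_grayscale(img, np.ones((2, 2))))
# [[ 8. 12.]
#  [16. 20.]]
```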
However, I don't understand how to extend this model to handle multiple channels. Are three separate weight sets required per feature map, one shared across each colour channel?
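My current guess is that the three per-channel weight sets stack into a single 3-D kernel, with the per-channel products all summed into one scalar per output position. A sketch of that guess (shapes and names are my assumptions, not from any particular source):

```python
import numpy as np

def conv2d_multichannel(image, kernel):
    """'Valid' convolution: one 3-D kernel spanning all input channels
    produces a single 2-D feature map (pre-activation)."""
    ih, iw, ic = image.shape   # e.g. (height, width, 3) for RGB
    kh, kw, kc = kernel.shape  # kernel depth must match the channel count
    assert ic == kc
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply over all kernel positions AND all channels,
            # then take a single sum: the channels merge into one scalar.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return out

rgb = np.random.rand(8, 8, 3)
k = np.random.rand(3, 3, 3)
print(conv2d_multichannel(rgb, k).shape)  # (6, 6)
```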