Lesson 4: Convolutional Neural Networks (CNNs)

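Standard fully connected networks scale poorly to high-dimensional inputs like images: every pixel connects to every neuron, so the parameter count explodes, and flattening the image discards the spatial relationships between neighboring pixels. **Convolutional Neural Networks (CNNs)** address both problems by applying small, shared filters to local blocks of pixels.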

The Convolutional Layer

The core of a CNN is the **Convolutional Layer**. Instead of connecting every pixel to every neuron, a small weight matrix called a **Kernel** (or filter) slides across the image. At each position, it computes the dot product between the kernel and the patch of pixels beneath it (multiply element by element, then sum). The results form a feature map that highlights spatial features such as edges, textures, or shapes.
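The sliding dot product can be sketched in plain Python. This is a minimal illustration with stride 1 and no padding, not how a framework implements it; the function name `conv2d_valid`, the toy image, and the vertical-edge kernel are all made up for this example. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1

    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the kernel with the patch under it, element by element, then sum.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Toy 4x6 image (hypothetical): dark left half (0), bright right half (1).
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Illustrative vertical-edge kernel: responds where brightness rises left to right.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 3, 3, 0], [0, 3, 3, 0]]
```

The output is zero over the flat regions and nonzero only where the kernel straddles the dark-to-bright boundary, which is what "highlighting an edge" means in practice.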

Kernel Properties: Size, Stride, and Padding

Kernel Size: The spatial dimensions of the filter (commonly 3x3 or 5x5).
Stride: The step size of the kernel as it slides across the image. A stride of 1 moves the kernel one pixel at a time; a stride of 2 moves it two pixels at a time, roughly halving the output width and height.
Padding: Adding border pixels (usually zeros) around the image to allow the kernel to inspect edge pixels without shrinking the output image size.
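To see how these settings interact, the sketch below counts where a 3x3 kernel can be placed along one axis of a toy 5x5 image. The helper names `zero_pad` and `kernel_positions` are hypothetical, chosen for this illustration only.

```python
def zero_pad(image, pad):
    """Return a copy of `image` with `pad` rows and columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]           # top border
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)  # left and right borders
    padded.extend([0] * width for _ in range(pad))        # bottom border
    return padded


def kernel_positions(length, kernel_size, stride):
    """Start indices along one axis where the kernel fits fully inside the input."""
    return list(range(0, length - kernel_size + 1, stride))


image = [[1] * 5 for _ in range(5)]  # toy 5x5 image of ones

no_pad_stride1 = kernel_positions(len(image), 3, 1)   # 3 positions: output shrinks to 3
no_pad_stride2 = kernel_positions(len(image), 3, 2)   # 2 positions: output shrinks to 2

padded = zero_pad(image, 1)                           # 7x7 after padding
pad_stride1 = kernel_positions(len(padded), 3, 1)     # 5 positions: output stays 5

print(no_pad_stride1, no_pad_stride2, pad_stride1)
```

The number of valid positions along an axis is the output size along that axis, so one pixel of padding is enough to keep a 3x3, stride-1 convolution from shrinking the image.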

Downsampling with Pooling

**Pooling** layers shrink the spatial size of feature maps, which cuts the computation and the number of parameters needed in later layers. Pooling itself has no learnable weights. The most popular method is **Max Pooling**, which slides a window across the feature map and keeps only the maximum value from each block, retaining the strongest activation in each region.
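A minimal sketch of max pooling in plain Python follows. The function name `max_pool2d`, the default 2x2 window with stride 2, and the sample values are assumptions made for illustration; the lesson does not fix a window size.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep the maximum of each `pool_size` x `pool_size` window."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for row in range(0, height - pool_size + 1, stride):
        pooled_row = []
        for col in range(0, width - pool_size + 1, stride):
            # Collect the values inside the current window and keep the largest.
            window = [
                feature_map[row + i][col + j]
                for i in range(pool_size)
                for j in range(pool_size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4x4 input becomes 2x2: each output value summarizes one non-overlapping 2x2 block, so the layer after it has a quarter as many values to process.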

Exercise: Convolutional Math

Calculate the output spatial size (O) for a convolution operation with the following settings:

Input size (I): 32 x 32

Kernel size (K): 3 x 3

Stride (S): 1

Padding (P): 0

Formula: O = floor((I - K + 2*P) / S) + 1, where floor rounds down when S does not divide evenly

[ ]32 x 32
[x]30 x 30 (since (32 - 3 + 2*0)/1 + 1 = 30)
[ ]28 x 28
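To check answers like this one, the output-size formula can be wrapped in a small helper. This is a sketch for one spatial axis; the function name `conv_output_size` is made up for this example, and the floor division follows the common convention of rounding down when the stride does not divide evenly.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one axis: (I - K + 2P) // S + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


exercise = conv_output_size(32, 3, stride=1, padding=0)
same_size = conv_output_size(32, 3, stride=1, padding=1)
strided = conv_output_size(32, 3, stride=2, padding=0)

print(exercise)   # 30: the exercise settings
print(same_size)  # 32: one pixel of padding preserves the input size
print(strided)    # 15: a stride of 2 roughly halves the output
```

Because the image and kernel in the exercise are square, the same number applies to both width and height.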

Next, we will apply convolutional concepts in a project where we train a model on real images!