CNN Layers - PyTorch Deep Neural Network Architecture
PyTorch CNN Layer Parameters
Welcome back to this series on neural network programming with PyTorch. In this post, we are going to learn about the layers of our CNN by building an understanding of the parameters we used when constructing them.
Without further ado, let's get to it!
Our CNN Layers
In the last post, we started building our CNN by extending the PyTorch neural network Module class and defining some layers as class attributes. We defined two convolutional layers and three linear layers by specifying them inside our constructor.
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t
Each of our layers extends PyTorch's neural network Module class. For each layer, there are two primary items encapsulated inside: a forward function definition and a weight tensor.
The weight tensor inside each layer contains the weight values that are updated as the network learns during the training process, and this is the reason we are specifying our layers as attributes inside our Network class.
PyTorch's neural network Module class keeps track of the weight tensors inside each layer. The code that does this tracking lives inside the nn.Module class, and since we are extending the neural network module class, we inherit this functionality automatically.
Remember, inheritance is one of those object oriented concepts that we talked about last time. All we have to do to take advantage of this functionality is assign our layers as attributes inside our network module, and the Module base class will see this and register the weights as learnable parameters of our network.
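We can see this registration in action by creating an instance of the network and iterating over its named parameters. A minimal sketch using the Network class defined above (the exact weight tensor shapes are explored in the next post):
network = Network()

# nn.Module registered every layer we assigned as an attribute,
# so each layer's weight and bias tensors show up as learnable parameters.
for name, param in network.named_parameters():
    print(name, '\t', param.shape)

# conv1.weight    torch.Size([6, 1, 5, 5])
# conv1.bias      torch.Size([6])
# ...and so on for conv2, fc1, fc2, and out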
CNN Layer Parameters
Our goal in this post is to better understand the layers we have defined. To do this, we're going to learn about the parameters and the values that we passed for these parameters in the layer constructors.
Parameter vs Argument
First, let's clear up some lingo that pertains to parameters in general. We often hear the words parameter and argument, but what's the difference between these two?
Parameters are used in function definitions as place-holders while arguments are the actual values that are passed to the function. The parameters can be thought of as local variables that live inside a function.
In our network's case, the names are the parameters and the values that we have specified are the arguments.
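As a quick plain-Python illustration (this function is just a made-up example, not part of our network):
# text and times are parameters: place-holders in the function definition
def repeat(text, times):
    return text * times

# 'ha' and 3 are arguments: the actual values passed to the function
repeat('ha', 3)
In the same way, in_channels is a parameter of the nn.Conv2d constructor, and 1 is the argument we passed for it.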
Two types of parameters
To better understand the argument values for these parameters, let's consider two categories or types of parameters that we used when constructing our layers.
- Hyperparameters
- Data dependent hyperparameters
A lot of terms in deep learning are used loosely, and the word parameter is one of them. Try not to let it throw you off. The main thing to remember about any type of parameter is that the parameter is a place-holder that will eventually hold or have a value.
The goal of these particular categories is to help us remember how each parameter's value is decided.
When we construct a layer, we pass values for each parameter to the layer's constructor. Our convolutional layers have three parameters and our linear layers have two parameters.
- Convolutional layers
  - in_channels
  - out_channels
  - kernel_size
- Linear layers
  - in_features
  - out_features
Let's see how the values for the parameters are decided. We'll start by looking at hyperparameters, and then, we'll see how the dependent hyperparameters fall into place.
Hyperparameters
In general, hyperparameters are parameters whose values are chosen manually and arbitrarily.
As neural network programmers, we choose hyperparameter values mainly based on trial and error and increasingly by utilizing values that have proven to work well in the past. For building our CNN layers, these are the parameters we choose manually.
- kernel_size
- out_channels
- out_features
This means we simply choose the values for these parameters. In neural network programming, this is pretty common, and we usually test and tune these parameters to find values that work best.
Parameter | Description
--- | ---
kernel_size | Sets the height and width of the filter.
out_channels | Sets the number of filters inside the layer. One filter produces one output channel.
out_features | Sets the size of the output tensor.
One pattern that shows up quite often is that we increase our out_channels as we add additional conv layers, and after we switch to linear layers we shrink our out_features as we filter down to our number of output classes.
All of these parameters impact our network's architecture. Specifically, these parameters directly impact the weight tensors inside the layers. We'll dive deeper into this in the next post when we talk about learnable parameters and inspect the weight tensors, but for now, let's cover dependent hyperparameters.
Data dependent hyperparameters
Data dependent hyperparameters are parameters whose values are dependent on data. The first two data dependent hyperparameters that stick out are the in_channels of the first convolutional layer, and the out_features of the output layer.
You see, the in_channels of the first convolutional layer depends on the number of color channels present inside the images that make up the training set. Since we are dealing with grayscale images, we know that this value should be 1.
The out_features for the output layer depend on the number of classes that are present inside our training set. Since we have 10 classes of clothing inside the Fashion-MNIST dataset, we know that we need 10 output features.
In general, the input to one layer is the output from the previous layer, and so all of the in_channels in the conv layers and in_features in the linear layers depend on the data coming from the previous layer.
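If we want to confirm these data dependent values from the data itself, we can inspect the training set directly. A minimal sketch, assuming the torchvision Fashion-MNIST dataset used in this series (the root path is arbitrary):
import torchvision
import torchvision.transforms as transforms

train_set = torchvision.datasets.FashionMNIST(
    root='./data',               # arbitrary download location
    train=True,
    download=True,
    transform=transforms.ToTensor()
)

image, label = train_set[0]

print(image.shape)               # torch.Size([1, 28, 28]) -> one color channel, so in_channels=1
print(len(train_set.classes))    # 10 -> ten classes, so out_features=10 for the output layer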
When we switch from a conv layer to a linear layer, we have to flatten our tensor. This is why we have 12*4*4. The twelve comes from the number of output channels in the previous layer, but why do we have the two 4s? We cover how we get these values in a future post.
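As a rough preview, here is a sketch of how those values arise, assuming a 28x28 Fashion-MNIST image and a 2x2 max pooling operation after each conv layer (an assumption about the forward pass we haven't implemented yet):
import torch
import torch.nn as nn
import torch.nn.functional as F

t = torch.rand(1, 1, 28, 28)                # [batch, channels, height, width]

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

t = F.max_pool2d(conv1(t), kernel_size=2)   # conv: 28 -> 24, pool: 24 -> 12
t = F.max_pool2d(conv2(t), kernel_size=2)   # conv: 12 -> 8,  pool: 8 -> 4

print(t.shape)                              # torch.Size([1, 12, 4, 4])
print(t.flatten(start_dim=1).shape)         # torch.Size([1, 192]) -> 12*4*4 = 192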
Summary of layer parameters
We'll learn more about the inner workings of our network and how our tensors flow through our network when we implement our forward() function. For now, be sure to check out this table that describes each of the parameters, to make sure you can understand how each parameter value is determined.
self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
self.fc2 = nn.Linear(in_features=120, out_features=60)
self.out = nn.Linear(in_features=60, out_features=10)
Layer | Param name | Param value | The param value is
--- | --- | --- | ---
conv1 | in_channels | 1 | the number of color channels in the input image.
conv1 | kernel_size | 5 | a hyperparameter.
conv1 | out_channels | 6 | a hyperparameter.
conv2 | in_channels | 6 | the number of out_channels in the previous layer.
conv2 | kernel_size | 5 | a hyperparameter.
conv2 | out_channels | 12 | a hyperparameter (higher than in the previous conv layer).
fc1 | in_features | 12*4*4 | the length of the flattened output from the previous layer.
fc1 | out_features | 120 | a hyperparameter.
fc2 | in_features | 120 | the number of out_features of the previous layer.
fc2 | out_features | 60 | a hyperparameter (lower than in the previous linear layer).
out | in_features | 60 | the number of out_features of the previous layer.
out | out_features | 10 | the number of prediction classes.
Kernel vs Filter
Note that, in deep learning, we often use the words filter and kernel interchangeably. However, there is a technical distinction between these two concepts.
A kernel is a 2D tensor, and a filter is a 3D tensor that contains a collection of kernels. We apply a kernel to a single channel, and we apply a filter to multiple channels. To learn more about this distinction, check out this stackexchange post.
Thank you to Thorwald from the community for pointing this out!
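We can make the distinction concrete by inspecting the weight tensor of one of our conv layers. A small sketch using the conv2 layer defined above:
import torch.nn as nn

conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

print(conv2.weight.shape)        # torch.Size([12, 6, 5, 5]) -> 12 filters

filter_0 = conv2.weight[0]       # one filter: a 3D tensor of shape [6, 5, 5], one kernel per input channel
kernel_0 = conv2.weight[0][0]    # one kernel: a 2D tensor of shape [5, 5]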
Wrapping up
In the next post, we'll learn about learnable parameters, which are parameters whose values are learned during the training process. See you there!