Building a network with PyTorch
Created : 08/02/2022 | on Linux: 5.4.0-91-generic
Updated: 14/02/2022 | on Linux: 5.4.0-91-generic
Status: Draft
previous topic 1: Starting Development with PyTorch
previous topic 2: Tensors and Data Handling with PyTorch
Modules
The torch.nn namespace provides the key building blocks for constructing a network. Every module in PyTorch subclasses nn.Module. A neural network is itself a nested module-inside-module entity: the layers are themselves modules contained inside the larger network module.
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
Select device
If torch.cuda is available we can run our code on a CUDA GPU for a significant performance boost.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'using {device}')
Class definition
Let’s examine the Python code below, which defines a NeuralNetwork class!
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
The class defined above subclasses nn.Module. The layers of the network are defined sequentially inside the body of the __init__() method using a call to nn.Sequential(). It is important to note that every class inheriting from nn.Module defines a forward method that specifies the operations carried out on the input data.
If we summarise:
- define the network layout in the __init__ method.
- define operations on input data in the forward method.
Instantiate the Model and do basic operations.
We can now instantiate the class defined above! At this point we can interrogate the instantiated class to learn about its structure.
model = NeuralNetwork().to(device)
print(model)
Output:
NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(linear_relu_stack): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=10, bias=True)
)
)
This shows the layers of our neural network, indexed 0 to 4. Please note the line for the final output layer ((4): Linear(in_features=512, out_features=10, bias=True)): the final output size is 10.
For the purpose of the tutorial we generate some random data representing a 28x28 single channel image.
X = torch.rand(1, 28, 28, device=device)
Now we feed it into the model, which executes the forward method we defined earlier along with some background operations that we will need later to differentiate, optimise and update the weights.
The model returns the 10 output features (logits) from the last layer of the network, produced by the linear transformation of the form \(\boldsymbol{xW^T + b}\).
logits = model(X)
Then we convert the logits into probabilities using the Softmax function and choose the class with the highest probability as the predicted outcome.
pred_prob = nn.Softmax(dim=1)(logits)
y_pred = pred_prob.argmax(1)
print(f"Prediction: {y_pred}")
Code for the steps discussed above.
Model layers
Now we look at the layers of the model. To make things more transparent we are going to get some images from the FashionMNIST dataset to use in the next steps.
import torch
from torch import nn
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
import numpy as np

training_data = datasets.FashionMNIST(root="data",
                                      train=True,
                                      download=True,
                                      transform=ToTensor()
                                      )

listsize = 3
image_list = []
for i in range(listsize):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    image_list.append(img)

images = torch.squeeze(torch.stack(image_list, 0), 1)  # image tensor of shape (3, 28, 28)
print(images.size())
The code above will download the dataset (if necessary) and create a tensor of images.
Prepare the input
When we are using 1D stacked neuron layers we need to flatten the input to match the \(\boldsymbol{xW^T + b}\) input shape for the linear transform. Here we initialise nn.Flatten to convert each 2D 28x28 image into a contiguous array of 784 pixel values.
flatten = nn.Flatten()
flat_image = flatten(images)
The resulting tensor of stacked image pixel values has dimensions \(3\times784\); to get the desired \(3\times10\) discrete probability output, our \(W^T\) needs to be \(784\times10\).
\[\boldsymbol{X} = \begin{pmatrix} x_{0,0} & x_{0,1} & ... & x_{0,783} \\ x_{1,0} & x_{1,1} & ... & x_{1,783} \\ x_{2,0} & x_{2,1} & ... & x_{2,783} \end{pmatrix}\]
print(flat_image.size())
torch.Size([3, 784])
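As a quick sanity check (using a random stand-in batch rather than the FashionMNIST samples), nn.Flatten collapses everything except the batch dimension, which is equivalent to a batch-preserving reshape:

```python
import torch
from torch import nn

# a batch of 3 single-channel 28x28 "images" (random, standing in for the dataset samples)
images = torch.rand(3, 28, 28)

flatten = nn.Flatten()  # by default flattens every dimension except dim 0 (the batch)
flat_image = flatten(images)

print(flat_image.shape)  # torch.Size([3, 784])

# equivalent to reshaping while keeping the batch dimension intact
same = images.reshape(3, -1)
```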
nn.Linear
The linear layer applies the linear transformation \(\boldsymbol{xW^T + b}\) to the input using its stored weights and biases.
layer1 = nn.Linear(in_features=28*28, out_features=512)
hidden1 = layer1(flat_image)
print(hidden1.size())
torch.Size([3, 512])
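The stored weights and biases are directly accessible, so we can reproduce the layer's output by hand and confirm it really is \(\boldsymbol{xW^T + b}\). A minimal sketch, using a random input batch in place of flat_image:

```python
import torch
from torch import nn

layer1 = nn.Linear(in_features=28 * 28, out_features=512)
flat_image = torch.rand(3, 784)  # random stand-in for the flattened images

hidden1 = layer1(flat_image)

# compute xW^T + b manually from the layer's stored parameters
manual = flat_image @ layer1.weight.T + layer1.bias

print(torch.allclose(hidden1, manual, atol=1e-5))  # True
```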
Using Rectified Linear Units
This currently very popular activation function (a linear ramp for values greater than 0, clamped to zero otherwise) introduces the non-linearity in the mapping between the model's inputs and its outputs.
print(f"Before ReLU: {hidden1}\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
Before ReLU: tensor([[-0.0316, -0.0214, 0.0713, ..., 0.0804, 0.1832, -0.0583],
[-0.2158, -0.3031, 0.2640, ..., -0.1948, -0.1070, -0.1727],
[-0.0818, 0.1435, -0.0153, ..., 0.0534, 0.3041, -0.0452]],
grad_fn=<AddmmBackward0>)
After ReLU: tensor([[0.0000, 0.0000, 0.0713, ..., 0.0804, 0.1832, 0.0000],
[0.0000, 0.0000, 0.2640, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.1435, 0.0000, ..., 0.0534, 0.3041, 0.0000]],
grad_fn=<ReluBackward0>)
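The clamping behaviour is easy to see on a tiny hand-made tensor: negative entries become zero, positive entries pass through unchanged.

```python
import torch
from torch import nn

relu = nn.ReLU()
t = torch.tensor([-2.0, -0.5, 0.0, 0.7, 3.0])
print(relu(t))  # tensor([0.0000, 0.0000, 0.0000, 0.7000, 3.0000])
```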
nn.Sequential
As the name suggests, nn.Sequential provides a container that lays out an ordered list of layers. Data is propagated through the layers in the defined sequential order. Below is a quick example of a sequential NN made from the blocks we discussed earlier.
seq_modules = nn.Sequential(flatten,    # flatten 28x28 = 784 feature tensors
                            layer1,     # input: flattened features, output: 512 units
                            nn.ReLU(),
                            nn.Linear(512, 10)
                            )
logits = seq_modules(images)
nn.Softmax
The logits returned from nn.Linear are raw values in the range \((-\infty, +\infty)\). We convert them to values in [0, 1] to represent the probabilities associated with the corresponding predicted outcomes.
softmax = nn.Softmax(dim=1) # `dim` indicates the dimension along which the values must sum to 1
predicted_probabilities = softmax(logits)
print(f"predicted probabilities = {predicted_probabilities}")
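We can verify the softmax behaviour directly: each row of the result is a discrete probability distribution over the 10 classes. A quick sketch using random stand-in logits:

```python
import torch
from torch import nn

logits = torch.randn(3, 10)  # random stand-in for the model's raw outputs
softmax = nn.Softmax(dim=1)
probs = softmax(logits)

# every value lies in [0, 1] and each row sums to 1
print(probs.sum(dim=1))  # ≈ tensor([1., 1., 1.])
```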
Please find the code for the topics we discussed in this section.
Model Parameters
When data passes through a neural network, the input is transformed by mathematical operations involving values residing inside the structure. Each layer consists of numerous parameters such as weights and biases.
nn.Module automatically tracks all the fields defined inside the model object and makes all parameters accessible using the model's parameters() or named_parameters() methods.
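A short sketch of inspecting those parameters, here on an nn.Sequential model with the same layout as the network discussed above:

```python
import torch
from torch import nn

# a small model with the same layer layout discussed above
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# every weight and bias registered inside the module is exposed here;
# Flatten and ReLU hold no parameters, so only the Linear layers appear
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# 1.weight (512, 784)
# 1.bias   (512,)
# 3.weight (10, 512)
# 3.bias   (10,)
```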
Check below for a complete code example that you can run and experiment with.
Check the next topic
Source: PyTorch Tutorial