Deep learning models use neural networks to learn patterns from a dataset and make predictions on unseen data. Optimization in machine or deep learning is the process of adjusting the model's parameters to reduce the loss and thereby improve accuracy. Stochastic Gradient Descent (SGD) is an optimization algorithm available in PyTorch that reduces the loss value, and it is controlled mainly by a learning rate and a momentum term.

How Does SGD Optimizer Work

Stochastic Gradient Descent fine-tunes the parameters of the model by updating them after each backpropagation pass, using the gradients computed on a mini-batch of data. The learning rate sets the size of the step the model takes in the direction that reduces the error. The momentum argument adds a fraction of the previous update to the current one, which smooths the steps and usually speeds up convergence. The syntax of SGD() in PyTorch is mentioned below:

torch.optim.SGD(params, lr=<value>, momentum=0, dampening=0, weight_decay=0)
  • torch.optim is the PyTorch package of optimizers and SGD() is one of them with multiple arguments.
  • The first argument is the collection of model parameters that need to be updated during training, usually obtained with model.parameters().
  • The learning rate (lr) is the step size applied to the gradients when each parameter is updated after backpropagation.
  • The momentum sets how much of the previous update is carried into the current one, and 0 (no momentum) is its default value.
  • The dampening argument reduces the contribution of the current gradient to the momentum term, and weight_decay adds an L2 penalty that keeps the weights small.
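The way these arguments interact can be sketched in plain Python for a single scalar parameter. This is a minimal illustration of the update rule rather than PyTorch's actual implementation; the helper name sgd_step and the toy function f(w) = (w - 3)^2 are assumptions made for this example.

```python
def sgd_step(param, grad, velocity, lr, momentum=0.0, dampening=0.0, weight_decay=0.0):
    """One SGD update for a single scalar parameter (illustrative helper, not a PyTorch API)."""
    # weight_decay adds an L2 penalty term to the gradient
    grad = grad + weight_decay * param
    if momentum != 0.0:
        if velocity is None:
            # first step: the velocity starts as the plain gradient
            velocity = grad
        else:
            # later steps: keep a fraction of the old velocity, dampen the new gradient
            velocity = momentum * velocity + (1 - dampening) * grad
        grad = velocity
    # move against the (momentum-adjusted) gradient, scaled by the learning rate
    param = param - lr * grad
    return param, velocity


# toy problem (assumption): minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, v = 0.0, None
for _ in range(200):
    g = 2 * (w - 3)
    w, v = sgd_step(w, g, v, lr=0.1, momentum=0.9)

print(w)  # ends up very close to the minimum at w = 3
```

With momentum=0 the helper reduces to plain gradient descent, which is a convenient way to compare the two behaviours on the same toy problem.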

How to Use SGD Optimizer in Deep Learning Model Using PyTorch

Start using the SGD optimizer in machine or deep learning by building the model with a neural network architecture. Then use the optimizer to update the parameters of the model during the training phase, based on the gradients from backpropagation. To learn the process of using the SGD optimizer in a deep learning model, follow these steps:

Step 1: Importing Libraries

The first step is to import the libraries that provide the functions used throughout this guide:

import torch  # importing torch to build the neural networks
import torchvision  # importing torchvision to get the dataset and image utilities
import torch.nn as nn  # importing nn to get the neural network layers and loss functions
import matplotlib.pyplot as plt  # importing matplotlib to display the images
import torchvision.transforms as transforms  # importing transforms from torchvision
import numpy as np  # importing numpy to get arrays in python
import torch.nn.functional as F  # importing functional to use activation functions such as relu
from torch.autograd import Variable  # deprecated wrapper, optional since PyTorch 0.4
  • Start by importing the torch library along with its nn module, which provides the neural network layers.
  • Then, get the matplotlib library to show graphical representations like images with their labels.
  • Use the NumPy library as np to work with arrays and store data in them.
  • After that, import the functional module as F to apply activation functions inside the model.
  • Finally, import Variable from torch.autograd. Since PyTorch 0.4 it is a deprecated wrapper that simply returns the tensor, so it is optional in current versions.

Step 2: Extracting Dataset

After getting the libraries, use the transforms module to prepare and normalize the images, then download and load the dataset:

import torchvision.datasets as ds
# convert each image to a tensor, then normalize every channel from [0, 1] to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# download and normalize the CIFAR-10 training dataset
trainset = ds.CIFAR10(root='./data', train=True, download=True, transform=transform)
# load the training dataset in shuffled batches of 4 images
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
# download and normalize the testing dataset
testset = ds.CIFAR10(root='./data', train=False, download=True, transform=transform)
# load the CIFAR-10 testing dataset in batches of 4 images without shuffling
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)
# providing the class names of the dataset
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
  • Start this phase by building the transform that prepares the images and normalizes them using the Normalize() method.
  • Now, initialize the trainset variable by downloading the CIFAR10 dataset using Torchvision’s datasets module.
  • Set the train argument of the CIFAR10() method to True or False to get the training or testing split of the data.
  • After getting the data, load it in batches so it can be used with the model as training and testing sets.
  • Finally, list the names of the 10 classes of the dataset, which contains 60,000 images of 32×32 dimensions.
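To make the effect of the normalization concrete, here is a small sketch of the arithmetic in plain Python. It assumes pixel values already scaled to [0, 1] and uses the mean and standard deviation of 0.5 from the code above; the helper names normalize and unnormalize are made up for this illustration.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Same arithmetic that Normalize() applies to each value: (x - mean) / std."""
    return (pixel - mean) / std


def unnormalize(value):
    """Inverse mapping back to [0, 1], needed before an image can be plotted."""
    return value / 2 + 0.5


samples = [0.0, 0.25, 0.5, 1.0]  # illustrative pixel values in [0, 1]
normalized = [normalize(p) for p in samples]
restored = [unnormalize(v) for v in normalized]

print(normalized)  # values now lie in [-1, 1]
print(restored)    # original values recovered
```

Centering the inputs around zero in this way is a common preprocessing choice for training neural networks.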

Step 3: Displaying Objects From Dataset

Define an imshow() method to get sample images from the dataset and display these images with their classes:

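def imshow(img):
    img = img / 2 + 0.5  # undo the normalization so pixel values return to [0, 1]
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # reorder (C, H, W) to (H, W, C) for matplotlib
    plt.show()

# create an iterator over the training data
dataiter = iter(trainloader)
# load one batch of images with their labels
images, labels = next(dataiter)

# show the batch as a grid and print the class of each image
imshow(torchvision.utils.make_grid(images))
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))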
  • The imshow() function undoes the normalization, converts the tensor to a NumPy array, and displays the image.
  • Now, create an iterator over trainloader using the iter() method and fetch one batch of images with their labels using next().
  • In the end, use the make_grid() method to show the images in the grid format on the screen and print their classes using the for loop.

The sample images from the training data are small 32×32 images, so they look blurred. So, the model needs to be accurate and thorough to extract the features of the images and classify them accordingly.

Step 4: Building Convolutional Neural Network Model

Now that the dataset is loaded and ready, it is time to build the structure of the convolutional neural network for the deep learning model:

class Net(nn.Module):  # Net inherits from nn.Module, the base class of PyTorch models
    def __init__(self):  # the constructor sets the layers of the model
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # convolutional layer: 3 input channels, 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)  # max pooling layer: 2x2 window with stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)  # convolutional layer: 6 input channels, 16 output channels, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # fully connected layer
        self.fc2 = nn.Linear(120, 84)  # fully connected layer
        self.fc3 = nn.Linear(84, 10)  # fully connected output layer with one score per class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # activation and pooling on the first convolutional layer
        x = self.pool(F.relu(self.conv2(x)))  # activation and pooling on the second convolutional layer
        x = x.view(-1, 16 * 5 * 5)  # flatten the feature maps into one vector per image
        x = F.relu(self.fc1(x))  # activation on the first fully connected layer
        x = F.relu(self.fc2(x))  # activation on the second fully connected layer
        x = self.fc3(x)  # raw class scores, no activation here
        return x

net = Net()
  • Design the structure of the neural network in the Net class, which inherits from nn.Module in the torch library.
  • After that, define the constructor to set the layers that make up the convolutional neural network.
  • Build the first convolutional layer to extract features from the 3-channel image given at the input layer.
  • The MaxPool2d layer halves the height and width of the feature maps, which speeds up the processing of the model.
  • Add another convolutional layer to extract higher-level features from the downsized feature maps.
  • After the convolutional layers, add three fully connected layers to get the predictions about the images with their classes.
  • Apply the feedforward approach by designing the forward() method, which applies the ReLU activation function after each layer except the last one.
  • Finally, create an instance of the Net class in a variable called net so it can be easily passed to the SGD optimizer.
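The value 16 * 5 * 5 in the first fully connected layer follows from how each layer shrinks a 32×32 input. The following sketch traces the spatial size through the layers using the standard output-size formula for convolution and pooling without padding; the helper names conv_out and pool_out are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution for a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output height/width of a max pooling layer for a square input."""
    return (size - kernel) // stride + 1


size = 32                      # CIFAR-10 images are 32x32
size = conv_out(size, 5)       # conv1 with 5x5 kernel: 32 -> 28
size = pool_out(size, 2, 2)    # 2x2 pooling:           28 -> 14
size = conv_out(size, 5)       # conv2 with 5x5 kernel: 14 -> 10
size = pool_out(size, 2, 2)    # 2x2 pooling:           10 -> 5

flat_features = 16 * size * size  # 16 channels come out of conv2
print(size, flat_features)        # 5 400
```

If the kernel sizes or the input resolution change, the input size of the first fully connected layer has to be recomputed in the same way.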

Step 5: Using SGD() Function

After that, call the CrossEntropyLoss() and SGD() methods from the nn and optim modules. The SGD optimizer takes the model’s parameters from the net variable along with the learning rate and momentum used to fine-tune them:

import torch.optim as optim  # importing the dependency to call the optimizer

criterion = nn.CrossEntropyLoss()  # criterion is the loss function that compares predictions with labels
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # SGD optimizer with learning rate and momentum
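To give a feel for the numbers the loss function produces, here is a plain-Python sketch of the cross-entropy of a single sample, computed as the negative log of the softmax probability of the true class. nn.CrossEntropyLoss applies this formula to the raw scores and, by default, averages it over the batch. The helper name cross_entropy and the example scores are assumptions made for this illustration.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample: -log(softmax(logits)[label])."""
    top = max(logits)
    shifted = [z - top for z in logits]  # subtract the maximum for numerical stability
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    return log_sum_exp - shifted[label]


# a model that gives the same score to all 10 classes is simply guessing
uniform = cross_entropy([0.0] * 10, label=3)

# a model that gives a much higher score to the true class has a loss near 0
confident = cross_entropy([0.0, 0.0, 0.0, 8.0] + [0.0] * 6, label=3)

print(round(uniform, 3), round(confident, 3))
```

The loss of the guessing model equals ln(10), about 2.3, which is why the loss of an untrained 10-class model typically starts near that value.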

Step 6: Training the NN Model

Now, move to the model training phase, where a for loop runs over a fixed number of epochs with all the required components. The components used for training the model are the model itself, the normalized data, the optimizer, and the loss function:

for epoch in range(3):
    # training the model on the training data so the model can learn the features from the data
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # getting the images and labels of the current mini-batch
        inputs, labels = data
        optimizer.zero_grad()  # resetting the gradients left over from the previous step
        outputs = net(inputs)  # getting the predictions using the net variable
        loss = criterion(outputs, labels)  # getting the loss value by comparing the outputs and labels
        loss.backward()  # backpropagation to compute the gradients
        optimizer.step()  # updating the parameters using SGD

        running_loss += loss.item()
        if i % 2000 == 1999:  # printing the epoch number with the average loss every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
  • The training phase uses the for loop to run 3 epochs over the training data so the model can reduce the loss value and enhance the accuracy.
  • Get the input images with their labels and generate the model's predictions using the net(inputs) call.
  • After that, get the loss value using the criterion(outputs, labels) call, apply backpropagation with loss.backward(), and update the parameters with optimizer.step() to reduce the loss value.
  • The optimization plays a critical role here as it is the step that improves the model's performance with each iteration.
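The same zero_grad, forward, backward, step cycle can be tried on a much smaller problem that runs in a moment without downloading any data. The sketch below is only an illustration: the tiny linear model, the random inputs, and the random labels are stand-ins assumed for this example, not part of the CIFAR-10 setup.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so repeated runs behave the same

# hypothetical stand-ins: a tiny linear classifier and one fixed batch of random data
model = nn.Linear(4, 3)
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

initial_loss = criterion(model(inputs), labels).item()
for step in range(50):
    optimizer.zero_grad()                    # clear old gradients
    loss = criterion(model(inputs), labels)  # forward pass and loss
    loss.backward()                          # compute gradients
    optimizer.step()                         # SGD update of the parameters
final_loss = criterion(model(inputs), labels).item()

print(initial_loss, final_loss)  # the loss on this batch goes down
```

Leaving out either loss.backward() or optimizer.step() in this sketch keeps the loss unchanged, which is a quick way to confirm that both calls are needed in every iteration.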

Step 7: Testing the Model

To test the performance, use the iter() method with the testloader variable and print the true classes of one batch from the test data:

dataiter = iter(testloader)
images, labels = next(dataiter)

# show the test images and print their true classes
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

The output shows one batch of images from the testloader variable along with their classes. These images are not used in the training of the model, so the model has not seen them before. That makes them a good fit for testing the model and evaluating its performance.

Now, pass these images through the model and store the results in the outputs variable so the predictions can be compared with the true labels to check the performance of the model:

outputs = net(images)  # raw scores of the 10 classes for each image

Finally, get the predicted values for the images from the test data and print the results on the screen:

# store the predicted label of each test image, which is the index of its highest score
_, predicted = torch.max(outputs, 1)
# print the predicted labels from the test images
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

The output lists the classes predicted for each image displayed earlier. The performance of the model is very good on this batch as all four classes have been predicted correctly.
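The call torch.max(outputs, 1) returns two tensors: the highest score in each row and the index where it occurs. The index is what serves as the predicted class. A minimal sketch with made-up scores for 2 images and 3 classes:

```python
import torch

# hypothetical scores: one row per image, one column per class
scores = torch.tensor([[0.1, 2.5, -1.0],
                       [1.7, 0.3, 0.9]])

# dimension 1 runs across the classes, so the maximum is taken per image
values, predicted = torch.max(scores, 1)

print(predicted.tolist())  # [1, 0]
```

The first tensor is usually discarded with an underscore because only the class index is needed for the comparison with the labels.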

Step 8: Getting the Model’s Accuracy

Now, check the accuracy of the model by comparing the predictions with the true labels for all the images from the testloader variable. The process above covered 4 images only, but the following code does this for all the test data and displays the accuracy of the model:

correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)  # predicted label of each image in the batch
        total += labels.size(0)  # counting the images seen so far
        correct += (predicted == labels).sum().item()  # counting the correct predictions

print('Performance of the model with accuracy on test images: %d %%' % (
    100 * correct / total))

The model’s accuracy is 62%, so not all the images are predicted correctly and there is still room for improvement. It may look inconsistent that the model predicted all 4 sample images correctly while the accuracy is not 100%. The reason is that the test set contains 10,000 images across 10 classes, and the model does not predict every one of them correctly.

To evaluate the performance more thoroughly, it is important to test the accuracy of each class. It enables us to find the weak classes and improve the model for those specific classes:

# create a list to store the number of correct predictions for each class
class_correct = list(0. for i in range(10))
# create a list to store the total number of test samples for each class
class_total = list(0. for i in range(10))
# loop over the test data to accumulate the counts for all the classes
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        # loop to check the predicted label of each image in the batch of 4
        for i in range(4):
            label = labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

The output lists the accuracy of each class individually, and the accuracy of bird, cat, deer, etc. is not good.

The useful insight from these results is that car, ship, and truck are the most accurately predicted classes. The plane, dog, frog, and horse classes fall in the middle range, while the rest of the classes need improvement.
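The relation between the overall accuracy and the per-class accuracy can be seen on a small example in plain Python. The labels below are invented for illustration (8 images, 3 classes) and are not taken from the CIFAR-10 results.

```python
# hypothetical true and predicted class indices for 8 test images
true_labels = [0, 0, 1, 1, 1, 2, 2, 2]
predicted_labels = [0, 1, 1, 1, 0, 2, 2, 1]

num_classes = 3
class_correct = [0] * num_classes  # correct predictions per class
class_total = [0] * num_classes    # test samples per class

for truth, guess in zip(true_labels, predicted_labels):
    class_total[truth] += 1
    if truth == guess:
        class_correct[truth] += 1

per_class = [100 * c / t for c, t in zip(class_correct, class_total)]
overall = 100 * sum(class_correct) / sum(class_total)

print(per_class)
print(overall)  # 62.5
```

The overall figure hides that class 0 is right only half of the time in this example, which is the kind of weakness the per-class breakdown is meant to reveal.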

That’s all about how to use the SGD optimizer in deep learning models using PyTorch.


To use Stochastic Gradient Descent in a deep learning model, simply call the optim.SGD() method with the model's parameters, a learning rate, and optionally a momentum value. The SGD optimizer updates the parameters during the CNN model’s training process. This guide has built a deep learning model, used the SGD optimizer to train it, and then evaluated its performance.