Deep learning models are neural networks built from neurons arranged in multiple layers. Each neuron is connected to neurons in the next layer, and every connection carries a weight. Optimizers play a major role in deep learning because they reduce the loss value by adjusting these weights. There are different optimizers, such as SGD and Adam, that are used in deep learning to improve the accuracy of the model.

What is the Adam Optimizer?

Adam, short for Adaptive Moment Estimation, is an optimizer that adapts the step size of each weight during training. A neural network runs a forward pass to produce predictions and uses backpropagation to compute how much each weight contributed to the error. The weights are initialized randomly at the start (PyTorch does this when the layers are created), and the optimizer then updates them using the gradients from backpropagation. This is repeated over multiple iterations while training the model.
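
As a small illustration of the forward pass and backpropagation described above, the following sketch uses a single weight. The variable names and numbers are made up for the example and are not part of the model built later in this guide:

```python
import torch

# One trainable weight (illustrative value)
w = torch.tensor(2.0, requires_grad=True)

# Forward pass: a prediction and its squared error against a target of 1.0
pred = w * 3.0
loss = (pred - 1.0) ** 2

# Backpropagation: computes d(loss)/dw and stores it in w.grad
loss.backward()

grad_value = w.grad.item()
print(grad_value)  # 2 * (3*2 - 1) * 3 = 30.0
```

An optimizer such as Adam reads this gradient and decides how far to move the weight.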


To use the Adam optimizer in PyTorch, simply implement the following syntax:

torch.optim.Adam(params, lr=<value>, betas=<value>, eps=<value>)
  • Use the optim package from the torch library to call the Adam() constructor with its arguments.
  • The params argument is an iterable of the model parameters (weights and biases) that the optimizer should update, usually model.parameters().
  • The learning rate (lr) sets the base step size of each update (default 0.001).
  • The betas argument is a pair of decay rates for the running averages of the gradient and of the squared gradient (default (0.9, 0.999)); these averages are how the optimizer remembers its previous movements.
  • The eps argument is a small constant added to the denominator of the update for numerical stability (default 1e-08).
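
To make the roles of lr, betas, and eps more concrete, here is a minimal pure-Python sketch of one Adam update for a single scalar weight. The function adam_step and its variable names are hypothetical and only illustrate the standard update rule; this is not PyTorch's actual implementation:

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One Adam update for a single scalar parameter (illustrative only)."""
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad          # running average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad   # running average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First step (t=1) for a weight of 1.0 whose gradient is 4.0
new_param, m, v = adam_step(1.0, 4.0, m=0.0, v=0.0, t=1)
print(new_param)  # close to 0.999: the first step is about lr in size
```

Because the gradient is divided by the square root of its own running average, the size of the step depends mostly on lr rather than on the raw magnitude of the gradient, and eps only prevents a division by zero.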

How to Use Adam Optimizer in PyTorch

Use the Adam optimizer in a neural network to update the parameters of the model during training. First, build the structure of the model with the dimensions of its layers. After that, use a dataset to train the model, with the optimizer improving its performance throughout the training phase. To learn the process of using the Adam optimizer in deep learning, go through the following steps:

Step 1: Importing Torch Library

The torch library contains multiple packages and methods to build and optimize the deep learning models in Python:

import torch

Step 2: Setting up the Model’s Dimensions

The next step is to set the dimensions of the layers in the neural network, that is, how many neurons each layer contains. These dimensions are needed when building the network, and they determine how data flows from the input to the final output:

batch = 128
input_dim = 2000
hidden_dim = 200
output_dim = 20
  • The batch variable sets the batch size: 128 samples are passed through the model together.
  • The input_dim variable is 2000, the number of features in each input sample.
  • The hidden_dim variable is 200, the number of neurons in the hidden layer.
  • The output_dim variable is 20, the number of values the output layer predicts for each sample.

Step 3: Building the Dataset

After setting up the model’s dimensions, create the tensors that make up the dataset using the following code. The input data is stored in the input variable, and the output variable stores the target values that the predictions are compared against:

input = torch.randn(batch, input_dim)
output = torch.randn(batch, output_dim)
  • The input variable stores a tensor of random values with the batch and input dimensions.
  • The output variable stores a second tensor of random values with the batch and output dimensions; it acts as the target data.
  • Both tensors hold random numbers drawn from the standard normal distribution (mean 0, variance 1).
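
A quick way to check these properties is to inspect the shape, mean, and standard deviation of such a tensor. The sketch below assumes a fixed seed so the run is repeatable; the name sample is only used for this illustration:

```python
import torch

torch.manual_seed(0)  # fixed seed (an assumption for this sketch)

batch, input_dim = 128, 2000
sample = torch.randn(batch, input_dim)

shape = tuple(sample.shape)
mean = sample.mean().item()
std = sample.std().item()

# The mean should be close to 0 and the standard deviation close to 1
print(shape, round(mean, 3), round(std, 3))
```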

Step 4: Building the Neural Network Model

Now, build the structure of the neural network from layers and an activation function. The structure defines how the layers of neurons work together to generate the output values:

model = torch.nn.Sequential(
    torch.nn.Linear(input_dim, hidden_dim),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_dim, output_dim),
)
  • Design a neural network structure stored in the model variable using the Sequential() container.
  • The Sequential() container runs its layers in the order in which they are listed.
  • The first Linear() layer takes the input data and produces the output of the hidden layer.
  • The ReLU() activation function applies non-linearity to the output of the hidden layer.
  • Create another layer by calling the Linear() method with the hidden and output dimensions.
  • It takes the output of the hidden layer as input and produces the final output.
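
To see what this structure amounts to, the following sketch rebuilds the same three-part model, counts its learnable values, and passes a dummy batch through it. The names n_params and out_shape are only used for this illustration:

```python
import torch

input_dim, hidden_dim, output_dim = 2000, 200, 20

model = torch.nn.Sequential(
    torch.nn.Linear(input_dim, hidden_dim),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_dim, output_dim),
)

# Count the learnable values (weights and biases) in both Linear layers
n_params = sum(p.numel() for p in model.parameters())

# Pass a dummy batch of 128 samples through the model
out_shape = tuple(model(torch.randn(128, input_dim)).shape)
print(n_params, out_shape)  # 404220 (128, 20)
```

The count is 2000*200 + 200 for the first layer plus 200*20 + 20 for the second; these are the values the optimizer will update.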

Step 5: Calling the Adam Optimizer

Create the loss function and the optimizer that will improve the performance of the model in the training process:

loss_fn = torch.nn.MSELoss(reduction='sum')
optim = torch.optim.Adam(model.parameters(), lr=0.05)
  • Before creating the Adam optimizer, create an MSELoss (or any other loss function offered by the torch library) to measure the prediction error. With reduction='sum', it returns the sum of the squared errors rather than their mean.
  • Create the optim variable to store the optimizer object with its arguments.
  • model.parameters() passes the learnable parameters of the model (weights and biases) to the optimizer so that it can update them.
  • The learning rate (lr=0.05) controls the size of each optimization step; betas and eps keep their default values here.
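
The settings an optimizer is actually using can be read back from its param_groups attribute. The sketch below uses a small stand-in model (an assumption for the example) rather than the model from Step 4:

```python
import torch

model = torch.nn.Linear(4, 2)  # small stand-in model for illustration
optim = torch.optim.Adam(model.parameters(), lr=0.05)

# Each parameter group stores the hyperparameters applied to its parameters
group = optim.param_groups[0]
lr, betas, eps = group['lr'], group['betas'], group['eps']
print(lr, betas, eps)
```

Here lr is the value passed in, while betas and eps show the defaults that were filled in automatically.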

Step 6: Model’s Training

Now, head towards the model training phase, which runs for multiple iterations so the model can become more accurate. In every iteration, the error of the model is measured and the optimizer updates the weights to reduce it:

for epoch in range(1):

    # Run 128 optimization steps; every step uses the full batch of data
    for i in range(batch):

        pred = model(input)
        loss = loss_fn(pred, output)

        optim.zero_grad()   # clear the gradients left over from the previous step
        loss.backward()     # backpropagation: compute the gradients
        optim.step()        # Adam update of the weights

        print('[%d, %5d] Value: %.3f' %
            (epoch + 1, i + 1, loss.item()))

print('Finished Training')
  • Use the outer for loop for the epochs and the nested for loop for the optimization steps within each epoch.
  • In each step, pass the input through the model to get the predictions and compute the loss from the predicted and target values with loss_fn().
  • Call zero_grad() to clear the old gradients, backward() to apply backpropagation, and step() to let the Adam optimizer update the weights.
  • At the end of each step, print the loss value.
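
In practice, the data is usually split into mini-batches instead of being passed through the model all at once. The following is a minimal sketch of such a loop using DataLoader and TensorDataset from torch.utils.data. The sizes, the learning rate of 0.01, and the names inputs, targets, loader, and epoch_losses are assumptions chosen so the example runs quickly, not values from this guide:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Small illustrative sizes (assumptions, chosen so the sketch runs quickly)
n_samples, in_dim, hid_dim, out_dim = 128, 20, 32, 2
inputs = torch.randn(n_samples, in_dim)
targets = torch.randn(n_samples, out_dim)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=32, shuffle=True)

model = torch.nn.Sequential(
    torch.nn.Linear(in_dim, hid_dim),
    torch.nn.ReLU(),
    torch.nn.Linear(hid_dim, out_dim),
)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(30):
    total = 0.0
    for x, y in loader:              # one mini-batch of 32 samples at a time
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()        # clear old gradients
        loss.backward()              # backpropagation
        optimizer.step()             # Adam update
        total += loss.item()
    epoch_losses.append(total / len(loader))  # average loss of this epoch

print(epoch_losses[0], epoch_losses[-1])
```

Comparing the first and last entries of epoch_losses gives a rough indication of whether training is reducing the loss.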

Decreasing loss values in the output indicate that the model is improving with each step. The main idea of using an optimizer in machine learning is to move the weights closer to the optimum after each iteration. The torch library offers many optimizers, but the most widely used are SGD and Adam. Let’s look at some of the differences between them when optimizing deep learning models:

Difference Between Adam and SGD Optimizer

There are multiple differences between the SGD and Adam optimizer regarding the deep learning models as mentioned below:

  • The SGD optimizer applies a single learning rate to all weights (optionally with momentum), whereas Adam keeps adapting the effective step size throughout the iterations.
  • With a fixed learning rate, SGD can skip over the minimum when its steps are too large or converge slowly when they are too small; the adaptive steps of Adam often reduce the need for such tuning.
  • The Adam optimizer extends the SGD algorithm by dynamically changing the learning rate for each weight, based on running averages of its gradient and squared gradient.
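
The per-weight behavior can be illustrated with a little arithmetic. The sketch below compares the size of the first update for two weights whose gradients differ widely. It is a simplified pure-Python calculation with made-up gradient values, not a call into torch.optim:

```python
import math

lr = 0.001
eps = 1e-8
grads = [100.0, 0.01]  # two weights whose gradients differ by a factor of 10,000

# Plain SGD (no momentum): the step is proportional to the gradient
sgd_steps = [lr * g for g in grads]

# Adam's first step: after bias correction, m_hat = g and v_hat = g**2
adam_steps = [lr * g / (math.sqrt(g * g) + eps) for g in grads]

print(sgd_steps)   # very different step sizes
print(adam_steps)  # both close to lr
```

With SGD, the weight with the large gradient moves far more than the other one, while Adam's first step is about lr for both. On later steps the running averages change this picture, so the sketch only shows the starting behavior.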

That’s all about using the Adam optimizer in the PyTorch framework.


To use the Adam optimizer in PyTorch, build the neural network using the torch library and call the Adam() optimizer to improve its predictions. Optimizers improve the performance of a deep learning model by updating its weights with the gradients computed through backpropagation. This guide builds a neural network from Linear() layers and generates random data to train the model. After that, it uses Adam() as the optimizer during the training process and prints the loss value for each step.