The PyTorch framework is useful for building AI algorithms based on Machine Learning or Deep Learning models. In supervised learning, these models compare their predictions with the labels (the known outputs) of the data to calculate the loss value. Contrastive learning is a technique that does not need such labels: it learns, and evaluates its loss, by comparing data points with one another.

What is Contrastive Learning

Contrastive learning is a form of Self-Supervised Learning (SSL), which lies between supervised and unsupervised learning. Because no labels are provided for the output, the model has to learn the patterns hidden in the dataset itself. As it learns these features, it maps similar objects close to each other and pushes dissimilar ones apart in its representation (embedding) space. It is applied effectively in clustering, classification, and object detection systems.

What is Contrastive Loss

The contrastive loss is usually used to train and evaluate contrastive learning models, but it works with other models as well. It takes a sample from the dataset as an anchor and compares it with other data points. Relative to the anchor, contrastive loss builds a positive set of similar objects and a negative set of dissimilar ones. It then penalizes the model when positives end up far from the anchor or negatives end up too close to it.
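The anchor-based idea can be sketched in a few lines of plain Python. This is a minimal illustration, assuming embeddings are lists of floats, Euclidean distance, and a margin of 1. The function names are hypothetical, and averaging over the pairs (without the 1/2 factors used later in this tutorial) is a simplifying choice.

```python
import math

def euclidean(a, b):
    # straight-line distance between two embedding vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def anchor_contrastive_loss(anchor, positives, negatives, margin=1.0):
    # positives are pulled toward the anchor: penalty is the squared distance
    pos = [euclidean(anchor, p) ** 2 for p in positives]
    # negatives are pushed away: penalized only while closer than the margin
    neg = [max(0.0, margin - euclidean(anchor, n)) ** 2 for n in negatives]
    # average over every pair the anchor takes part in
    return (sum(pos) + sum(neg)) / (len(pos) + len(neg))

loss = anchor_contrastive_loss([0.0, 0.0],
                               positives=[[0.1, 0.0]],
                               negatives=[[0.5, 0.0], [2.0, 0.0]])
print(round(loss, 4))  # 0.0867
```

The positive at distance 0.1 contributes 0.01, and the negative at distance 0.5 contributes (1 - 0.5)² = 0.25. The negative that is already beyond the margin contributes nothing.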

How to Calculate Contrastive Loss

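The contrastive loss value measures how well an SSL model separates similar samples from dissimilar ones, and minimizing it during training improves the model. Before training, the data is preprocessed. It is normalized to a common scale and is often augmented with random transformations that create new variations of each data point.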
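The walkthrough below only normalizes MNIST and does not augment it. The following is therefore a hypothetical NumPy illustration of augmentation that creates two random views of the same image, assuming a small random shift plus Gaussian noise as the transformations. The function names and parameter values are assumptions, not part of the tutorial.

```python
import numpy as np

def augment(image, rng, max_shift=2, noise_std=0.05):
    # randomly shift by up to max_shift pixels per axis; np.roll wraps pixels
    # around the edges, which is harmless for MNIST's blank borders
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    # add a little Gaussian noise, then keep pixel values inside [0, 1]
    noisy = shifted + rng.normal(0.0, noise_std, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def two_views(image, seed=0):
    # two independent augmentations of one image form a positive pair
    rng = np.random.default_rng(seed)
    return augment(image, rng), augment(image, rng)
```

In many SSL setups, two such views of the same image serve as a positive pair, and views of different images serve as negatives.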

Using the prepared dataset, train the model and then evaluate it on unseen data to find its contrastive loss value. To learn how to calculate the contrastive loss in self-supervised learning, go through the following steps:

Step 1: Importing Libraries

First, import the libraries that provide the required dependencies and functions in Python:

import numpy as np #importing numpy to get arrays in python
import torch #importing torch (PyTorch); the Keras code in the following steps does not use it
import tensorflow as tf #importing tensorflow to build SSL model
import tensorflow.keras.backend as K
from tensorflow.keras import metrics, layers
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt #importing matplotlib to build graphs
from sklearn.metrics import accuracy_score #importing sklearn to evaluate performance
  • Import NumPy as np to store the dataset in the form of multidimensional arrays.
  • After that, import the torch library. Note that the model in the following steps is designed with TensorFlow Keras, so that code does not use torch.
  • Now, import the TensorFlow library, which provides the layers, models, and optimizers for the Self-Supervised Learning model.
  • TensorFlow includes Keras. From it, import the backend module as K, the metrics and layers modules, the mnist dataset, and the Model class.
  • These modules are used to design the model, load the dataset, and evaluate the model’s performance.
  • Here, import the pyplot API of the matplotlib library as plt to draw graphical representations.
  • Lastly, import the accuracy_score function from scikit-learn to evaluate the model’s performance.

Step 2: Building and Preprocessing Data Structure

Now, build the data structures for storing the data points in the desired format using the following code:

def preprocess(array):

   array = array.astype('float32') / 255
   array = np.reshape(array, (len(array), 28, 28, 1))
   return array
  • Create the preprocess() method. It takes the array of images as its argument and returns them in the format the model expects.
  • In the preprocess() method, convert the pixel values to 32-bit floats with the astype() method and divide them by 255.
  • As 255 is the largest value a byte (an 8-bit pixel) can hold, this division normalizes the values to the range 0.0 to 1.0.
  • Now, use the reshape() method to give the array the shape (number of images, 28, 28, 1). This adds the single color channel that the convolutional layers expect.
  • Finally, return the array whenever the preprocess() method is called.
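preprocess() hardcodes the 28 x 28 image size in reshape(). A sketch of an alternative design, assuming inputs of shape (N, height, width), uses np.expand_dims() to add the same trailing channel axis for any image size. The name preprocess_any_size is hypothetical.

```python
import numpy as np

def preprocess_any_size(array):
    # hypothetical variant of preprocess(): same scaling to [0.0, 1.0] ...
    scaled = array.astype("float32") / 255
    # ... but the channel axis is added with expand_dims, not a fixed 28x28 reshape
    return np.expand_dims(scaled, axis=-1)

# two fake 28x28 "images": one all black, one all white
batch = np.stack([np.zeros((28, 28), dtype=np.uint8),
                  np.full((28, 28), 255, dtype=np.uint8)])
out = preprocess_any_size(batch)
print(out.shape, out.dtype, out.min(), out.max())  # (2, 28, 28, 1) float32 0.0 1.0
```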

After that, create another method called display_pairs() with the images, labels, and correct arguments to display pairs of images. The model learns the features of the images by comparing the two images in each pair. This lets it discover the hidden features of the data objects during training:

def display_pairs(images, labels, correct=None):
    n = 6  # number of pairs to display

    plt.figure(figsize=(20, 6))  # setting the size of the figure to get images
    for i, (image1, image2) in enumerate(zip(images[:n, 0], images[:n, 1])):
        # getting the label of the pair (1 = similar, 0 = dissimilar)
        label = int(labels[:n][i][0])
        # setting the color scheme of the labels with the images
        text = "Label"
        color = "silver"
        # conditional statement to display the predictions of the model
        if correct is not None:
            color = "green" if correct[:n][i][0] else "red"
            text = "Prediction"
        # first row of the grid: the first image of the pair with its caption
        g = plt.subplot(3, n, i + 1)
        # placement, font, and color of the caption text on the image
        g.text(1, -3, f"{text}: {label}", style="italic", bbox={
            "facecolor": color,
            "pad": 4
        })
        # display the first image in the combination
        plt.imshow(image1.reshape(28, 28))
        # display the second image in the row below for comparison and prediction
        g = plt.subplot(3, n, i + 1 + n)
        plt.imshow(image2.reshape(28, 28))
    plt.show()
  • Create a variable called “n” with the value 6 to display six pairs of images on the screen with their labels.
  • The labels are stored as ones and zeros: 1 for pairs of images with similar features (the same digit) and 0 for the rest.
  • The optional correct argument holds one flag per pair that tells whether the model’s prediction was right. When it is passed, the caption turns green for correct predictions and red for wrong ones.
  • The for loop iterates over the first n pairs. It takes the two images of each pair as image1 and image2, along with the pair’s label.
  • Now, use the plt module from the matplotlib library to show each pair with its label. The first images appear in the top row and the second images in the row below.
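The correct argument is not computed anywhere in the steps. One way to build it, sketched here under the assumption that the model outputs one distance per pair, is to apply the same threshold that accuracy() uses in Step 5: a distance below 0.5 predicts a similar pair (1). correctness_flags is a hypothetical helper.

```python
import numpy as np

def correctness_flags(y_true, distances, threshold=0.5):
    # a pair is predicted "similar" (1) when its distance is below the threshold
    predicted = (distances < threshold).astype(int)
    # True where the prediction matches the true pair label; shape stays (n, 1)
    return predicted == y_true

y_true = np.array([[1], [0], [1], [0]])
distances = np.array([[0.1], [0.9], [0.7], [0.3]])
correct = correctness_flags(y_true, distances)
print(correct.ravel())  # [ True  True False False]
```

After training, the flags could be passed as display_pairs(x_pairs_test, y_pairs_test, correctness_flags(y_pairs_test, model.predict([x_pairs_test[:, 0], x_pairs_test[:, 1]]))).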

Now, create another function called plot_history() to plot the loss values recorded while training the model:

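def plot_history(history):
    plt.plot(history.history['loss'])  # training loss after every epoch
    plt.plot(history.history['val_loss'])  # validation loss after every epoch
    plt.title('Training and Validation Loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'val'], loc='upper right')
    plt.show()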
  • Use the plot() method to draw the loss values that the history object stores after every epoch, for both the training (loss) and validation (val_loss) data.
  • Now, the title() method displays the title of the graph.
  • The ylabel() and xlabel() methods display the names of the y and x axes.
  • Here, the legend() method labels the plotted lines as train and val and places the legend in the upper right corner.
  • Finally, use the show() method to display the training and validation loss.

The last method in the preprocessing step is called generate_pairs() with images and labels arguments:

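def generate_pairs(images, labels):

    x_pairs = []
    y_pairs = []

    for i in range(len(images)):
        label = labels[i]
        # positive pair: a random image with the same label, marked 1
        j = np.random.choice(np.where(labels == label)[0])
        x_pairs.append([images[i], images[j]])
        y_pairs.append([1])
        # negative pair: a random image with a different label, marked 0
        k = np.random.choice(np.where(labels != label)[0])
        x_pairs.append([images[i], images[k]])
        y_pairs.append([0])

    # shuffle pairs and labels together so positives and negatives do not alternate
    indices = np.arange(len(x_pairs))
    np.random.shuffle(indices)
    return np.array(x_pairs)[indices], np.array(y_pairs)[indices]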
  • Firstly, create two lists called x_pairs and y_pairs. x_pairs stores the pairs of images, and y_pairs stores the label of each pair.
  • Now, use the for loop to go through every image in the data and read its label.
  • After that, use the “j” variable to pick a random image with the same label. This forms a similar (positive) pair labeled 1.
  • The “k” variable picks a random image with a different label. This forms a dissimilar (negative) pair labeled 0.
  • Finally, create an array of indices with the length of x_pairs and shuffle it, so that positive and negative pairs are not fed to the model in a fixed alternating order.
  • The method returns the shuffled pairs and their labels as NumPy arrays.
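Because generate_pairs() relies on NumPy’s global random state, each run produces different pairs. The following sketch of a reproducible variant assumes NumPy’s Generator API (np.random.default_rng). It keeps the same logic but draws from a seeded generator. generate_pairs_seeded is a hypothetical name.

```python
import numpy as np

def generate_pairs_seeded(images, labels, seed=0):
    # hypothetical reproducible variant of generate_pairs(): same logic, seeded Generator
    rng = np.random.default_rng(seed)
    x_pairs, y_pairs = [], []
    for i in range(len(images)):
        label = labels[i]
        # positive pair: another image with the same label -> 1
        j = rng.choice(np.where(labels == label)[0])
        x_pairs.append([images[i], images[j]])
        y_pairs.append([1])
        # negative pair: an image with a different label -> 0
        k = rng.choice(np.where(labels != label)[0])
        x_pairs.append([images[i], images[k]])
        y_pairs.append([0])
    # shuffle pairs and labels with the same permutation
    indices = rng.permutation(len(x_pairs))
    return np.array(x_pairs)[indices], np.array(y_pairs)[indices]
```

As in the original method, a positive pair may occasionally pair an image with itself, since j can equal i.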

Step 3: Downloading and Splitting the Dataset

Use the following code to load the MNIST dataset that ships with TensorFlow Keras using the load_data() method:

(x_train, y_train), (x_test, y_test) = mnist.load_data()
VALIDATION_SIZE = int(len(x_train) * 0.2)
#splitting the data into training and validation samples
x_value = x_train[:VALIDATION_SIZE]
y_value = y_train[:VALIDATION_SIZE]
x_train = x_train[VALIDATION_SIZE:]
y_train = y_train[VALIDATION_SIZE:]
#store pre-processed data into their variables
x_train = preprocess(x_train)
x_value = preprocess(x_value)
x_test = preprocess(x_test)

print(f"Train: {len(x_train)}")
print(f"Validation: {len(x_value)}")
print(f"Test: {len(x_test)}")
  • load_data() returns the dataset already split into training and testing samples.
  • Store the training sample in x_train (images) and y_train (labels).
  • Store the testing sample in x_test and y_test.
  • Now, split the training data further. The first 20 percent becomes the validation data, and the rest remains training data.
  • The validation data is used to check how well the model performs on data it is not trained on.
  • After that, apply the preprocess() method to the train, validation, and test variables to normalize them.
  • Finally, print the number of samples in the train, validation, and test sets (48000, 12000, and 10000 for MNIST) as displayed below:

Step 4: Building Pairs of Data Points

Now, create the pairs from the data stored in the x and y variables using the generate_pairs() method. Generate three sets of pairs from the train, validation, and test datasets and store them in their variables:

x_pairs_train, y_pairs_train = generate_pairs(x_train, y_train)
x_pairs_value, y_pairs_value = generate_pairs(x_value, y_value)
x_pairs_test, y_pairs_test = generate_pairs(x_test, y_test)
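The sizes produced by Steps 3 and 4 can be checked with a little arithmetic. This sketch uses hypothetical helper names and mirrors the 20 percent validation split and the two pairs per image that generate_pairs() creates. Ranges stand in for the image arrays.

```python
def split_validation(x, y, fraction=0.2):
    # mirrors Step 3: the first `fraction` of the samples becomes validation data
    size = int(len(x) * fraction)
    return (x[size:], y[size:]), (x[:size], y[:size])

def pair_count(num_images):
    # generate_pairs() builds one positive and one negative pair per image
    return 2 * num_images

n = 60_000  # MNIST training images; the test set has 10,000
(train_x, _), (val_x, _) = split_validation(range(n), range(n))
for name, count in [("train", len(train_x)), ("validation", len(val_x)), ("test", 10_000)]:
    print(name, count, "images ->", pair_count(count), "pairs")
```

The script prints 96000, 24000, and 20000 pairs for the train, validation, and test sets.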

Call the display_pairs() method with the training pairs (x_pairs_train, y_pairs_train) as arguments:

display_pairs(x_pairs_train, y_pairs_train)

The following screenshot displays some of the training pairs with labels suggesting positive and negative samples:

Step 5: Designing Contrastive Loss and Accuracy

Create the functions that compute the distance between the embeddings of a pair, the accuracy, and the contrastive loss value used to improve the model:

def norm(features):
    # Applying the distance formula to find the intensity of dissimilarity between images
    return tf.norm(features[0] - features[1], axis=1, keepdims=True)

# Get the accuracy to check how well the model is performing
def accuracy(y_true, y_pred):
    return metrics.binary_accuracy(y_true, 1 - y_pred)

# Get the loss value to check the flaws in the model and improve it
def contrastive_loss(y_true, y_pred):
    margin = 1
    y_true = tf.cast(y_true, y_pred.dtype)
    loss = y_true / 2 * K.square(y_pred) + (1 - y_true) / 2 \
                  * K.square(K.maximum(0.0, margin - y_pred))

    return loss
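  • Create the norm() method with the features argument, which holds the embeddings of the two images in each pair. It computes the Euclidean distance between them.
  • A small distance means the model sees the pair as similar (label 1), and a large distance means it sees the pair as dissimilar (label 0).
  • Define the accuracy() method to compare the actual labels with the predictions. binary_accuracy() treats 1 - distance above 0.5 (that is, a distance below 0.5) as a prediction of 1 (similar).
  • Here, define the contrastive_loss() method with the formula of the contrastive loss. Similar pairs are penalized by their squared distance, and dissimilar pairs are penalized only when their distance is smaller than the margin of 1.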
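With y = 1 for a similar pair, d the distance returned by norm(), and margin m = 1, contrastive_loss() computes L = y/2 · d² + (1 − y)/2 · max(0, m − d)². The following plain-Python sketch evaluates that formula and the accuracy() threshold for single pairs. The helper names are hypothetical.

```python
def contrastive_loss_value(y_true, distance, margin=1.0):
    # same formula as contrastive_loss(), for a single pair and plain floats
    return y_true / 2 * distance ** 2 + (1 - y_true) / 2 * max(0.0, margin - distance) ** 2

def predicted_label(distance, threshold=0.5):
    # mirrors accuracy(): binary_accuracy(y_true, 1 - d) counts 1 - d > 0.5 as "similar"
    return 1 if 1 - distance > threshold else 0

for y, d in [(1, 0.2), (1, 0.8), (0, 0.3), (0, 1.5)]:
    print(y, d, round(contrastive_loss_value(y, d), 3), predicted_label(d))
```

A similar pair at distance 0.2 costs 0.5 × 0.2² = 0.02, and a dissimilar pair at distance 0.3 costs 0.5 × 0.7² = 0.245. A dissimilar pair beyond the margin (1.5) costs nothing, so the loss pushes dissimilar pairs apart only until they are at least one unit apart.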

Step 6: Building the CNN Model

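Design the siamese_twin network using convolutional layers and their activation functions. The twin maps an image to a 128-dimensional embedding. The siamese network then applies the same twin to both images of a pair and measures the distance between their embeddings, which should be small for similar (positive) pairs and large for dissimilar (negative) pairs: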

def siamese_twin():
    # sets the input dimensions for the siamese_twin model
    inputs = layers.Input((28, 28, 1))

    # first convolutional layer with the relu activation
    x = layers.Conv2D(128, (2, 2), activation="relu")(inputs)
    # pooling layer to reduce the dimensions of the data
    x = layers.MaxPooling2D((2, 2))(x)
    # dropout layer to randomly drop features during training
    x = layers.Dropout(0.4)(x)

    # second convolutional layer, pooling, and dropout
    x = layers.Conv2D(128, (2, 2), activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Dropout(0.4)(x)

    # third convolutional layer, pooling, and dropout
    x = layers.Conv2D(64, (2, 2), activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Dropout(0.4)(x)

    # average each feature map to a single value
    x = layers.GlobalAveragePooling2D()(x)
    # final layer: a 128-dimensional embedding of the image
    outputs = layers.Dense(128, activation=None)(x)
    # the twin model maps one image to its embedding
    return Model(inputs, outputs)

def siamese_network():
    # one input per image of the pair, with the image dimensions as the shape
    input1 = layers.Input(shape=(28, 28, 1))
    input2 = layers.Input(shape=(28, 28, 1))

    # the same twin (shared weights) embeds both images
    twin = siamese_twin()
    # distance between the two embeddings, computed by norm()
    distance = layers.Lambda(norm)([
        twin(input1),
        twin(input2)
    ])

    return Model(inputs=[input1, input2], outputs=distance)
  • The siamese_twin model creates the input layer using the Input() method with the image dimensions as its argument.
  • Create three hidden blocks, each with a convolutional layer using the relu activation function, a max pooling layer, and a dropout layer.
  • The global average pooling layer and the final Dense layer with 128 units turn each image into a 128-dimensional embedding.
  • After that, create two input variables, input1 and input2, in the siamese_network() method.
  • These inputs receive the two images of a pair. The same twin model embeds both, so both images are processed with identical weights.
  • The Lambda layer applies norm() to the two embeddings. The loss and accuracy functions compare this distance with the pair’s label.
  • Finally, return the Model() with both images as inputs and their distance as the output.
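The key property of the siamese network is that both images pass through the same twin, that is, the same weights. A minimal NumPy sketch of that idea uses a random linear projection as a stand-in for the convolutional twin; it is not trained and is not the tutorial’s model. It makes two consequences easy to check: identical images get distance 0, and swapping the two inputs does not change the distance. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical stand-in for siamese_twin(): one shared linear projection 784 -> 128
W = rng.normal(size=(28 * 28, 128)) / 28.0

def twin(images):
    # flatten (N, 28, 28, 1) images and embed them with the SAME weights W
    return images.reshape(len(images), -1) @ W

def siamese_distance(batch1, batch2):
    # like norm(): Euclidean distance per pair, shape (N, 1)
    e1, e2 = twin(batch1), twin(batch2)
    return np.linalg.norm(e1 - e2, axis=1, keepdims=True)

a = rng.random((3, 28, 28, 1))
b = rng.random((3, 28, 28, 1))
print(siamese_distance(a, b).shape)  # (3, 1)
```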

Step 7: Getting the Summary of the Model

Call the siamese_network() method to build the model. Then use the compile() function to add the other components: the loss, the optimizer, and the metrics. In the end, call the summary() method on the model variable to display the summary of the model on the screen:

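model = siamese_network()
model.compile(loss=contrastive_loss, optimizer="adam", metrics=[accuracy])  # the optimizer is not named in the text; "adam" is an assumption
model.summary()
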
The following screenshot displays the summary of the model with its structure containing trainable and non-trainable parameters:
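The parameter counts in the summary can be cross-checked by hand. This sketch uses hypothetical helper names and applies the standard formulas for Conv2D layers (kernel height × width × input channels × filters, plus one bias per filter) and Dense layers to the twin from Step 6. Pooling, dropout, and the Lambda layer have no weights. Because both branches share one twin, the total should match the number of trainable parameters that summary() reports.

```python
def conv_params(kernel, in_ch, out_ch):
    # weights (kernel * kernel * in_ch * out_ch) plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    # weight matrix plus one bias per unit
    return in_units * out_units + out_units

def conv_pool_size(size, kernel=2, pool=2):
    # a 'valid' 2x2 convolution shrinks by kernel - 1; 2x2 max pooling halves (floor)
    return (size - kernel + 1) // pool

size = 28
for _ in range(3):
    size = conv_pool_size(size)
print("spatial size before global pooling:", size)  # 2

total = (conv_params(2, 1, 128) + conv_params(2, 128, 128)
         + conv_params(2, 128, 64) + dense_params(64, 128))
print("twin parameters:", total)  # 107456
```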

Step 8: Training the Model

Create the hist variable and call the fit() method of the model to train it with the following arguments:

hist = model.fit(
    # training the model on the pairs so it can learn the features from the data
    x=[x_pairs_train[:, 0], x_pairs_train[:, 1]],
    y=y_pairs_train[:],
    # evaluating the model on the validation pairs after every epoch
    validation_data=([x_pairs_value[:, 0], x_pairs_value[:, 1]], y_pairs_value[:]),
    batch_size=64,
    epochs=5,
)

  • Firstly, the x argument contains the input data, which is the first and second images of every positive and negative pair. The y argument holds the pair labels (1 for similar, 0 for dissimilar).
  • validation_data holds the validation pairs and their labels in the same “x” and “y” format. It checks the model’s performance on data it is not trained on.
  • Now, the batch_size argument sets how many pairs (64) the model processes before each update of its weights.
  • Finally, epochs sets the number of complete passes (5) over the training pairs, each processed in batches of 64.
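The batch_size and epochs values translate into a number of weight updates. A quick sketch, assuming the 96,000 training pairs from Step 4 and a hypothetical helper called steps_per_epoch:

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # the last, smaller batch is processed too, hence the ceiling
    return math.ceil(num_samples / batch_size)

train_pairs = 2 * 48_000   # one positive and one negative pair per training image
steps = steps_per_epoch(train_pairs, 64)
print(steps, steps * 5)    # 1500 7500
```

That is 1,500 batches per epoch and 7,500 updates over the 5 epochs.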

The following screenshot displays the 5 training epochs and the loss values for each epoch:

Step 9: Plotting Results

In the end, call the plot_history() method with the hist variable. This is the History object returned by fit() in the previous step, and it holds the recorded loss values:

plot_history(hist)

The following screenshot displays the loss values for the training and validation datasets. The training loss is displayed as a blue line and the validation loss as an orange line:

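The loss decreases for both the training and validation data, which suggests that the model is improving. The training loss (blue line) stays higher than the validation loss (orange line), even though the model never trains on the validation data. A likely reason is that the Dropout layers drop 40 percent of the features only during training and are disabled when the model is evaluated on the validation data. In addition, Keras reports the training loss as an average over each epoch while the model is still improving, whereas the validation loss is computed at the end of the epoch.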
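The Dropout explanation can be illustrated with a minimal NumPy sketch of inverted dropout, the scheme the Keras Dropout layer uses. During training it zeroes a random fraction of activations and scales the rest by 1 / (1 − rate). At inference it passes the input through unchanged. This is an illustration with a hypothetical function name, not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    # inference: dropout is the identity
    if not training:
        return x
    # training (inverted dropout): zero a random subset of activations
    # and scale the survivors by 1 / (1 - rate)
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones(10)
print(dropout(x, 0.4, training=False, rng=rng))
print(dropout(x, 0.4, training=True, rng=rng))
```

Because about 40 percent of the features are missing in every training step, the network’s training-time outputs are noisier, which tends to raise the training loss relative to the validation loss.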

That’s all about calculating the contrastive loss value using a convolutional neural network model.


To calculate the contrastive loss value, build or extract a dataset whose samples can be compared in pairs, so that the model learns from similarity rather than from predicting the output labels directly. This is what makes it a self-supervised style of learning: the model is only told whether two samples are similar and has to learn by itself the features that separate them. Finally, design the structure of the model and train it using the training data to get the loss value.