Building Siamese Networks using TensorFlow’s Functional API: A Comprehensive Guide

Anna Alexandra Grigoryan
6 min read · Aug 12, 2023


Siamese networks have gained significant attention for their ability to learn similarity and distance metrics between input samples, offering an elegant solution to similarity learning tasks. In this guide, we'll explore their architecture and walk through a step-by-step implementation using TensorFlow's Functional API.

Siamese twins painted by Henri Matisse, Dall-E

Understanding Siamese Networks

Siamese networks, named after the iconic “Siamese twins,” are designed to process pairs of input samples, aiming to determine their degree of similarity. The architecture involves twin networks that share weights, enabling the network to extract essential features from the input data while capturing their underlying relationships. The core objective is to transform input pairs into feature representations that can be easily compared.

One of the pivotal elements of Siamese networks is the contrastive loss function. This loss function drives the learning process by pulling similar samples together in the feature space while pushing dissimilar samples apart.

By optimising this loss, Siamese networks master the art of understanding and quantifying similarity, making them valuable assets in various domains.
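Concretely, the variant of the contrastive loss implemented later in this guide, with label y = 1 for similar pairs, y = 0 for dissimilar pairs, and D the Euclidean distance between the two embeddings, is

L(y, D) = y · D² + (1 − y) · max(margin − D, 0)²

so similar pairs are penalised for being far apart, while dissimilar pairs are penalised only when they fall inside the margin.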

TensorFlow’s Functional API: Advantages and Basics

TensorFlow’s Functional API is a powerful tool for constructing intricate neural architectures. Unlike the Sequential API, which is suitable for simple linear stacks of layers, the Functional API allows for more flexibility and creativity. With the Functional API, you can design models with multiple inputs and outputs, catering to complex requirements.

At its core, the Functional API operates on the idea of building models as graphs of layers. Each layer acts as a building block, and you can easily create models by connecting layers to define the flow of data. This API is particularly well-suited for Siamese networks, where twin networks share layers and weights, demanding a more intricate structure.
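As a minimal, illustrative sketch of this graph-of-layers style (the shapes and layer names here are made up purely for the example), two inputs can flow through a single shared layer before being merged:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

# two separate inputs
input_1 = Input(shape=(16,), name="input_1")
input_2 = Input(shape=(16,), name="input_2")

# one Dense layer instance, applied to both inputs, so its weights are shared
shared_dense = Dense(8, activation="relu", name="shared_dense")
merged = concatenate([shared_dense(input_1), shared_dense(input_2)], name="merged")
output = Dense(1, activation="sigmoid", name="output")(merged)

toy_model = Model(inputs=[input_1, input_2], outputs=output)
toy_model.summary()

This weight sharing through layer reuse is exactly the mechanism the Siamese architecture below relies on.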

In the upcoming sections, we will dive into the practical implementation of Siamese networks using TensorFlow’s Functional API. We will cover everything from architecture construction to loss function definition and model training.

Building a Siamese Network using TensorFlow’s Functional API

1. Importing Libraries and Preparing Data
When working with a Siamese network, your dataset should be organized into pairs, each labeled as similar or dissimilar. For instance, if you're working with image data, you might have directories containing subdirectories for each class. You can then generate pairs of images along with their labels (1 for similar, 0 for dissimilar, matching the contrastive loss used below), and preprocess the images (resizing, normalization, etc.).
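For concreteness, here is a minimal sketch of this step using MNIST digits, assuming the pair-array layout expected by the training code later on (tr_pairs and ts_pairs of shape (num_pairs, 2, 28, 28), with labels tr_y and ts_y). The dataset choice and the helper name are assumptions made purely for illustration.

import random
import numpy as np
import tensorflow as tf

def create_pairs(x, digit_indices, num_classes=10):
    '''Create positive (label 1) and negative (label 0) image pairs.'''
    pairs, labels = [], []
    n = min(len(digit_indices[d]) for d in range(num_classes)) - 1
    for d in range(num_classes):
        for i in range(n):
            # positive pair: two images of the same digit
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs += [[x[z1], x[z2]]]
            # negative pair: images of two different digits
            dn = (d + random.randrange(1, num_classes)) % num_classes
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs += [[x[z1], x[z2]]]
            labels += [1, 0]
    return np.array(pairs), np.array(labels).astype("float32")

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0  # normalize to [0, 1]

digit_indices = [np.where(train_labels == d)[0] for d in range(10)]
tr_pairs, tr_y = create_pairs(train_images, digit_indices)

digit_indices = [np.where(test_labels == d)[0] for d in range(10)]
ts_pairs, ts_y = create_pairs(test_images, digit_indices)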

2. Constructing the Siamese Architecture
In this step, you'll use the Functional API to build the Siamese architecture. Thanks to the Functional API, you don't need to duplicate the base network to create two identical subnetworks; instead, you reuse the same base network and pass each input through it separately, so the weights are shared automatically.


import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout, Lambda
from tensorflow.keras import backend as K
from tensorflow.keras.utils import plot_model

def initialize_base_network():
    input = Input(shape=(28,28,), name="base_input")
    x = Flatten(name="flatten_input")(input)
    x = Dense(128, activation='relu', name="first_base_dense")(x)
    x = Dropout(0.1, name="first_dropout")(x)
    x = Dense(128, activation='relu', name="second_base_dense")(x)
    x = Dropout(0.1, name="second_dropout")(x)
    x = Dense(128, activation='relu', name="third_base_dense")(x)

    return Model(inputs=input, outputs=x)

def euclidean_distance(vects):
    x, y = vects
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    return K.sqrt(K.maximum(sum_square, K.epsilon()))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

# build the shared base network once
base_network = initialize_base_network()

# create the left input and point to the base network
input_a = Input(shape=(28,28,), name="left_input")
vect_output_a = base_network(input_a)

# create the right input and point to the base network
input_b = Input(shape=(28,28,), name="right_input")
vect_output_b = base_network(input_b)

# measure the similarity of the two vector outputs
output = Lambda(euclidean_distance, name="output_layer", output_shape=eucl_dist_output_shape)([vect_output_a, vect_output_b])

# specify the inputs and output of the model
model = Model([input_a, input_b], output)

# plot model graph
plot_model(model, show_shapes=True, show_layer_names=True, to_file='outer-model.png')

Plot the model architecture to confirm that the two branches share the base network and merge into a single distance output as intended.

Plot 1: Siamese network architecture

3. Defining Contrastive Loss Function
The contrastive loss function is the guiding force behind Siamese network training. It measures the similarity between the feature embeddings of two input samples and adjusts the network’s weights to minimize this loss.

The contrastive loss is typically defined as a function of the Euclidean distance (or another chosen similarity metric) between the feature embeddings of the two input samples. If the samples are similar (label 1), the loss encourages the embeddings to be closer together. If the samples are dissimilar (label 0), the loss encourages the embeddings to be farther apart, up to the margin.

def contrastive_loss_with_margin(margin):
    def contrastive_loss(y_true, y_pred):
        '''Contrastive loss from Hadsell-et-al.'06
        http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
        '''
        square_pred = K.square(y_pred)
        margin_square = K.square(K.maximum(margin - y_pred, 0))
        return (y_true * square_pred + (1 - y_true) * margin_square)
    return contrastive_loss

from tensorflow.keras.optimizers import RMSprop

rms = RMSprop()
model.compile(loss=contrastive_loss_with_margin(margin=1), optimizer=rms)
history = model.fit([tr_pairs[:,0], tr_pairs[:,1]], tr_y, epochs=20, batch_size=128, validation_data=([ts_pairs[:,0], ts_pairs[:,1]], ts_y))

4. Evaluation
After training, it’s important to evaluate the Siamese network’s performance on validation or test data.

import numpy as np

def compute_accuracy(y_true, y_pred):
    '''Compute classification accuracy with a fixed threshold on distances.'''
    pred = y_pred.ravel() < 0.5
    return np.mean(pred == y_true)

loss = model.evaluate(x=[ts_pairs[:,0],ts_pairs[:,1]], y=ts_y)

y_pred_train = model.predict([tr_pairs[:,0], tr_pairs[:,1]])
train_accuracy = compute_accuracy(tr_y, y_pred_train)

y_pred_test = model.predict([ts_pairs[:,0], ts_pairs[:,1]])
test_accuracy = compute_accuracy(ts_y, y_pred_test)

print("Loss = {}, Train Accuracy = {} Test Accuracy = {}".format(loss, train_accuracy, test_accuracy))

Advanced Techniques and Applications

Triplet Loss

While the contrastive loss is commonly used in Siamese networks, another intriguing alternative is the triplet loss. Triplet loss incorporates three samples for each training instance: an anchor, a positive (similar) sample, and a negative (dissimilar) sample. This setup encourages the network to enhance the feature separation between similar and dissimilar pairs.

The core principle behind triplet loss is to minimize the distance between the anchor and the positive sample while maximizing the distance between the anchor and the negative sample. By introducing the anchor as a reference point, the network fine-tunes its embeddings to respect the relative distances, resulting in feature spaces that are better suited for similarity comparisons.
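As a rough sketch of the idea only (the margin value and the way the three embeddings are obtained are assumptions, and this is not wired into the model built earlier):

from tensorflow.keras import backend as K

def triplet_loss_with_margin(margin=0.2):
    def triplet_loss(anchor_emb, positive_emb, negative_emb):
        '''Penalise triplets where the anchor is not at least `margin`
        closer to the positive than to the negative.'''
        pos_dist = K.sum(K.square(anchor_emb - positive_emb), axis=1)
        neg_dist = K.sum(K.square(anchor_emb - negative_emb), axis=1)
        return K.maximum(pos_dist - neg_dist + margin, 0.0)
    return triplet_loss

In practice, the three embeddings typically come from the same shared base network applied to an anchor, a positive and a negative input, and the loss is attached via model.add_loss or a custom training loop rather than through compile(loss=...).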

Transfer Learning

Transfer learning plays a pivotal role in enhancing the capabilities of Siamese networks. By leveraging pre-trained base networks, you can adapt well-established feature extractors to your similarity learning tasks. This approach is especially valuable when data availability is limited. The idea is to use a base network that has been trained on a large and diverse dataset and fine-tune it for your specific application.

The process involves using the pre-trained network as the shared base of the two Siamese branches and fine-tuning it so that it extracts features relevant to the underlying similarity relationships in your data. The pre-trained weights provide a head start, allowing your network to capture intricate details and patterns more effectively.
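A possible sketch of this, assuming RGB inputs of size 160×160 and MobileNetV2 with ImageNet weights as the backbone (both assumptions for illustration, not part of the setup above):

import tensorflow as tf
from tensorflow.keras.layers import Input, GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

def initialize_pretrained_base_network(input_shape=(160, 160, 3)):
    # ImageNet-pretrained feature extractor, frozen for the first training phase
    backbone = tf.keras.applications.MobileNetV2(input_shape=input_shape, include_top=False, weights="imagenet")
    backbone.trainable = False

    inputs = Input(shape=input_shape, name="base_input")
    x = backbone(inputs, training=False)
    x = GlobalAveragePooling2D(name="pooling")(x)
    outputs = Dense(128, activation="relu", name="embedding")(x)
    return Model(inputs=inputs, outputs=outputs)

# this base can then be reused for both branches exactly as in step 2
base_network = initialize_pretrained_base_network()

Once the distance head has converged, selected backbone layers can be unfrozen and fine-tuned with a lower learning rate.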

Applications

Siamese networks find applications across a wide spectrum of domains, showcasing their versatility in solving similarity learning challenges:

  • Face Recognition: Siamese networks are at the heart of modern face verification systems. By learning embeddings that capture unique facial features, these networks can verify whether two face images belong to the same person or not.
  • Signature Verification: In the realm of security, Siamese networks contribute to signature verification systems. The networks learn to differentiate between genuine and forged signatures by focusing on the subtle nuances that distinguish them.
  • Object Tracking: Siamese networks are utilized in object tracking scenarios, tracking and matching objects across frames in videos. The networks learn embeddings that represent the appearance of objects, aiding in accurate tracking.
  • Image Retrieval: Siamese networks enable image retrieval by capturing visual similarities. Given a query image, the network retrieves visually similar images from a database, enabling powerful content-based search.
  • Text Similarity: Siamese networks extend their reach to text data. They can measure similarity between textual documents, facilitating tasks such as plagiarism detection and semantic search.

Conclusion

Siamese networks have proven to be invaluable tools across many domains, providing elegant solutions to similarity learning tasks. Their ability to capture and quantify similarity between input samples makes them well suited to a wide range of applications, and TensorFlow's Functional API gives you a powerful way to construct the shared-weight architectures they require.
