Neural Networks with Quadratic Layers

Anna Alexandra Grigoryan
5 min read · Aug 13, 2023


The ability to capture complex relationships within data is crucial. While traditional linear transformations play a significant role, they can fall short when it comes to unraveling the intricate non-linear interactions hidden within the data. This is where quadratic layers come into play, offering one approach to enhancing neural networks’ capabilities. In this blog post, we’ll look into the concept of quadratic layers.

Artificial Intelligence as painted by Salvador Dalí (image generated with DALL·E)

Understanding Quadratic Transformations

Before we create quadratic layers, let’s grasp the essence of quadratic transformations. Imagine a scenario where a linear model fails to capture the intricate dependencies between features. Quadratic transformations are mathematical operations that square the input variables to create quadratic terms, and these terms represent interactions between features in a non-linear manner. While linear transformations multiply features by weights, quadratic transformations introduce a richer relationship by also considering the squared values of those features.
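To make this concrete, here is a toy NumPy illustration (the feature values and weights below are made up purely for demonstration):

import numpy as np

x = np.array([2.0, 3.0])       # two input features
w = np.array([0.5, -1.0])      # example weights for the linear part

linear_terms = w * x           # -> [ 1. -3.]  each feature scaled by a weight
quadratic_terms = x ** 2       # -> [ 4.  9.]  each feature interacting with itself
print(linear_terms, quadratic_terms)

The quadratic terms grow faster than the linear ones, which is exactly what lets a model weight them differently and express curves rather than straight lines.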

Examples Illustrating Quadratic Interactions

Let’s have a look at quadratic transformations in the context of different domain applications.

1. Computer Vision — Image Recognition:

Imagine you’re building an image recognition model to identify different shapes. A linear transformation might struggle to capture the subtle nuances that distinguish these shapes. Quadratic transformations can come to the rescue. Consider a scenario where you’re trying to classify squares, circles, and triangles based on pixel values. By squaring pixel values (and, more generally, taking products of pixel pairs), quadratic terms capture not only the intensity of individual pixels but also interactions between pixels. This can make your model more adept than a purely linear one at detecting the patterns that form the distinctive edges and curves of these shapes.

2. Natural Language Processing — Sentiment Analysis:

In NLP, understanding sentiment goes beyond simple linear relationships. Words often have complex interactions with each other, and quadratic transformations can capture these intricate patterns. For instance, let’s say we’re analysing product reviews for sentiment analysis. Linear transformations might capture simple relationships between individual words and sentiments (e.g., “good” and “bad”). However, quadratic transformations can uncover more nuanced interactions, such as words that intensify or mitigate sentiments (e.g., “not good” versus “very good”). This allows the model to grasp the context and subtle nuances of language.

In both of these examples, quadratic transformations enable neural networks to consider interactions between features that might be crucial for accurate predictions. While linear transformations provide a solid benchmark, quadratic transformations enrich the model’s capacity to capture non-linear dependencies, making it more versatile and adaptable for the use-cases.

Creating Custom Quadratic Layers

Custom layers are the building blocks that enable you to extend the capabilities of deep learning frameworks beyond the pre-defined layers they offer. While standard layers provide essential functionalities like convolutions and dense transformations, custom layers allow you to introduce domain-specific operations, non-linearities, or complex computations tailored to your unique problem.

Mathematically speaking, in a quadratic layer, you’ll take the input features and compute quadratic terms by squaring them. This step introduces the non-linear interactions that can elevate your model’s predictive capabilities. For each feature, the quadratic layer computes its squared value, creating a new feature that represents its interaction with itself.

What makes the custom quadratic layer truly powerful is its ability to learn from data. Just like traditional layers, a custom layer can have learnable weights and biases. These weights control the influence of quadratic terms, allowing the neural network to adjust their importance based on the training data.
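Putting this together, the layer we are about to build computes

output = activation(x² · A + x · B + c)

where x² squares each input element-wise, A and B are learned weight matrices for the quadratic and linear terms, and c is a learned bias vector.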

import tensorflow as tf
from tensorflow.keras.layers import Layer

class SimpleQuadratic(Layer):
    def __init__(self, units=32, activation=None):
        super(SimpleQuadratic, self).__init__()
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # Weights for the quadratic terms (x^2)
        self.a = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True,
                                 name='a')
        # Weights for the linear terms (x)
        self.b = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True,
                                 name='b')
        # Bias term
        self.c = self.add_weight(shape=(self.units,),
                                 initializer='zeros',
                                 trainable=True,
                                 name='c')

    def call(self, inputs):
        # output = x^2 · A + x · B + c, optionally passed through an activation
        quadratic_terms = tf.matmul(inputs ** 2, self.a) + tf.matmul(inputs, self.b)
        outputs = quadratic_terms + self.c
        if self.activation is not None:
            outputs = self.activation(outputs)
        return outputs
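A quick sanity check of the layer on random data (the batch and feature sizes here are arbitrary, chosen only for illustration):

# Apply the layer to a random batch of 2 samples with 3 features each
layer = SimpleQuadratic(units=4, activation='relu')
x = tf.random.normal((2, 3))
print(layer(x).shape)  # (2, 4)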

Activation functions like ReLU, sigmoid, or tanh add a final non-linear transformation. They decide whether a neuron should fire or remain silent, shaping the layer’s output in response to its input. This step ensures that your custom quadratic layer embodies the essence of neural networks: capturing complex relationships through non-linear transformations.

Practical Implementation

We’ll use the MNIST dataset for the training and evaluation example below.

# Load and preprocess the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the model using the SimpleQuadratic layer
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    SimpleQuadratic(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5)

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy:", test_acc)

The choice of loss function shapes the training process, ensuring that the model learns the appropriate patterns and relationships from the provided data and labels. Here we use sparse_categorical_crossentropy: it computes the same multi-class cross-entropy loss as categorical cross-entropy, but accepts integer class labels (like MNIST’s digit labels 0–9) instead of one-hot encoded vectors.
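To make the “sparse” part concrete, here is a minimal comparison (the predicted probabilities below are made up for illustration):

# The sparse variant takes integer labels directly
y_true = tf.constant([2, 0])
y_pred = tf.constant([[0.1, 0.2, 0.7],
                      [0.8, 0.1, 0.1]])
print(tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred))

# The non-sparse variant expects the same labels one-hot encoded
y_true_one_hot = tf.one_hot(y_true, depth=3)
print(tf.keras.losses.categorical_crossentropy(y_true_one_hot, y_pred))

Both calls produce the same loss values; the only difference is the label format.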

To summarize:

  • Quadratic layers: Modify the architecture to capture non-linear relationships.
  • Loss functions: Determine how the model’s predictions are compared to the true labels during training and guide the optimization process.

Considerations and Challenges

There are several important considerations and potential challenges to keep in mind:

1. Risk of Overfitting: Quadratic layers introduce additional parameters and complexity to your model. This increases the risk of overfitting, where the model becomes too specialized to the training data and performs poorly on unseen data. Regularization techniques such as dropout and L2 regularization can help mitigate this risk (see the sketch after this list).

2. Choosing Appropriate Activation Functions: The choice of activation function in your quadratic layer can impact the model’s behavior and performance. Experiment with different activation functions to find the one that best suits your problem. Consider using common options like ReLU, sigmoid, or tanh.

3. Balancing Interpretability vs Complexity: While quadratic layers can improve model performance by capturing non-linear relationships, they can also make the model less interpretable. Complex interactions between features might be harder to explain. Consider the trade-off between improved accuracy and the ability to interpret the model’s decisions.
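To address the first point in code: one way to regularize the quadratic weights is to attach an L2 penalty when they are created. Below is a minimal sketch that builds on the SimpleQuadratic layer defined earlier; the subclass name RegularizedQuadratic and the 1e-4 strength are illustrative choices, not a prescription:

import tensorflow as tf

class RegularizedQuadratic(SimpleQuadratic):
    # Hypothetical variant: same math as SimpleQuadratic, but with an
    # L2 penalty on the quadratic weights 'a' to discourage overfitting
    def build(self, input_shape):
        self.a = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 regularizer=tf.keras.regularizers.l2(1e-4),
                                 trainable=True,
                                 name='a')
        self.b = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True,
                                 name='b')
        self.c = self.add_weight(shape=(self.units,),
                                 initializer='zeros',
                                 trainable=True,
                                 name='c')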

Comparing Quadratic Layers with Other Approaches

Quadratic layers are just one way to capture non-linear interactions.

1. Polynomial Features in Traditional Machine Learning: In traditional ML, you can manually engineer polynomial features by creating new features as combinations of existing features (e.g., squaring and multiplying them); a quick sketch follows this list. While effective, this approach can lead to a large feature space and the curse of dimensionality. Quadratic layers automate the process of creating and learning quadratic terms, potentially handling higher-dimensional interactions more efficiently.

2. Deep Neural Networks with Multiple Layers: Deep neural networks with multiple layers can capture non-linear relationships effectively by combining multiple linear and non-linear transformations. Quadratic layers introduce specific non-linearities, making them suitable for tasks where quadratic relationships are especially relevant. Additionally, deep networks might require more computational resources and longer training times compared to models with quadratic layers.
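For comparison, the manual approach from the first point looks like this with scikit-learn’s PolynomialFeatures (the degree and data below are illustrative):

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))
# columns: x1, x2, x1^2, x1*x2, x2^2

Note how the expanded feature count grows quadratically with the number of inputs, which is exactly the dimensionality problem the quadratic layer sidesteps by learning the terms instead.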

Conclusion

Quadratic layers are a specialized tool that can be powerful when dealing with non-linear relationships. However, like any technique, they come with trade-offs. Consider your problem’s nature, the amount of data you have, and the balance between interpretability and complexity before deciding whether to use quadratic layers or opt for other approaches.
