Rolling in the Deep: RBMs

Anna Alexandra Grigoryan
4 min read · Jul 20, 2019


All you need to know about Restricted Boltzmann Machines

Another type of network used in deep learning is the Restricted Boltzmann Machine (RBM). RBMs are shallow networks used for data reconstruction and feature extraction.

Common applications of RBMs:

• feature extraction
• dimensionality reduction
• pattern recognition
• recommendation systems
• missing values handling
• topic modeling

Architecture

Structurally, an RBM is a shallow neural net with just two layers: the visible layer and the hidden layer. An RBM finds patterns and reconstructs its input in an unsupervised manner. Every node is connected to every node in the other layer, but no two nodes in the same layer share a connection, so an RBM is a symmetrical bipartite graph. Bipartite means it has two layers; symmetrical means each visible node is connected to each hidden node.
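The bipartite structure can be sketched in a few lines of numpy: a single weight matrix connects the two layers, with no visible-visible or hidden-hidden weights, and each joint configuration has an energy. (The layer sizes and the energy helper here are illustrative, not taken from the article.)

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 6, 3          # e.g. 6 input pixels, 3 hidden features

# One weight matrix connects the two layers; there are no
# visible-visible or hidden-hidden weights (bipartite structure).
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))
b = np.zeros(n_visible)             # visible biases
c = np.zeros(n_hidden)              # hidden biases

def energy(v, h):
    """Energy of a joint (visible, hidden) binary configuration."""
    return -v @ b - h @ c - v @ W @ h

v = rng.integers(0, 2, n_visible).astype(float)   # binary visible vector
h = rng.integers(0, 2, n_hidden).astype(float)    # binary hidden vector
print(energy(v, h))
```

Lower-energy configurations are the ones the trained RBM considers more probable.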

Training

The weight matrix is randomly initialized. The training process has three major steps:

  • Forward pass: the input image is converted to binary values, and the resulting vector is fed into the network. At each hidden unit, the values are multiplied by the weights and the bias is added. The result goes through the activation function, which gives the probability that the node activates. The network then makes a stochastic decision for each hidden unit about whether or not to transmit that hidden value.
Training RBMs: Forward pass
  • Backward pass: the activated neurons send their results back to the visible layer, where the input is reconstructed. The data is combined with the same weights and biases that were used in the forward pass. What arrives at the visible layer is the probability distribution of the input values given the hidden values, so the backward pass amounts to making guesses about the probability distribution of the original input.
Training RBMs: Backward pass
  • Assessment: the reconstruction is compared to the original data. The difference between the reconstructed output and the original input is the error, and the weight matrix is adjusted accordingly. A measure called KL divergence is used to assess the accuracy of the net.

All the steps are repeated until the error is minimized.
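The three steps above can be sketched with the standard contrastive divergence (CD-1) update, which is how RBMs are trained in practice; as a simple stand-in for the article's KL divergence measure, this sketch tracks mean squared reconstruction error. The layer sizes, learning rate, and toy data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))
b = np.zeros(n_visible)   # visible biases
c = np.zeros(n_hidden)    # hidden biases
lr = 0.1

# Toy binary data: each row is one training vector.
data = rng.integers(0, 2, size=(20, n_visible)).astype(float)

for epoch in range(100):
    for v0 in data:
        # Forward pass: probability that each hidden unit activates,
        # then a stochastic binary sample of the hidden layer.
        ph0 = sigmoid(v0 @ W + c)
        h0 = (rng.random(n_hidden) < ph0).astype(float)

        # Backward pass: reconstruct the visible layer with the same
        # weights and biases, then recompute hidden probabilities.
        pv1 = sigmoid(h0 @ W.T + b)
        ph1 = sigmoid(pv1 @ W + c)

        # Assessment / update (CD-1): nudge the parameters toward the
        # data statistics and away from the reconstruction statistics.
        W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
        b += lr * (v0 - pv1)
        c += lr * (ph0 - ph1)

# Mean squared reconstruction error after training (proxy for accuracy).
recon_error = np.mean((data - sigmoid(sigmoid(data @ W + c) @ W.T + b)) ** 2)
print(recon_error)
```

After training, the reconstruction error should fall well below the 0.25 expected from an untrained net that outputs 0.5 everywhere.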

Advantages of RBMs

  • Applicable for unlabeled data, as many real-world datasets are unlabeled.
  • Used for feature extraction, as an RBM decides which features are relevant and how best to combine them to form patterns.
  • Can be more efficient than Principal Component Analysis (PCA) for dimensionality reduction.
  • In an RBM, the hidden units are conditionally independent given the visible states, so we can quickly get an unbiased sample from the posterior distribution given a data vector.

Deep Belief Networks

A deep-belief network (DBN) is a stack of restricted Boltzmann machines. The nodes of any single layer don’t communicate with each other. Essentially, the DBN is trained two layers at a time, and those two layers are treated like an RBM. The hidden layer of each RBM acts as the visible layer of the next one, so each RBM’s outputs become the next RBM’s inputs. This is repeated until the last layer is reached.
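This greedy layer-wise procedure can be sketched as follows: train one RBM (here with the CD-1 update), push the data through its hidden layer, and use those activations as training data for the next RBM. The layer sizes, toy data, and `train_rbm` helper are illustrative assumptions, not from the article.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    """Train one RBM with CD-1; return its weights and hidden biases."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.1, size=(n_visible, n_hidden))
    b = np.zeros(n_visible)
    c = np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W + c)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + b)
            ph1 = sigmoid(pv1 @ W + c)
            W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
            b += lr * (v0 - pv1)
            c += lr * (ph0 - ph1)
    return W, c

# Greedy layer-wise training: the hidden activations of each RBM
# become the training data for the next RBM in the stack.
layer_sizes = [8, 5, 3]          # hidden sizes of the stacked RBMs
data = rng.integers(0, 2, size=(30, 12)).astype(float)

stack, inputs = [], data
for n_hidden in layer_sizes:
    W, c = train_rbm(inputs, n_hidden)
    stack.append((W, c))
    inputs = sigmoid(inputs @ W + c)   # feed outputs upward

print([w.shape for w, _ in stack])     # [(12, 8), (8, 5), (5, 3)]
```

Each weight matrix bridges one pair of adjacent layers, which is why the shapes chain together: 12 → 8 → 5 → 3.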

Deep-belief network architecture

After the training, the DBN is able to recognize the inherent patterns in the data. In other words, it’s a multilayer feature extractor.

In contrast to other types of deep nets, where patterns grow progressively more complex (early layers detect edges, and only later layers detect complex patterns), each layer in a DBN ends up learning the full input structure, just like a camera slowly bringing an image into focus.

In the end, a DBN requires a set of labels to apply to the resulting patterns; that is, it is fine-tuned with supervised learning. After minor tweaks to the weights and biases, the net achieves a slight increase in accuracy.

A DBN is an effective solution to the vanishing gradient problem. As another bonus, it works well with unsupervised learning, and only a small set of labelled data is required at the end.

As a final note, RBMs belong to a class of neural nets called autoencoders. An autoencoder is an unsupervised learning algorithm that finds patterns in a dataset by detecting key features automatically. Besides feature extraction, autoencoders are also useful for dimensionality reduction. High-dimensional data is a significant problem for machine learning tasks: the time to fit a model grows rapidly with dimensionality. On the other hand, if the number of dimensions is reduced too far, distinct data points can overlap, resulting in information loss. Besides autoencoders, Principal Component Analysis (PCA) is also used for dimensionality reduction.
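For comparison, PCA-style dimensionality reduction can be sketched via the SVD of centered data; the toy dataset here (10-dimensional points that really live near a 2-dimensional subspace) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy high-dimensional data: 100 samples in 10 dimensions that
# really live near a 2-dimensional subspace.
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 10))

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
X_reduced = Xc @ Vt[:k].T           # project onto the top-k components

# Fraction of variance captured by the first k components.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(X_reduced.shape, round(explained, 3))
```

Because the data is almost exactly 2-dimensional, the first two components capture nearly all of its variance, and little information is lost by the projection.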

Still, the autoencoder was a breakthrough, as it can extract important features and improve training time and separability compared to other methods.
