How Does an RBM Work?
RBMs work by learning the probability distribution of the input data through the interactions between the visible and hidden layers. An RBM learns through an iterative process involving two main phases: the positive phase (reconstruction) and the negative phase (learning). The goal is to adjust the weights and biases so as to minimize the difference between the input data and its reconstruction.
Positive Phase (Reconstruction)
- Data Point Input: The RBM takes a data point, represented by the activations of the visible units (input features).
- Hidden Layer Activation: Based on the weights and biases, the RBM activates hidden neurons from the visible input. Each hidden node [Tex]h_j[/Tex] computes its probability of being activated given the visible layer input [Tex]v[/Tex]. The activation probability of each hidden neuron [Tex]h_j[/Tex] is given by:
[Tex]P(h_j = 1 | v) = \sigma\left(b_j + \sum_i v_i w_{ij}\right)[/Tex]
where,
- [Tex]\sigma[/Tex] is the sigmoid function,
- [Tex]b_j[/Tex] is the bias of the hidden node j,
- [Tex]v_i[/Tex] is the state of the visible node i,
- [Tex]w_{ij}[/Tex] is the weight between the visible node i and the hidden node j.
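The hidden-layer activation formula above can be sketched in a few lines of NumPy. The array sizes, random weights, and the sample data point below are illustrative placeholders, not values from the original text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes and randomly initialized parameters (hypothetical values)
rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))    # weights w_ij
b = np.zeros(n_hidden)                                 # hidden biases b_j
v = rng.integers(0, 2, size=n_visible).astype(float)   # one binary data point

# P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * w_ij), computed for all j at once
p_h = sigmoid(b + v @ W)
print(p_h)  # a vector of n_hidden activation probabilities in (0, 1)
```

Vectorizing the sum over visible units as a matrix product `v @ W` gives all hidden probabilities in one step.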
- Reconstruction: The RBM then reconstructs a new visible layer activation pattern from the activity of the hidden layer and the weights between the two layers. The probability that reconstructed visible unit [Tex]v'_i[/Tex] is active is given by:
[Tex]P(v_i = 1 | h) = \sigma\left(a_i + \sum_j h_j w_{ij}\right)[/Tex]
where,
- [Tex]a_i[/Tex] is the bias of the visible node i.
- Sample Visible States: The visible states are then sampled from this probability distribution, yielding a reconstruction of the original input.
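The reconstruction and sampling steps can be sketched the same way. As before, the sizes, weights, and hidden states are made up for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative parameters and a hypothetical sampled hidden state
rng = np.random.default_rng(1)
n_visible, n_hidden = 6, 4
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))    # weights w_ij
a = np.zeros(n_visible)                                # visible biases a_i
h = rng.integers(0, 2, size=n_hidden).astype(float)    # sampled hidden states

# P(v_i = 1 | h) = sigmoid(a_i + sum_j h_j * w_ij)
p_v = sigmoid(a + W @ h)

# Sample binary visible states from these probabilities (the reconstruction)
v_recon = (rng.random(n_visible) < p_v).astype(float)
print(v_recon)  # a binary vector of length n_visible
```

Comparing each probability against a uniform random draw is the standard way to sample Bernoulli units.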
Negative Phase (Learning)
- Reconstructed Input: The reconstructed visible layer activation is then fed back through the network, activating hidden neurons based on the reconstructed data.
- Comparison: The RBM compares the activations of the hidden layer in this phase with the activations from the original input (positive phase). This comparison highlights the discrepancies between the input data and the RBM’s reconstruction.
- Weight Adjustment: Based on this comparison, the weights and biases are adjusted to minimize the difference between the original data and the reconstruction. The update rules for the weights and biases are typically based on the gradient of the reconstruction error. One common method used is Contrastive Divergence (CD), where the weight update is given by:
[Tex]\Delta w_{ij} = \epsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \right) \\ \Delta a_i = \epsilon \left( v_i - v'_i \right) \\ \Delta b_j = \epsilon \left( h_j - h'_j \right)[/Tex]
where,
- [Tex]\epsilon[/Tex] is the learning rate,
- [Tex]\langle \cdot \rangle_{\text{data}}[/Tex] denotes the expectation under the data distribution,
- [Tex]\langle \cdot \rangle_{\text{recon}}[/Tex] denotes the expectation under the reconstruction distribution.
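Putting both phases together, one Contrastive Divergence (CD-1) update step can be sketched as follows. All sizes, the learning rate, and the data point are illustrative assumptions for the sketch, not values from the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical model sizes, learning rate, and one binary training example
rng = np.random.default_rng(2)
n_visible, n_hidden, eps = 6, 4, 0.1
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))
a = np.zeros(n_visible)   # visible biases
b = np.zeros(n_hidden)    # hidden biases
v0 = rng.integers(0, 2, size=n_visible).astype(float)

# Positive phase: hidden probabilities from the data, then a sampled state
p_h0 = sigmoid(b + v0 @ W)
h0 = (rng.random(n_hidden) < p_h0).astype(float)

# Reconstruction: sample the visible layer, then re-activate the hidden layer
p_v1 = sigmoid(a + W @ h0)
v1 = (rng.random(n_visible) < p_v1).astype(float)
p_h1 = sigmoid(b + v1 @ W)

# CD-1 updates: data statistics minus reconstruction statistics
W += eps * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
a += eps * (v0 - v1)
b += eps * (p_h0 - p_h1)
```

Using the hidden probabilities (rather than sampled states) in the update terms is a common variance-reduction choice in CD implementations.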
Restricted Boltzmann Machine: How It Works
The Restricted Boltzmann Machine (RBM) was introduced by Geoffrey Hinton and Terry Sejnowski in 1985. Since then, it has become foundational in unsupervised machine learning, particularly in the context of deep learning architectures. RBMs are widely used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modelling.