LSTM Working

The LSTM architecture has a chain structure that contains four interacting neural network layers and memory blocks called cells.

Information is retained by the cells, and the memory manipulations are done by the gates. There are three gates:

Forget Gate

The information that is no longer useful in the cell state is removed with the forget gate. Two inputs, x_t (the input at the current time step) and h_{t-1} (the previous hidden state), are fed to the gate and multiplied with weight matrices, followed by the addition of a bias. The result is passed through a sigmoid activation function, which gives an output between 0 and 1. For a particular cell state value, an output close to 0 means the piece of information is forgotten, while an output close to 1 means the information is retained for future use. The equation for the forget gate is:

[Tex] f_t = σ(W_f · [h_{t-1}, x_t] + b_f) [/Tex]
 where:

  • W_f represents the weight matrix associated with the forget gate.
  • [h_{t-1}, x_t] denotes the concatenation of the previous hidden state and the current input.
  • b_f is the bias term associated with the forget gate.
  • σ is the sigmoid activation function.
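
To make the computation concrete, here is a minimal NumPy sketch of the forget gate step. The sizes, the names (sigmoid, W_f, b_f, h_prev, x_t) and the random values are illustrative assumptions for this example, not a reference implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy sizes chosen only for illustration.
hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)

W_f = rng.standard_normal((hidden_size, hidden_size + input_size))  # forget-gate weights
b_f = np.zeros(hidden_size)                                         # forget-gate bias

h_prev = rng.standard_normal(hidden_size)   # previous hidden state h_{t-1}
x_t = rng.standard_normal(input_size)       # current input x_t

concat = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
f_t = sigmoid(W_f @ concat + b_f)           # each value in (0, 1): near 0 -> forget, near 1 -> keep
print(f_t)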

Input Gate

The addition of useful information to the cell state is done by the input gate. First, the information is regulated using the sigmoid function, which filters the values to be remembered, similar to the forget gate, using the inputs h_{t-1} and x_t. Then, a vector of candidate values is created from h_{t-1} and x_t using the tanh function, which gives outputs from -1 to +1. At last, the candidate values and the regulated values are multiplied to obtain the useful information. The equations for the input gate are:

[Tex] i_t = σ(W_i · [h_{t-1}, x_t] + b_i) [/Tex]

[Tex]Ĉ_t = tanh(W_c · [h_{t-1}, x_t] + b_c) [/Tex]

We multiply the previous cell state C_{t-1} by f_t, forgetting the information we had previously chosen to discard. Next, we add i_t ⊙ Ĉ_t. These are the new candidate values, scaled by how much we chose to update each state value.

[Tex]C_t = f_t ⊙ C_{t-1} + i_t ⊙ Ĉ_t [/Tex]

where

  •  ⊙ denotes element-wise multiplication
  • tanh is the hyperbolic tangent activation function
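
Following the same style of sketch, the input gate and the cell-state update can be written as below. The forget-gate output f_t is filled with placeholder values here; in a real cell it would be computed as in the previous section. All names and sizes are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)

W_i = rng.standard_normal((hidden_size, hidden_size + input_size))  # input-gate weights
W_c = rng.standard_normal((hidden_size, hidden_size + input_size))  # candidate-value weights
b_i, b_c = np.zeros(hidden_size), np.zeros(hidden_size)

h_prev = rng.standard_normal(hidden_size)   # previous hidden state h_{t-1}
x_t = rng.standard_normal(input_size)       # current input x_t
C_prev = rng.standard_normal(hidden_size)   # previous cell state C_{t-1}
f_t = rng.uniform(size=hidden_size)         # placeholder for the forget-gate output

concat = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
i_t = sigmoid(W_i @ concat + b_i)           # input gate: how much new information to admit
C_hat = np.tanh(W_c @ concat + b_c)         # candidate values Ĉ_t in (-1, 1)

C_t = f_t * C_prev + i_t * C_hat            # element-wise cell-state update
print(C_t)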

Output Gate

The task of extracting useful information from the current cell state to be presented as the output is done by the output gate. First, a vector is generated by applying the tanh function to the cell state. Then, the information is regulated using the sigmoid function, which filters the values to be remembered, using the inputs [Tex]h_{t-1} [/Tex]and [Tex]x_t[/Tex]. At last, the values of the vector and the regulated values are multiplied and sent as the output of the current cell and as the input to the next cell. The equations for the output gate are:

[Tex]o_t = σ(W_o · [h_{t-1}, x_t] + b_o) [/Tex]

[Tex]h_t = o_t ⊙ tanh(C_t) [/Tex]
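
Putting the three gates together, the sketch below runs one full LSTM cell step using the equations above. The function name, sizes and weights are assumptions made for illustration; deep learning libraries ship their own optimized LSTM implementations.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, C_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    # One LSTM time step following the equations above (illustrative, not optimized).
    concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ concat + b_f)        # forget gate
    i_t = sigmoid(W_i @ concat + b_i)        # input gate
    C_hat = np.tanh(W_c @ concat + b_c)      # candidate cell state Ĉ_t
    C_t = f_t * C_prev + i_t * C_hat         # new cell state
    o_t = sigmoid(W_o @ concat + b_o)        # output gate
    h_t = o_t * np.tanh(C_t)                 # new hidden state / cell output
    return h_t, C_t

# Toy usage with assumed sizes: hidden size 4, input size 3.
hidden_size, input_size = 4, 3
rng = np.random.default_rng(2)
shape = (hidden_size, hidden_size + input_size)
W_f, W_i, W_c, W_o = (rng.standard_normal(shape) for _ in range(4))
b_f, b_i, b_c, b_o = (np.zeros(hidden_size) for _ in range(4))

h_t, C_t = lstm_cell_step(rng.standard_normal(input_size),
                          np.zeros(hidden_size), np.zeros(hidden_size),
                          W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o)
print(h_t.shape, C_t.shape)  # (4,) (4,)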

