Long Short-Term Memory (LSTM)

📌 What is LSTM?

An LSTM (Long Short-Term Memory) is a special kind of Recurrent Neural Network (RNN) that can learn long-term dependencies in sequential data.
It was introduced by Hochreiter & Schmidhuber in 1997 to solve the vanishing gradient issue in standard RNNs.

While vanilla RNNs struggle to remember information over long sequences, LSTMs use a gating mechanism that helps them decide what to keep, update, or forget.
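
In practice you rarely write the cell by hand; deep-learning frameworks ship it as a layer. Here is a minimal sketch using PyTorch's `nn.LSTM` (assuming PyTorch is installed; all shapes and sizes are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative setup: a batch of 2 sequences, 5 time steps, 8 features per step.
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(2, 5, 8)

# output holds the hidden state h_t at every time step;
# (h_n, c_n) are the final hidden and cell states.
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([2, 5, 16])
print(h_n.shape)     # torch.Size([1, 2, 16])
print(c_n.shape)     # torch.Size([1, 2, 16])
```

The rest of this article unpacks what happens inside one of those cells at each time step.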


📌 Why LSTM?

  • Standard RNNs forget long-term context.
  • LSTM introduces memory cells and gates that control information flow.
  • Perfect for tasks where context far back in the sequence matters (e.g., machine translation, speech recognition, time series forecasting).

📊 LSTM Cell Architecture

Each LSTM cell has:

  1. Cell State ($C_t$) → The memory line that runs through the sequence.
  2. Hidden State ($h_t$) → The output at the current time step.
  3. Gates → Neural networks that regulate information flow.

⚙️ The 3 Main Gates

1️⃣ Forget Gate

  • Decides what information to throw away from the cell state.

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

2️⃣ Input Gate

  • Decides what new information to store.

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$

3️⃣ Output Gate

  • Decides what part of the cell state becomes the output.

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t * \tanh(C_t)$$
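
To make the gate equations concrete, here is a small NumPy sketch (variable names and sizes are illustrative, not from any library) that evaluates the three gates and the candidate memory $\tilde{C}_t$ for a single time step:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, features = 4, 3
rng = np.random.default_rng(0)

# Toy inputs: previous hidden state h_{t-1} and current input x_t.
h_prev = rng.standard_normal(hidden)
x_t = rng.standard_normal(features)
z = np.concatenate([h_prev, x_t])   # the concatenation [h_{t-1}, x_t]

# One weight matrix per gate, randomly initialized for illustration.
W_f, W_i, W_c, W_o = (rng.standard_normal((hidden, hidden + features)) for _ in range(4))
b = np.zeros(hidden)                # shared zero bias, purely for brevity

f_t = sigmoid(W_f @ z + b)          # forget gate: entries in (0, 1)
i_t = sigmoid(W_i @ z + b)          # input gate: entries in (0, 1)
C_tilde = np.tanh(W_c @ z + b)      # candidate memory: entries in (-1, 1)
o_t = sigmoid(W_o @ z + b)          # output gate: entries in (0, 1)
```

Because every gate passes through a sigmoid, its entries behave like soft switches: values near 0 block information and values near 1 let it through.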


🔄 Cell State Update

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

This equation ensures that useful past information is carried forward (the $f_t * C_{t-1}$ term) while new, relevant information is written in (the $i_t * \tilde{C}_t$ term). Because the update is additive rather than repeatedly multiplicative, gradients flowing back through $C_t$ shrink far less, which is how LSTMs sidestep the vanishing gradient problem. The sketch below puts all four equations together.
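
One full LSTM time step can then be sketched from scratch (again illustrative NumPy, not any framework's actual implementation; the helper name `lstm_step` is made up for this example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step, following the equations above."""
    W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o = params
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]

    f_t = sigmoid(W_f @ z + b_f)             # forget gate
    i_t = sigmoid(W_i @ z + b_i)             # input gate
    C_tilde = np.tanh(W_c @ z + b_c)         # candidate memory
    o_t = sigmoid(W_o @ z + b_o)             # output gate

    C_t = f_t * C_prev + i_t * C_tilde       # cell state update
    h_t = o_t * np.tanh(C_t)                 # hidden state (the cell's output)
    return h_t, C_t

# Run it over a short toy sequence.
hidden, features = 4, 3
rng = np.random.default_rng(1)
params = tuple(
    arr
    for _ in range(4)
    for arr in (rng.standard_normal((hidden, hidden + features)), np.zeros(hidden))
)
h, C = np.zeros(hidden), np.zeros(hidden)
for x in rng.standard_normal((5, features)):  # 5 time steps
    h, C = lstm_step(x, h, C, params)
print(h, C)
```

Note how the loop threads $h_t$ and $C_t$ forward from step to step; that recurrence is what lets information from early inputs influence later outputs.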


📌 Advantages of LSTM

✅ Handles long-term dependencies
✅ Mitigates the vanishing gradient problem
✅ Works well for variable-length sequences
✅ Widely used in NLP, speech, and forecasting


📌 Applications of LSTM

  • 📝 Text Prediction / Generation → e.g., autocomplete in keyboards
  • 🎤 Speech Recognition → Convert audio into text
  • 📈 Time Series Forecasting → Stock, sales, weather predictions (see the sketch after this list)
  • 🎶 Music Generation → Generate melodies
  • 👁️ Video Understanding → Human action recognition
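
As a concrete instance of the forecasting item above, here is a minimal Keras sketch (assuming TensorFlow is installed; the toy data, window size, and layer widths are all illustrative):

```python
import numpy as np
from tensorflow import keras

# Toy task: predict the next value of a sine wave from the previous 20 values.
t = np.arange(0, 100, 0.1)
series = np.sin(t)
window = 20
X = np.stack([series[i:i + window] for i in range(len(series) - window)])[..., None]
y = series[window:]

model = keras.Sequential([
    keras.layers.Input(shape=(window, 1)),   # 20 time steps, 1 feature each
    keras.layers.LSTM(32),                   # 32 LSTM units
    keras.layers.Dense(1),                   # next-value prediction
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

print(model.predict(X[:1], verbose=0))       # forecast after the first window
```

The same pattern (sliding window in, single value out) carries over to stock, sales, or weather series, usually with more features and more careful validation.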

📖 Analogy

Think of an LSTM as a smart notepad 📝:

  • It remembers important notes (long-term memory).
  • It erases irrelevant scribbles (forget gate).
  • It writes new important notes (input gate).
  • It decides what to share with others (output gate).
