📌 What is an LSTM?
An LSTM (Long Short-Term Memory) is a special kind of Recurrent Neural Network (RNN) that can learn long-term dependencies in sequential data.
It was introduced by Hochreiter & Schmidhuber in 1997 to solve the vanishing gradient issue in standard RNNs.
While vanilla RNNs struggle to remember information over long sequences, LSTMs use a gating mechanism that helps them decide what to keep, update, or forget.
📌 Why LSTM?
- Standard RNNs forget long-term context.
- LSTM introduces memory cells and gates that control information flow.
- Perfect for tasks where context far back in the sequence matters (e.g., machine translation, speech recognition, time series forecasting).
📊 LSTM Cell Architecture
Each LSTM cell has:
- Cell State ($C_t$) → The memory line that runs through the sequence.
- Hidden State ($h_t$) → The output at the current time step.
- Gates → Neural networks that regulate information flow.
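To make these pieces concrete, here is a minimal sketch using PyTorch's `nn.LSTMCell` (the sizes are arbitrary, chosen only for illustration). Note that both states come back at every step:

```python
import torch
import torch.nn as nn

# One LSTM cell: input vectors of size 10, hidden/cell states of size 20.
cell = nn.LSTMCell(input_size=10, hidden_size=20)

x_t = torch.randn(3, 10)        # batch of 3 inputs at time step t
h_prev = torch.zeros(3, 20)     # previous hidden state h_{t-1}
c_prev = torch.zeros(3, 20)     # previous cell state C_{t-1}

h_t, c_t = cell(x_t, (h_prev, c_prev))
print(h_t.shape, c_t.shape)     # torch.Size([3, 20]) torch.Size([3, 20])
```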
⚙️ The 3 Main Gates
1️⃣ Forget Gate
- Decides what information to throw away from the cell state.
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
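In code, this is just a sigmoid over a linear transform of the concatenated $[h_{t-1}, x_t]$. A minimal NumPy sketch, with illustrative names and sizes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inputs = 4, 3
W_f = np.random.randn(hidden, hidden + inputs)  # forget-gate weights
b_f = np.zeros(hidden)                          # forget-gate bias

h_prev = np.zeros(hidden)        # h_{t-1}
x_t = np.random.randn(inputs)    # x_t

concat = np.concatenate([h_prev, x_t])
f_t = sigmoid(W_f @ concat + b_f)  # values in (0, 1): 0 = forget, 1 = keep
```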
2️⃣ Input Gate
- Decides what new information to store.
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$
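A similar sketch for the input gate and the candidate memory (again, names and sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inputs = 4, 3
W_i = np.random.randn(hidden, hidden + inputs)  # input-gate weights
W_c = np.random.randn(hidden, hidden + inputs)  # candidate weights
b_i, b_c = np.zeros(hidden), np.zeros(hidden)

# [h_{t-1}, x_t] with a zero initial hidden state
concat = np.concatenate([np.zeros(hidden), np.random.randn(inputs)])

i_t = sigmoid(W_i @ concat + b_i)      # how much of the candidate to admit
c_tilde = np.tanh(W_c @ concat + b_c)  # candidate values in (-1, 1)
```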
3️⃣ Output Gate
- Decides what part of the cell state becomes the output.
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$
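And the output gate in the same style ($C_t$ here is a random stand-in; the real value comes from the update in the next section):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inputs = 4, 3
W_o = np.random.randn(hidden, hidden + inputs)  # output-gate weights
b_o = np.zeros(hidden)

# [h_{t-1}, x_t] with a zero initial hidden state
concat = np.concatenate([np.zeros(hidden), np.random.randn(inputs)])
C_t = np.random.randn(hidden)      # stand-in for the current cell state

o_t = sigmoid(W_o @ concat + b_o)  # which parts of the memory to expose
h_t = o_t * np.tanh(C_t)           # hidden state / output at step t
```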
🔄 Cell State Update
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
This equation ensures that useful past information is carried forward while irrelevant data is forgotten: $f_t$ scales down the old memory $C_{t-1}$, and $i_t$ controls how much of the new candidate $\tilde{C}_t$ is written in.
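Putting all four equations together gives a complete single-step LSTM cell. A self-contained NumPy sketch, where the weight names mirror the formulas above and the shapes are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step, following the gate equations above."""
    concat = np.concatenate([h_prev, x_t])  # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ concat + b_f)       # forget gate
    i_t = sigmoid(W_i @ concat + b_i)       # input gate
    c_tilde = np.tanh(W_c @ concat + b_c)   # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde      # cell state update
    o_t = sigmoid(W_o @ concat + b_o)       # output gate
    h_t = o_t * np.tanh(c_t)                # new hidden state
    return h_t, c_t

# Tiny usage example with random weights.
hidden, inputs = 4, 3
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((hidden, hidden + inputs)) for _ in range(4)]
bs = [np.zeros(hidden) for _ in range(4)]

h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.standard_normal((5, inputs)):  # a sequence of 5 steps
    h, c = lstm_step(x, h, c, *Ws, *bs)
print(h.shape, c.shape)                     # (4,) (4,)
```

In practice you would reach for a framework implementation (like the `nn.LSTMCell` shown earlier), which fuses these matrix multiplies for speed, but the logic is exactly this loop.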
📌 Advantages of LSTM
✅ Handles long-term dependencies
✅ Mitigates the vanishing gradient problem
✅ Works well for variable-length sequences
✅ Widely used in NLP, speech, and forecasting
📌 Applications of LSTM
- 📝 Text Prediction / Generation → e.g., autocomplete in keyboards
- 🎤 Speech Recognition → Convert audio into text
- 📈 Time Series Forecasting → Stock, sales, weather predictions
- 🎶 Music Generation → Generate melodies
- 👁️ Video Understanding → Human action recognition
📖 Analogy
Think of LSTM like a smart notepad 📝:
- It remembers important notes (long-term memory).
- It erases irrelevant scribbles (forget gate).
- It writes new important notes (input gate).
- It decides what to share with others (output gate).