Backpropagation

📌 What is Backpropagation?

Backpropagation (Backward Propagation of Errors) is the algorithm used to train neural networks by adjusting weights to minimize the error (loss).

It’s essentially an application of calculus (chain rule) + optimization (gradient descent).
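
For example, for a two-layer composition ŷ = f(g(x; w₁); w₂) with hidden value h = g(x; w₁), the chain rule splits the gradient of the inner weight into local pieces: ∂L/∂w₁ = (∂L/∂ŷ) ⋅ (∂ŷ/∂h) ⋅ (∂h/∂w₁). Backpropagation computes these local derivatives once and reuses them layer by layer.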

  • Forward pass → compute predictions & loss
  • Backward pass → compute gradients (partial derivatives) of loss w.r.t. each weight
  • Update weights → using gradient descent
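
In a framework such as PyTorch, these three phases map almost line-for-line onto code. Here is a minimal sketch, assuming PyTorch is available; the layer shape, data, and learning rate are made up purely for illustration:

```python
import torch
import torch.nn as nn

# Toy one-feature regression model (hypothetical sizes and data)
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.tensor([[2.0]])        # input
y = torch.tensor([[1.0]])        # target

prediction = model(x)            # forward pass → predictions
loss = loss_fn(prediction, y)    # compute loss
optimizer.zero_grad()            # clear gradients from the previous step
loss.backward()                  # backward pass → autograd applies the chain rule
optimizer.step()                 # weight update → w = w − lr ⋅ ∂L/∂w
```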

⚙️ Steps of Backpropagation

  1. Forward Propagation
    • Input flows through the network.
    • Output is computed.
    • Loss is calculated (e.g., MSE, Cross-Entropy).
  2. Backward Propagation
    • Calculate the gradient of the loss function with respect to each parameter (weight & bias).
    • Apply chain rule layer by layer (starting from output → input).
  3. Weight Update (Gradient Descent)
    • Update rule: w_new = w_old − η ⋅ ∂L/∂w, where:
      • η = learning rate
      • ∂L/∂w = gradient of loss w.r.t. weight
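
The same three steps can also be written out by hand for a single linear neuron trained with MSE loss. Below is a minimal NumPy sketch; the toy data, learning rate, and iteration count are assumptions for illustration only:

```python
import numpy as np

# Hypothetical toy data: the true relationship is y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w, b = 0.5, 0.1          # initial parameters
eta = 0.1                # learning rate

for step in range(500):
    # 1. Forward propagation: predictions and MSE loss
    y_hat = w * x + b
    loss = 0.5 * np.mean((y_hat - y) ** 2)

    # 2. Backward propagation: chain rule gives ∂L/∂w and ∂L/∂b
    error = y_hat - y                # ∂L/∂ŷ
    grad_w = np.mean(error * x)      # ∂L/∂w = ∂L/∂ŷ ⋅ ∂ŷ/∂w
    grad_b = np.mean(error)          # ∂L/∂b = ∂L/∂ŷ ⋅ ∂ŷ/∂b

    # 3. Weight update (gradient descent)
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b, loss)  # w ≈ 2, b ≈ 0, loss ≈ 0 after enough iterations
```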

📖 Simple Example

Imagine a 1-layer neural network:

  • Input x = 2, weight w = 0.5, bias b = 0.1
  • Target output y = 1
  • Prediction: ŷ = wx + b = (0.5)(2) + 0.1 = 1.1
  • Loss (MSE): L = ½(ŷ − y)² = ½(1.1 − 1)² = 0.005
  • Gradient w.r.t. weight: ∂L/∂w = (ŷ − y) ⋅ x = (0.1)(2) = 0.2
  • Weight update (learning rate η = 0.1): w_new = 0.5 − 0.1 ⋅ 0.2 = 0.48

👉 The weight moves a small step in the direction that reduces the error; repeated over many iterations, these updates drive the loss toward convergence. The short snippet below reproduces these numbers.
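
A few lines of plain Python reproduce the calculation (matching the example above up to floating-point rounding):

```python
x, w, b, y = 2.0, 0.5, 0.1, 1.0
eta = 0.1                         # learning rate

y_hat = w * x + b                 # forward: 0.5 * 2 + 0.1 = 1.1
loss = 0.5 * (y_hat - y) ** 2     # MSE: 0.5 * 0.1**2 = 0.005
grad_w = (y_hat - y) * x          # chain rule: 0.1 * 2 = 0.2
w_new = w - eta * grad_w          # update: 0.5 - 0.1 * 0.2 = 0.48

print(y_hat, loss, grad_w, w_new) # ≈ 1.1, 0.005, 0.2, 0.48
```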


🔑 Importance of Backpropagation

✅ Makes deep learning possible
✅ Efficiently computes all gradients in a single backward pass (instead of brute-force numerical differentiation per weight)
✅ Works with any differentiable activation & loss function
✅ Powers training of CNNs, RNNs, Transformers, etc.


🚧 Challenges

  • Vanishing/Exploding Gradients (common in deep RNNs)
  • Local minima / saddle points in loss surface
  • Overfitting if not enough data/regularization

Solutions: ReLU, Batch Normalization, Residual Connections, Gradient Clipping, etc.
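
As an illustration of one of these mitigations, here is a minimal sketch of gradient clipping by global norm, assuming NumPy; the function name, threshold, and toy gradient values are hypothetical:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their combined L2 norm
    does not exceed max_norm (a guard against exploding gradients)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Hypothetical oversized gradients, e.g. from a deep RNN's backward pass
grads = [np.array([3.0, -4.0]), np.array([12.0])]
clipped = clip_by_global_norm(grads, max_norm=5.0)
print(clipped)  # combined norm is now 5.0 instead of 13.0
```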


✅ In short: Backpropagation is the algorithm that lets a neural network learn from its mistakes by propagating error backwards and updating weights efficiently.
