📌 Definition
Gradient Descent is an optimization algorithm used to minimize the loss (or cost) function of a model by updating its parameters (weights & biases) in the direction of the steepest descent of the error surface.
It’s like trying to climb down a hill blindfolded by taking small steps in the direction of the steepest slope until you reach the valley (minimum error).
⚙️ How It Works
- Start with initial parameters (weights).
- Compute the loss function (how wrong the model is).
- Calculate the gradient (derivative) of the loss function w.r.t. each parameter.
- Update parameters using:
θ = θ − η ⋅ ∇L(θ)
Where:
- θ = model parameters (weights, biases)
- η = learning rate (step size)
- ∇L(θ) = gradient of the loss function with respect to θ
- Repeat until convergence (loss becomes minimal or stable); a minimal sketch of this loop follows.
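To make the loop concrete, here is a minimal NumPy sketch that fits a 1-D linear model y ≈ w·x + b with mean squared error. The toy data, learning rate, and step count are illustrative choices, not part of the algorithm itself:

```python
import numpy as np

# Toy data: y = 3x + 2 plus noise (values chosen only for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 2 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0   # step 1: initial parameters (theta)
eta = 0.1         # learning rate

for step in range(500):
    y_pred = w * x + b
    loss = np.mean((y_pred - y) ** 2)       # step 2: MSE loss L(theta)
    grad_w = np.mean(2 * (y_pred - y) * x)  # step 3: dL/dw
    grad_b = np.mean(2 * (y_pred - y))      #         dL/db
    w -= eta * grad_w                       # step 4: theta = theta - eta * grad
    b -= eta * grad_b

print(w, b)  # converges near the true values 3 and 2
```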
📊 Types of Gradient Descent
- Batch Gradient Descent
- Uses the entire dataset to compute the gradient at each step.
- The gradient is exact, but each step is expensive on large datasets.
- Stochastic Gradient Descent (SGD)
- Uses one random sample per iteration to update weights.
- Much faster, but noisy (loss curve fluctuates).
- Mini-Batch Gradient Descent (most common in Deep Learning)
- Uses a small batch of samples per update.
- Balances speed and accuracy; the sketch after this list compares all three.
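The three variants differ only in how many samples feed each gradient estimate. A sketch reusing the toy setup above (all names and values are illustrative): a batch size equal to the dataset size gives batch gradient descent, 1 gives SGD, and anything in between is mini-batch:

```python
import numpy as np

def gradient_descent(x, y, batch_size, eta=0.1, epochs=100, seed=0):
    """Generic loop: batch_size=len(x) -> batch GD, 1 -> SGD, else mini-batch."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)              # shuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            err = (w * xb + b) - yb
            w -= eta * np.mean(2 * err * xb)    # gradient estimated on the batch
            b -= eta * np.mean(2 * err)
    return w, b

# Same toy data as before (illustrative)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 2 + 0.1 * rng.normal(size=100)

print(gradient_descent(x, y, batch_size=len(x)))  # batch GD: exact gradient
print(gradient_descent(x, y, batch_size=1))       # SGD: noisier path
print(gradient_descent(x, y, batch_size=16))      # mini-batch: common default
```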
✅ Advantages
- Simple and widely used in ML/DL.
- Works well for high-dimensional data.
- Flexible (different variants for efficiency).
⚠️ Disadvantages
- Choosing the right learning rate (η) is critical (demonstrated in the short demo after this list):
- Too small → very slow convergence.
- Too large → may overshoot and diverge.
- Can get stuck in local minima (though in high-dimensional deep learning, saddle points and plateaus are usually the bigger obstacle).
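A tiny demonstration on the one-dimensional loss L(θ) = θ², whose gradient is 2θ; the three step sizes are arbitrary picks, chosen only to show the regimes:

```python
# Minimize L(theta) = theta**2 (gradient: 2*theta), starting from theta = 1.0
for eta in (0.01, 0.4, 1.1):   # too small, reasonable, too large
    theta = 1.0
    for _ in range(20):
        theta -= eta * 2 * theta
    print(f"eta={eta}: theta after 20 steps = {theta:.4f}")
# eta=0.01 crawls toward 0, eta=0.4 converges quickly, eta=1.1 diverges
```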
🔄 Variants with Improvements
- Momentum – adds a velocity term to smooth updates.
- RMSProp – adapts learning rate per parameter.
- Adam – combines Momentum + RMSProp (the most popular default); a from-scratch sketch follows below.
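For intuition, the Adam update can be written out by hand. This is a simplified sketch of the standard formulation with its usual default hyperparameters, not a replacement for a library optimizer:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running moment estimates, t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad       # Momentum: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad**2    # RMSProp: moving average of squared gradients
    m_hat = m / (1 - beta1**t)               # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage on L(theta) = theta**2 again
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # ends close to the minimum at 0
```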
💡 Analogy:
Imagine you’re standing on top of a hill (high error). You want to reach the bottom (minimum error). You can’t see the whole landscape, but by feeling the slope under your feet, you decide which direction to step in. Step too big → you might stumble past the valley; step too small → it will take forever.