
Overfitting vs Underfitting (Machine Learning)

1. Definition

  • Overfitting
    • Model learns the training data too well, including noise and random fluctuations.
    • Performs well on training data but poorly on new/unseen data.
    • Symptom: High training accuracy, low test accuracy.
  • Underfitting
    • Model is too simple to capture the underlying patterns in data.
    • Performs poorly on both training and test data.
    • Symptom: Low training accuracy, low test accuracy.
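Both symptoms can be reproduced with a minimal sketch, assuming scikit-learn and a synthetic two-class dataset (make_circles and the decision-tree settings below are illustrative choices, not a recipe):

```python
# Minimal sketch of the two symptoms (assumes scikit-learn is installed).
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy nonlinear data: two concentric rings.
X, y = make_circles(n_samples=400, noise=0.2, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Overfitting symptom: an unconstrained tree memorizes the noise.
overfit = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("deep tree  train/test:",
      overfit.score(X_train, y_train), overfit.score(X_test, y_test))
# Typically ~1.00 training accuracy but noticeably lower test accuracy.

# Underfitting symptom: a depth-1 stump is too simple for the ring shape.
underfit = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_train, y_train)
print("stump      train/test:",
      underfit.score(X_train, y_train), underfit.score(X_test, y_test))
# Typically similar, mediocre scores on both splits.
```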

2. Causes

| Aspect           | Overfitting                                     | Underfitting                                       |
| ---------------- | ----------------------------------------------- | -------------------------------------------------- |
| Model Complexity | Too complex (e.g., deep trees, many parameters) | Too simple (e.g., linear model for nonlinear data) |
| Training Data    | Too little data, noisy data                     | Not enough features, poor representation           |
| Training Epochs  | Too many training iterations                    | Too few training iterations                        |
| Regularization   | None or too weak                                | Too strong                                         |
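The regularization row can be made concrete with a hedged sketch using scikit-learn's LogisticRegression, where C is the inverse of regularization strength (the synthetic dataset and the specific C values are illustrative assumptions):

```python
# Sketch of the regularization row: too weak vs. too strong regularization.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A few informative features buried in many noisy ones, on purpose.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# In LogisticRegression, C is the INVERSE of regularization strength.
for C, label in [(1e4, "weak regularization  -> overfit-prone"),
                 (1.0, "moderate regularization"),
                 (1e-4, "strong regularization -> underfit-prone")]:
    model = LogisticRegression(C=C, max_iter=5000).fit(X_train, y_train)
    print(f"C={C:>8}: train={model.score(X_train, y_train):.2f} "
          f"test={model.score(X_test, y_test):.2f}  ({label})")
```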

3. Graphical Representation

Imagine a curve fitting problem:

  • Underfitting → A straight line through complex data (misses patterns).
  • Good Fit → Smooth curve that generalizes well.
  • Overfitting → Wiggly curve that passes through every training point but fails on test data.
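This picture can be reproduced numerically with plain NumPy polynomial fits; the degrees 1, 4, and 15 below are illustrative choices for "too simple", "about right", and "too flexible":

```python
# Curve-fitting sketch: underfit / good fit / overfit as polynomial degree grows.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 2 * np.pi, 30))
y_train = np.sin(x_train) + rng.normal(0, 0.2, x_train.size)  # noisy sine samples
x_test = np.linspace(0, 2 * np.pi, 200)
y_test = np.sin(x_test)                                        # clean target curve

for degree in (1, 4, 15):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:>2}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# Typically: degree 1 misses the curve (underfit), degree 4 generalizes well,
# and degree 15 chases the noise (tiny train MSE, much larger test MSE).
```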

4. How to Detect

  • Use Train vs Test (Validation) performance:
    • If train >> test → Overfitting.
    • If train ≈ test but both low → Underfitting.
  • Use Learning Curves (plot accuracy/loss vs training size or epochs); a sketch follows below.
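A hedged sketch of that check, using scikit-learn's learning_curve helper (the estimator and dataset are placeholders for whatever model and data are actually in use):

```python
# Learning-curve sketch: compare train vs. validation scores as data grows.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=600, noise=0.3, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:>4} samples: train={tr:.2f}  validation={va:.2f}")

# A persistent gap (train >> validation) points to overfitting;
# both curves plateauing at a low score points to underfitting.
```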

5. How to Fix

🔹 Fixing Overfitting

  • Get more training data.
  • Use a simpler model (reduce parameters).
  • Apply regularization (L1, L2, Dropout).
  • Use early stopping.
  • Perform cross-validation.
  • Data augmentation (for images, text, audio).
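Two of these fixes, a simpler model and cross-validation, can be combined in one hedged sketch by tuning tree complexity with GridSearchCV (the dataset and parameter grid are illustrative assumptions):

```python
# Sketch: fight overfitting by constraining complexity, chosen via cross-validation.
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: the overfitting baseline.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("unpruned tree test accuracy:", round(full.score(X_test, y_test), 2))

# 5-fold cross-validation picks a depth that generalizes rather than memorizes.
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {"max_depth": [2, 3, 4, 5, 8], "min_samples_leaf": [1, 5, 10]},
                    cv=5)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_)
print("pruned tree   test accuracy:", round(grid.score(X_test, y_test), 2))
```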

🔹 Fixing Underfitting

  • Use a more complex model (deeper neural net, more features).
  • Reduce regularization strength.
  • Train for longer (more epochs).
  • Add more relevant features or better feature engineering.
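A hedged sketch of the feature-engineering fix: a plain linear model underfits a sine wave, while the same linear model trained on polynomial features has enough capacity to follow the curve (the degree-5 choice is an illustrative assumption):

```python
# Sketch: cure underfitting by giving a linear model richer features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain linear regression: too simple to follow the full sine curve.
plain = LinearRegression().fit(X_train, y_train)
print("linear model  R^2:", round(plain.score(X_test, y_test), 2))

# Same linear model on x, x^2, ..., x^5: typically a much better test score.
poly = make_pipeline(PolynomialFeatures(degree=5), LinearRegression())
poly.fit(X_train, y_train)
print("poly features R^2:", round(poly.score(X_test, y_test), 2))
```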

6. Example

  • Overfitting Example:
    A decision tree that grows until every leaf is pure → memorizes training data but fails on test data.
  • Underfitting Example:
    Using linear regression to predict a sine wave → can’t capture the curve.

In short:

  • Overfitting = too much learning, poor generalization.
  • Underfitting = too little learning, poor representation.
  • Goal: Find the sweet spot where the model generalizes well.
