1. Definitions
- Overfitting
  - Model learns the training data too well, including noise and random fluctuations.
  - Performs well on training data but poorly on new/unseen data.
  - Symptom: High training accuracy, low test accuracy.
- Underfitting
  - Model is too simple to capture the underlying patterns in the data.
  - Performs poorly on both training and test data.
  - Symptom: Low training accuracy, low test accuracy.
2. Causes
| Aspect | Overfitting | Underfitting |
|---|---|---|
| Model Complexity | Too complex (e.g., deep trees, many parameters) | Too simple (e.g., a linear model for nonlinear data) |
| Training Data | Too little data, noisy data | Not enough features, poor representation |
| Training Epochs | Too many training iterations | Too few training iterations |
| Regularization | None or too weak | Too strong |
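To make the training-epochs row concrete, here is a minimal sketch assuming scikit-learn, where boosting iterations stand in for training epochs (the synthetic dataset and the `flip_y` label noise are illustrative choices): train accuracy keeps climbing while test accuracy stalls or drops.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic data; flip_y adds label noise the model can memorize.
X, y = make_classification(n_samples=800, n_features=20, flip_y=0.15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Boosting iterations stand in for training epochs.
model = GradientBoostingClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Accuracy after selected iterations: train keeps climbing, test stalls or drops.
for i, (tr_pred, te_pred) in enumerate(
        zip(model.staged_predict(X_tr), model.staged_predict(X_te)), start=1):
    if i in (1, 10, 50, 300):
        print(f"iter={i:3d}  train={accuracy_score(y_tr, tr_pred):.3f}  "
              f"test={accuracy_score(y_te, te_pred):.3f}")
```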
3. Graphical Representation
Imagine a curve fitting problem:
- Underfitting → A straight line through complex data (misses patterns).
- Good Fit → Smooth curve that generalizes well.
- Overfitting → Wiggly curve that passes through every training point but fails on test data.
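A minimal sketch of this picture, assuming NumPy and scikit-learn; the noisy sine target and the degrees 1 / 4 / 15 are illustrative stand-ins for underfit, good fit, and overfit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy sine data: an illustrative nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 60).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 4, 15):  # straight line, smooth curve, wiggly curve
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree={degree:2d}  "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):.3f}  "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):.3f}")
```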
4. How to Detect
- Use Train vs Test (Validation) performance:
- If train >> test → Overfitting.
- If train ≈ test but both low → Underfitting.
- Use Learning Curves (plot accuracy/loss vs training size or epochs).
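A hedged sketch of that check, assuming scikit-learn; the synthetic dataset and decision tree are stand-ins. A large train-validation gap signals overfitting; both scores low signals underfitting:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data and model.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Mean train/validation accuracy at increasing training-set sizes (5-fold CV).
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train={tr:.3f}  val={va:.3f}  gap={tr - va:.3f}")
```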
5. How to Fix
🔹 Fixing Overfitting
- Get more training data.
- Use a simpler model (fewer parameters).
- Apply regularization (L1, L2, Dropout).
- Use early stopping.
- Perform cross-validation.
- Data augmentation (for images, text, audio).
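As a sketch of the regularization fix, assuming scikit-learn: ridge (L2) penalties of increasing strength are applied to the overfit degree-15 polynomial from the earlier sketch. The alpha values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same noisy-sine setup as the curve-fitting sketch above.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 60).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for alpha in (1e-6, 1e-3, 1e-1):  # nearly no penalty -> stronger L2 penalty
    model = make_pipeline(PolynomialFeatures(15), StandardScaler(), Ridge(alpha=alpha))
    model.fit(X_tr, y_tr)
    print(f"alpha={alpha:g}  "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):.3f}  "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):.3f}")
```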
🔹 Fixing Underfitting
- Use a more complex model (deeper neural net, more features).
- Reduce regularization strength.
- Train for longer (more epochs).
- Add more relevant features or better feature engineering.
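And a sketch of the feature-engineering fix, assuming scikit-learn: the same linear model, given polynomial features (degree 5 is an illustrative choice), can capture a sine-shaped target it otherwise underfits:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Sine-shaped target that a straight line cannot capture.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 200).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, 200)

# Plain linear regression underfits...
linear = LinearRegression().fit(X, y)
print(f"linear    R^2 = {r2_score(y, linear.predict(X)):.3f}")

# ...while added polynomial features let the same model capture the curve.
poly = make_pipeline(PolynomialFeatures(5), LinearRegression()).fit(X, y)
print(f"degree-5  R^2 = {r2_score(y, poly.predict(X)):.3f}")
```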
6. Example
- Overfitting Example:
  A decision tree that grows until every leaf is pure → memorizes the training data but fails on test data.
- Underfitting Example:
  Using linear regression to predict a sine wave → can’t capture the curve.
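A minimal sketch of the decision-tree example, assuming scikit-learn (the synthetic dataset and max_depth=3 are illustrative): the fully grown tree scores ~1.0 on training data but drops on test data, while a depth-limited tree generalizes better:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data; flip_y adds label noise worth NOT memorizing.
X, y = make_classification(n_samples=600, n_features=20, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unlimited depth: the tree grows until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Depth-limited: a simpler tree that cannot memorize the noise.
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

for name, tree in (("full", full), ("depth-3", pruned)):
    print(f"{name:7s}  train={tree.score(X_tr, y_tr):.3f}  "
          f"test={tree.score(X_te, y_te):.3f}")
```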
✅ In short:
- Overfitting = too much learning, poor generalization.
- Underfitting = too little learning, poor representation.
- Goal: Find the sweet spot where the model generalizes well.