Underfitting & Overfitting in ML
Machine Learning · Deep Dive

The two fundamental forces every ML model must balance — and why getting it right changes everything.

7 min read · ML Fundamentals
● Overview

Why Models Fail

Machine Learning models have one job: learn patterns from data and make accurate predictions on new, unseen data. Simple in theory — but two silent enemies lurk in every training run.

Tags: High Bias · High Variance · Bias-Variance Tradeoff · Generalization

When a model learns too little, we call it underfitting. When it learns too much — including noise and random quirks — we call it overfitting. The art is finding the sweet spot between the two.

● Underfitting

When the Model Knows Too Little

Underfitting occurs when a model is too simple to capture the real patterns in the data. It performs poorly on both training and testing data — it hasn't learned enough to be useful.

💡 Real-world example

Predicting Temperature Over the Day

Temperature rises in the morning, peaks in the afternoon, then falls — a curve. But if your model forces a straight line, it can never capture that rise-and-fall. The result? Systematically wrong predictions at every point.

[Figure: temperature (15°–35°) vs. time of day (6AM–6PM), showing the curved actual data against an underfitting straight-line fit]
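The straight-line failure is easy to reproduce. Here is a minimal NumPy sketch using made-up temperature readings (all numbers are illustrative, not real measurements):

```python
import numpy as np

# Hypothetical temperature readings (°C) from 6AM to 6PM: rise, peak, fall
hours = np.array([6.0, 9.0, 12.0, 15.0, 18.0])
temps = np.array([15.0, 22.0, 30.0, 28.0, 20.0])

# Underfit: force a degree-1 (straight-line) model onto curved data
line = np.polyfit(hours, temps, deg=1)
mse_line = np.mean((np.polyval(line, hours) - temps) ** 2)

# A degree-2 model can actually capture the rise-and-fall
quad = np.polyfit(hours, temps, deg=2)
mse_quad = np.mean((np.polyval(quad, hours) - temps) ** 2)

print(f"straight-line training MSE: {mse_line:.2f}")  # poor even on training data
print(f"quadratic training MSE:     {mse_quad:.2f}")  # far smaller
```

Note that the straight line is bad on its *own training data*: that is the signature of underfitting, as opposed to overfitting, which only shows up on unseen data.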

Causes of Underfitting

  • Model too simple
  • High regularization
  • Weak features
  • Not enough training
  • High bias
● Overfitting

When the Model Knows Too Much

Overfitting happens when a model becomes so complex it memorizes the training data — noise and all. It aces training, but collapses on real-world data it hasn't seen before.

💡 Real-world example

Predicting Shop Sales

A complex model tries to match every spike and drop in daily sales data — treating random fluctuations as meaningful patterns. It gets a perfect score on training data, but fails completely to predict next week's sales.

[Figure: sales vs. days, showing actual sales points, a wiggly overfitting curve that chases every point, and the smooth true trend]
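The sales story can be reproduced with a short NumPy sketch (the sales figures are synthetic, generated from an invented trend plus noise):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily sales: a gentle upward trend plus random noise
days = np.arange(8, dtype=float)
trend = lambda d: 100 + 2 * d
sales = trend(days) + rng.normal(0, 5, size=days.size)

# Overfit: a degree-7 polynomial passes (almost) exactly through all 8 points
overfit = np.polyfit(days, sales, deg=7)
train_err = np.mean((np.polyval(overfit, days) - sales) ** 2)

# Honest model: a straight line close to the true trend
simple = np.polyfit(days, sales, deg=1)

# "Next week": evaluate both against the true trend on unseen days
future = np.arange(8, 11, dtype=float)
overfit_err = np.mean((np.polyval(overfit, future) - trend(future)) ** 2)
simple_err = np.mean((np.polyval(simple, future) - trend(future)) ** 2)

print(f"training MSE (overfit): {train_err:.6f}")    # essentially zero
print(f"future MSE (overfit):   {overfit_err:.1f}")  # explodes off the trend
print(f"future MSE (simple):    {simple_err:.1f}")   # stays modest
```

The perfect training score is exactly the trap: the degree-7 curve has memorized the noise, so it swings wildly as soon as it leaves the training range.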

Causes of Overfitting

  • Model too complex
  • Too many features
  • Very little data
  • No regularization
  • High variance
● Bias-Variance Tradeoff

Side by Side

Understanding both problems together reveals the core tension in machine learning — known as the bias-variance tradeoff.

🔴 Underfitting

  • Model too simple
  • High bias
  • Low variance
  • Bad on train & test
  • Misses real patterns

🔵 Overfitting

  • Model too complex
  • Low bias
  • High variance
  • Great on train, bad on test
  • Memorizes noise
[Figure: three bias/variance gauges. Underfitting: high bias. Overfitting: high variance. Perfect fit: both balanced.]

y = ax² + bx + c

A balanced quadratic model: complex enough, simple enough.

This concept is the bias-variance tradeoff: underfitting gives you high bias and low variance, overfitting gives you low bias and high variance. The ideal model balances both.
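One way to see the tradeoff concretely is to sweep model complexity on synthetic data (a noisy quadratic, all values invented) and compare training error against held-out error:

```python
import numpy as np

rng = np.random.default_rng(42)

# Ground truth is quadratic; we only observe noisy samples of it
x = rng.uniform(-1, 1, size=60)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0, 0.3, size=x.size)

x_tr, y_tr = x[:40], y[:40]   # training split
x_te, y_te = x[40:], y[40:]   # held-out split

def train_test_mse(deg):
    coefs = np.polyfit(x_tr, y_tr, deg)
    return (np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2),
            np.mean((np.polyval(coefs, x_te) - y_te) ** 2))

for deg in (1, 2, 9):
    tr, te = train_test_mse(deg)
    print(f"degree {deg}: train MSE {tr:.3f}, test MSE {te:.3f}")
# Typical pattern: degree 1 is poor on both sets (underfit, high bias),
# degree 2 is good on both (balanced), and degree 9 drives training
# error lower still while test error drifts back up (high variance).
```

Training error only ever falls as complexity grows; held-out error is U-shaped. The bottom of that U is the balance point the tradeoff describes.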

● Solutions

How to Fix Both Problems

Each fix below targets a specific root cause of one of the two failure modes.

🔴 Fix Underfitting

🧠 Use a More Complex Model (Complexity ↑)

Switch to higher-degree polynomials, deeper neural nets, or ensemble methods.

🔬 Add Relevant Features (Features ↑)

Engineer new inputs that give the model richer information to work with.

⏱️ Increase Training Time (Epochs ↑)

Allow more epochs so the model converges on the real underlying pattern.

🎛️ Reduce Regularization (Regularization ↓)

Loosen constraints that are too tight, letting the model learn more freely.
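As a concrete illustration of that last fix, here is a hypothetical ridge-regression sketch on synthetic data: with the penalty cranked far too high the model underfits even its own training data, and loosening it restores the fit. The closed-form solve is an illustrative choice, not a prescribed method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a simple linear relationship: y = 3x + 1
x = rng.uniform(-2, 2, size=50)
y = 3 * x + 1 + rng.normal(0, 0.3, size=x.size)
X = np.column_stack([np.ones_like(x), x])  # bias column + feature

def train_mse(alpha):
    # Closed-form ridge: w = (X^T X + alpha*I)^(-1) X^T y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ y)
    return np.mean((X @ w - y) ** 2)

print(f"alpha=1000: train MSE {train_mse(1000.0):.3f}")  # over-regularized: underfits
print(f"alpha=0.1:  train MSE {train_mse(0.1):.3f}")     # looser: fits the trend
```

With alpha huge, the weights are squeezed toward zero and the model cannot even track a plain straight line; relaxing the penalty lets it learn.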

🔵 Fix Overfitting

⚖️ Regularization (L1 / L2)

Penalize large weights to stop the model from memorizing noise.
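A sketch of what an L2 penalty actually does, using a closed-form ridge solve on synthetic data (not any particular library's API): the penalty pulls the weight vector toward zero, which is precisely how it stops weights from growing to chase noise.

```python
import numpy as np

rng = np.random.default_rng(7)

# Few noisy points, many polynomial features: a recipe for overfitting
x = rng.uniform(-1, 1, size=12)
y = np.sin(2 * x) + rng.normal(0, 0.1, size=x.size)
X = np.vander(x, N=10)   # degree-9 polynomial features
n = X.shape[1]

def ridge(alpha):
    # L2-regularized least squares via the augmented system:
    # minimizes ||Xw - y||^2 + alpha * ||w||^2
    A = np.vstack([X, np.sqrt(alpha) * np.eye(n)])
    b = np.concatenate([y, np.zeros(n)])
    return np.linalg.lstsq(A, b, rcond=None)[0]

w_free = ridge(0.0)   # no penalty: weights grow to chase the noise
w_reg = ridge(1.0)    # L2 penalty shrinks the weights toward zero

print(f"||w|| without penalty: {np.linalg.norm(w_free):.2f}")
print(f"||w|| with L2 penalty: {np.linalg.norm(w_reg):.2f}")
```

Smaller weights mean a smoother, less twitchy function, which is why regularization trades a little training accuracy for better generalization.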

📊 Increase Training Data (Data ↑)

Diverse data teaches real patterns rather than random training quirks.

🔄 Cross-Validation (k-Fold)

K-fold splits verify consistent performance across all data slices.
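A hand-rolled k-fold loop on synthetic data shows the idea (the dataset and the degree-9 "flexible model" are both invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic dataset: y = 2x + noise
x = rng.uniform(0, 1, size=30)
y = 2 * x + rng.normal(0, 0.3, size=x.size)

def kfold_mse(deg, k=5):
    """Average held-out MSE of a degree-`deg` polynomial fit over k folds."""
    idx = rng.permutation(x.size)
    errors = []
    for fold in np.array_split(idx, k):       # each fold is held out once
        train = np.setdiff1d(idx, fold)       # the rest trains the model
        coefs = np.polyfit(x[train], y[train], deg)
        errors.append(np.mean((np.polyval(coefs, x[fold]) - y[fold]) ** 2))
    return float(np.mean(errors))

cv_simple = kfold_mse(deg=1)
cv_flexible = kfold_mse(deg=9)
print(f"5-fold CV MSE, degree 1: {cv_simple:.3f}")
print(f"5-fold CV MSE, degree 9: {cv_flexible:.3f}")
# The flexible model tends to score worse across folds: cross-validation
# exposes overfitting that a single training score would hide.
```

Because every point serves as test data exactly once, a model that only memorizes its training fold cannot hide behind a lucky split.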

✂️ Simplify the Model (Complexity ↓)

Trim layers or features so the model can't over-memorize training samples.

🛑 Early Stopping (Stop Before Overfit)

Monitor validation loss during training. The moment it starts climbing, stop — you've hit the sweet spot before memorization kicks in.
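That rule can be sketched in a few lines of plain Python with a "patience" window (the loss numbers below are invented for illustration, not output from a real training run):

```python
# Early-stopping sketch: stop when validation loss stops improving.

def early_stop_epoch(val_losses, patience=3):
    """Return the best epoch, stopping once `patience` epochs pass
    with no improvement in validation loss."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # new best: keep going
        elif epoch - best_epoch >= patience:
            break                                 # stalled too long: stop
    return best_epoch

# Validation loss falls, bottoms out at epoch 4, then climbs (overfitting)
val = [1.00, 0.70, 0.50, 0.42, 0.40, 0.43, 0.48, 0.55, 0.65, 0.80]
print(early_stop_epoch(val))  # → 4
```

The patience window matters: validation loss is noisy in real training, so stopping at the first uptick would often quit too early.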

[Figure: training vs. validation loss curves, before and after applying a fix]
● Conclusion
"A model that has truly learned doesn't just memorize — it understands."

Underfitting and overfitting are two sides of the same coin. Master the bias-variance tradeoff through regularization, feature selection, and cross-validation — and your models will generalize to the real world.
