Underfitting & Overfitting in ML
Machine Learning · Deep Dive


Every machine learning model must find the sweet spot between learning too little and learning too much. Here's a clear, practical guide to understanding — and fixing — both problems.

📖 7 min read 🏷 ML Fundamentals 📅 2025

Why Models Fail

Machine learning models have one goal: learn patterns from training data and make accurate predictions on data they've never seen. Simple in theory — but two problems can silently undermine even well-designed models.

When a model learns too little, it misses the real patterns in the data — this is underfitting. When it learns too much, it memorises the training data, noise and all, and collapses on new data — this is overfitting. The ideal model sits right between the two.

Tags: Underfitting · Overfitting · Bias-Variance Tradeoff · Generalisation

When the Model Knows Too Little

Underfitting happens when a model is too simple to capture the real patterns in data. It performs poorly on both training and test data — it simply hasn't learned enough to be useful.

💡 Example

Predicting Temperature Over the Day

Temperature rises in the morning, peaks at noon, then falls — a clear curve. A model that fits a straight line will be wrong at nearly every hour of the day.

[Figure: Temperature vs Time of Day, showing the actual data, the underfitting straight line, and the ideal curve]
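The straight-line failure is easy to demonstrate numerically. Here is a minimal sketch with NumPy; the hourly temperatures are synthetic illustrative values, not real measurements:

```python
import numpy as np

# Synthetic hourly temperatures: rise, peak at noon, fall — a clear curve
hours = np.arange(6, 19)                       # 6 am to 6 pm
temps = -0.3 * (hours - 12) ** 2 + 25          # quadratic peaking at noon

# Underfit: a straight line cannot follow the curve
slope, intercept = np.polyfit(hours, temps, 1)
line_mse = np.mean((temps - (slope * hours + intercept)) ** 2)

# A quadratic matches the true shape almost exactly
quad_coeffs = np.polyfit(hours, temps, 2)
quad_mse = np.mean((temps - np.polyval(quad_coeffs, hours)) ** 2)

print(f"linear MSE: {line_mse:.2f}, quadratic MSE: {quad_mse:.8f}")
```

The straight line is wrong at nearly every hour, while a model of the right shape fits essentially perfectly.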

Common Causes

Model too simple · Excessive regularisation · Weak features · Too few training epochs · High bias

When the Model Knows Too Much

Overfitting occurs when a model becomes so complex it memorises the training data — including random noise and outliers. It aces training, but fails completely on data it hasn't seen before.

💡 Example

Predicting Shop Sales

A complex model chases every spike and dip in past daily sales, treating random fluctuations as real patterns. It fits training data perfectly but can't predict next week at all.

[Figure: Sales vs Days, showing the actual sales, the true trend, and the overfitting curve]
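One way to see this is to fit an over-flexible polynomial to noisy sales and then score it on the following week. The trend, noise level, and polynomial degree below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# 30 days of sales: a gentle upward trend plus day-to-day noise
days = np.arange(30) / 29.0                          # scaled to [0, 1] for stability
train_sales = 100 + 60 * days + rng.normal(0, 10, size=30)

# Overfit: a degree-9 polynomial chases every spike and dip
coeffs = np.polyfit(days, train_sales, 9)
train_mse = np.mean((train_sales - np.polyval(coeffs, days)) ** 2)

# The following week, drawn from the same underlying trend
next_days = np.arange(30, 37) / 29.0
next_sales = 100 + 60 * next_days + rng.normal(0, 10, size=7)
test_mse = np.mean((next_sales - np.polyval(coeffs, next_days)) ** 2)

print(f"train MSE: {train_mse:.1f}, next-week MSE: {test_mse:.1f}")
```

The model's training error is small because it has memorised the noise, but its error on the new week is far larger.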

Common Causes

Model too complex · Too many features · Very little training data · No regularisation · High variance

Finding the Perfect Balance

The ideal model sits between both extremes. It's complex enough to learn real patterns, but simple enough not to memorise noise. This is the essence of the bias-variance tradeoff.

Property             🔴 Underfitting   🔵 Overfitting
Model complexity     Too simple        Too complex
Bias                 High              Low
Variance             Low               High
Training accuracy    Poor              Very high
Test accuracy        Poor              Poor
Generalises?         No                No
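The pattern in the table can be reproduced with a small polynomial-degree sweep on synthetic data; the degrees and noise level here are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    """Quadratic ground truth plus a little noise."""
    x = rng.uniform(0, 1, n)
    y = 3 * (x - 0.5) ** 2 + rng.normal(0, 0.05, n)
    return x, y

x_train, y_train = make_data(60)
x_test, y_test = make_data(60)

results = {}
for degree in (1, 2, 15):                            # underfit, balanced, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

Degree 1 is poor everywhere; degree 15 posts the lowest training error; degree 2, matching the true pattern, generalises best.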

Bias & Variance at a Glance

Underfitting — Bias: High · Variance: Low
Overfitting — Bias: Low · Variance: High
Ideal Model — Bias: Balanced ✓ · Variance: Balanced ✓

y = ax² + bx + c
A balanced quadratic — captures the curve without memorising noise.

How to Fix Both Problems

The right fix depends on which problem your model has. Here are the most effective techniques for each.

🔴 Fix Underfitting

🧠 Use a More Complex Model

Switch to a deeper network, higher-degree polynomial, or ensemble method like Random Forest.

🔬 Add Relevant Features

Engineer new inputs that give the model richer information to learn from.

⏱️ Increase Training Time

Allow more epochs so the model has time to converge on real patterns.

🎛️ Reduce Regularisation

If regularisation is too strong, relax it so the model can learn more freely.

🔵 Fix Overfitting

⚖️ Regularisation (L1 / L2)

Penalise large weights to stop the model from memorising noise.
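L2 (ridge) regularisation can be sketched in closed form: the penalty λ adds to the normal equations and shrinks the weights. The λ value and synthetic data below are illustrative assumptions:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """L2-regularised least squares: w = (XᵀX + λI)⁻¹ Xᵀy."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.0, 0.5]                 # only 3 of 20 features matter
y = X @ true_w + rng.normal(0, 0.5, size=50)

w_unreg = ridge_fit(X, y, lam=0.0)            # plain least squares
w_ridge = ridge_fit(X, y, lam=10.0)           # penalised fit

print(f"unregularised weight norm: {np.linalg.norm(w_unreg):.3f}")
print(f"ridge weight norm:         {np.linalg.norm(w_ridge):.3f}")
```

The ridge solution has a strictly smaller weight norm: large weights fit to noise are the first thing the penalty suppresses.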

📊 Increase Training Data

More diverse examples help the model generalise rather than memorise.

🔄 Cross-Validation

Use k-fold validation to verify performance is consistent across data splits.
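A minimal k-fold loop can be written by hand with NumPy; the polynomial models and synthetic data are illustrative assumptions:

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Average held-out MSE of a polynomial fit across k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)
        errors.append(np.mean((y[val] - np.polyval(coeffs, x[val])) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 80)
y = 3 * (x - 0.5) ** 2 + rng.normal(0, 0.05, 80)   # quadratic ground truth

print(f"degree 1 CV MSE: {kfold_mse(x, y, 1):.4f}")
print(f"degree 2 CV MSE: {kfold_mse(x, y, 2):.4f}")
```

Because every point is held out exactly once, a consistently lower CV score for the quadratic is evidence it generalises, not just that it got a lucky split.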

✂️ Simplify the Model

Trim layers or features so the model can't over-memorise training data.

🛑 Early Stopping

Monitor validation loss during training and stop the moment it starts rising — that's when overfitting begins. One of the simplest and most effective techniques available.
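A minimal early-stopping loop might look like this; the toy linear model, learning rate, and patience value are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data: only the first feature carries signal; the other 29 invite overfitting
X = rng.normal(size=(100, 30))
y = 2.0 * X[:, 0] + rng.normal(0, 1.0, size=100)
X_tr, X_val, y_tr, y_val = X[:70], X[70:], y[:70], y[70:]

w = np.zeros(30)
lr, patience = 0.01, 10
best_val, best_w, bad_steps, stopped_at = np.inf, w.copy(), 0, None

for step in range(5000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)   # gradient of training MSE
    w -= lr * grad
    val_loss = np.mean((X_val @ w - y_val) ** 2)
    if val_loss < best_val:                          # validation still improving
        best_val, best_w, bad_steps = val_loss, w.copy(), 0
    else:
        bad_steps += 1
        if bad_steps >= patience:                    # rising for `patience` steps
            stopped_at = step
            break

print(f"best validation MSE {best_val:.3f}" +
      (f", stopped early at step {stopped_at}" if stopped_at else ""))
```

Keeping `best_w` (the weights at the validation minimum) rather than the final weights is the usual practice, mirroring `restore_best_weights`-style options in deep learning frameworks.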


"A model that has truly learned doesn't just memorise — it understands."

Underfitting and overfitting are the two fundamental challenges in machine learning. Master the bias-variance tradeoff through regularisation, better data, and cross-validation — and your models will generalise to the real world with confidence.

Written for learners exploring Machine Learning fundamentals.
