
Model Fit: Bias and Variance

The bias-variance tradeoff is a fundamental concept in machine learning that explains why models fail to generalize. Every machine learning practitioner must understand this tradeoff to build models that perform well on unseen data.

There’s no free lunch in machine learning: making a model flexible enough to reduce bias typically increases its variance, and vice versa, so you must find the sweet spot where total error is minimized.


When a model makes predictions, its expected error (under squared-error loss / MSE) can be decomposed into three parts:

Total Error = Bias² + Variance + Irreducible Error
Component         | What It Represents
Bias²             | Error from overly simplistic assumptions
Variance          | Error from sensitivity to small fluctuations in the training data
Irreducible Error | Noise in the data that no model can eliminate
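
To make the decomposition concrete, here is an illustrative sketch (not from the original text; the sine signal, noise level, and tree depth are arbitrary choices) that estimates bias² and variance empirically by refitting the same model on many simulated training sets:

# Empirical bias^2 and variance: refit one model on many resampled training
# sets drawn from a known true function, then compare to that function.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
noise_sd = 0.3                                   # irreducible error = noise_sd**2

def true_f(x):                                   # underlying signal (assumed known here)
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0, 1, 200).reshape(-1, 1)   # fixed evaluation points

preds = []
for _ in range(300):                             # many hypothetical training sets
    x_train = rng.uniform(0, 1, (50, 1))
    y_train = true_f(x_train).ravel() + rng.normal(0, noise_sd, 50)
    model = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
    preds.append(model.predict(x_grid))

preds = np.asarray(preds)                        # shape: (n_repeats, n_points)
bias_sq = np.mean((preds.mean(axis=0) - true_f(x_grid).ravel()) ** 2)
variance = np.mean(preds.var(axis=0))
print(f"bias^2 ~ {bias_sq:.3f}  variance ~ {variance:.3f}  irreducible ~ {noise_sd**2:.3f}")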

Bias is the error introduced by approximating a real-world problem with a simplified model.

A model with high bias makes strong assumptions about the data and fails to capture underlying patterns.

Characteristics                   | Examples
Poor performance on training data | Linear model on curved data
Poor performance on test data     | Decision tree with depth=1
Model is “too simple”             | Assuming a linear relationship when it’s quadratic
[Figure: a curved scatter of data points with a straight-line model drawn through them. The straight line can't capture the curve. Result: high training error, high test error.]
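
This failure is easy to reproduce on synthetic data (an illustrative sketch; the quadratic data-generating function is an assumption): a straight line fit to curved data has high error on both the training and the test split.

# High bias in practice: a linear model on quadratic data is poor on both splits.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)     # quadratic signal + noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

line = LinearRegression().fit(X_tr, y_tr)
print("train MSE:", mean_squared_error(y_tr, line.predict(X_tr)))   # high
print("test  MSE:", mean_squared_error(y_te, line.predict(X_te)))   # also high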

Variance is the model’s sensitivity to fluctuations in the training set. A high-variance model changes dramatically with small changes in the training data.

A model with high variance memorizes the training data, including noise and outliers.

Characteristics                        | Examples
Excellent performance on training data | Decision tree with depth=20
Poor performance on test data          | 20th-degree polynomial on 50 points
Model is “too complex”                 | Neural network with too many parameters
[Figure: the same scatter of data points with a jagged curve that passes through every point, including the noise. Result: low training error, high test error.]
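
The mirror-image failure appears if we fit an unconstrained decision tree to the same kind of synthetic data (again an illustrative sketch, not code from the original text): training error collapses to nearly zero while test error stays much higher.

# High variance in practice: an unconstrained tree memorizes the training data.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)   # no depth limit
print("train MSE:", mean_squared_error(y_tr, tree.predict(X_tr)))   # ~0
print("test  MSE:", mean_squared_error(y_te, tree.predict(X_te)))   # much higher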

As model complexity increases:

  • Bias decreases (model becomes more flexible)
  • Variance increases (model becomes more sensitive to training data)
[Figure: error vs. model complexity. Bias² falls and variance rises as the model moves from simple to complex; their sum, the total error, is U-shaped with a minimum at intermediate complexity.]

The optimal model complexity balances bias and variance to minimize total error:

Model Too Simple        | Model Just Right           | Model Too Complex
High bias, low variance | Balanced bias and variance | Low bias, high variance
Underfitting            | Good fit                   | Overfitting
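
One way to see this numerically (an illustrative sketch on synthetic data; the dataset and degree range are arbitrary) is to sweep a complexity knob such as polynomial degree and record cross-validated training and validation error, which typically traces the U-shaped curve described above.

# Sweep model complexity (polynomial degree) and record cross-validated error:
# validation MSE usually falls, bottoms out, then rises again.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 100)

degrees = np.arange(1, 13)
model = make_pipeline(PolynomialFeatures(), LinearRegression())
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="polynomialfeatures__degree", param_range=degrees,
    scoring="neg_mean_squared_error", cv=5)
for d, tr, va in zip(degrees, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"degree {d:2d}  train MSE {tr:.3f}  validation MSE {va:.3f}")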

Use training and validation performance to diagnose:

Training Error | Validation Error | Diagnosis
High           | High             | High bias (underfitting)
Low            | High             | High variance (overfitting)
Low            | Low              | Good fit
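
A small helper can automate the check (a hypothetical sketch; diagnose is just an illustrative name, and what counts as “high” or “low” depends on your problem):

# Hypothetical helper: print cross-validated train vs. validation MSE, then
# read the result against the diagnosis table above.
from sklearn.model_selection import cross_validate

def diagnose(model, X, y):
    scores = cross_validate(model, X, y, cv=5,
                            scoring="neg_mean_squared_error",
                            return_train_score=True)
    train_mse = -scores["train_score"].mean()
    val_mse = -scores["test_score"].mean()
    print(f"train MSE {train_mse:.3f} | validation MSE {val_mse:.3f}")
    # high / high -> high bias (underfitting)
    # low  / high -> high variance (overfitting)
    # low  / low  -> good fit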

If your model is underfitting (high bias):

Technique                 | How It Helps
Add more features         | Gives the model more information to work with
Increase model complexity | More layers, deeper trees, higher polynomial degree
Reduce regularization     | Lets the model fit the data more closely
Use a different algorithm | Some models are inherently more expressive
# Underfitting: linear regression on non-linear data
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

model = LinearRegression()              # high bias: can only fit a straight line

# Solution: expand the inputs with polynomial features so the linear model can bend
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)          # X is your original feature matrix
model = LinearRegression()              # fit this on X_poly -> lower bias

If your model is overfitting (high variance):

Technique              | How It Helps
Get more training data | Reduces the impact of noise
Remove features        | Fewer features = less complexity
Regularization         | Penalizes complex models (L1/L2, dropout)
Cross-validation       | Gives a better estimate of true performance
Ensemble methods       | Combine multiple models (bagging, random forests)
Early stopping         | Stops training before memorization
Type                  | Penalty                   | Effect
L1 (Lasso)            | Absolute value of weights | Drives some weights to exactly zero (feature selection)
L2 (Ridge)            | Squared weights (weight²) | Shrinks all weights (reduces coefficients)
ElasticNet            | Both                      | Combines L1 and L2 benefits
Dropout (neural nets) | Randomly disables neurons | Prevents co-adaptation
# Overfitting: decision tree with no depth limit
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor()         # high variance: grows until it memorizes the data

# Solution 1: limit tree depth
model = DecisionTreeRegressor(max_depth=5)

# Solution 2: use a regularized linear model (Ridge regression)
model = Ridge(alpha=1.0)                # alpha controls regularization strength

Cross-validation is your best tool for finding the bias-variance sweet spot. In k-fold cross-validation, the data is split into k equal folds; each fold takes one turn as the validation set while the model trains on the rest:

Dataset: |----|----|----|----|----| (5 folds)
Fold 1: Train on [2,3,4,5], Validate on [1]
Fold 2: Train on [1,3,4,5], Validate on [2]
Fold 3: Train on [1,2,4,5], Validate on [3]
...

Benefits:

  • More reliable estimate of model performance
  • Helps detect overfitting early
  • Essential for hyperparameter tuning
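
With scikit-learn this is a single call (a sketch that assumes X and y are your feature matrix and targets, and uses an arbitrary Ridge model):

# 5-fold cross-validation: each fold is held out once while the model trains
# on the remaining four folds (assumes X, y are already defined).
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print("per-fold MSE:", -scores)
print("mean MSE:", (-scores).mean())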

Plotting training and validation error vs. training set size reveals bias/variance issues:

[Figure: learning curves. Training error rises and validation error falls as the training set grows, and the two curves converge toward a plateau.]
  • Large gap between curves: High variance (training low, validation high)
  • Both curves high and close: High bias
  • Both low and converging: Good fit
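
scikit-learn's learning_curve produces the numbers behind such a plot (a sketch that assumes X and y are defined; the estimator and fold count are arbitrary):

# Training vs. validation error at increasing training-set sizes
# (assumes X, y are already defined).
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

sizes, train_scores, val_scores = learning_curve(
    DecisionTreeRegressor(max_depth=5), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
    scoring="neg_mean_squared_error")
for n, tr, va in zip(sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"n={n:4d}  train MSE {tr:.3f}  validation MSE {va:.3f}")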

A practical workflow:

  1. Begin with a simple model (linear regression, shallow tree)
  2. Measure training/validation performance
  3. If high bias: increase complexity
  4. If high variance: regularize or get more data (see the grid-search sketch after the table below)
Data Size           | Preferred Approach
Small (<1K samples) | Simple models, strong regularization
Medium (1K–100K)    | Medium complexity, cross-validation
Large (>100K)       | Deep learning, ensemble methods
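
Steps 2–4 of the workflow above are often automated with a grid search over complexity and regularization settings (a sketch that assumes X and y are defined; the parameter grid is arbitrary):

# Grid search over tree depth and leaf size: cross-validation picks the
# complexity setting with the lowest validation error (assumes X, y exist).
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 3, 5, 8, None],
                "min_samples_leaf": [1, 5, 20]},
    cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV MSE:", -search.best_score_)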

Sometimes you can’t achieve perfect balance:

Situation                 | Acceptable Tradeoff
Limited data              | Slightly higher bias (simpler model)
Interpretability required | Higher bias (linear models)
Max prediction accuracy   | Minimize total error, regardless of the bias/variance split

  • Bias: Error from overly simple models (underfitting)
  • Variance: Error from overly complex models (overfitting)
  • Tradeoff: You can’t minimize both—find the sweet spot
  • Diagnose: Compare training vs. validation error
  • Fix high bias: Add features, increase complexity, reduce regularization
  • Fix high variance: Get more data, add regularization, use ensembles
  • Cross-validation: Essential for finding the optimal balance

The bias-variance tradeoff is about finding the right model complexity—complex enough to learn patterns, simple enough to generalize.