Hyperparameters
Hyperparameters are configuration settings that you choose before training a machine learning model; they are not learned from the data. They control how the model learns and, unlike model parameters, must be specified by you.
Understanding hyperparameters is crucial because good hyperparameter choices can dramatically improve model performance, while poor ones can lead to underfitting, overfitting, or wasted compute resources.
Hyperparameters vs. Model Parameters
| Aspect | Model Parameters | Hyperparameters |
|---|---|---|
| Learned from data? | Yes | No (set before training) |
| When set? | During training | Before training |
| Purpose | Make predictions | Control learning process |
| Example | Weights in a neural network | Learning rate, tree depth |
Analogy: Model parameters are the chef’s recipe adjustments (learned through cooking). Hyperparameters are the kitchen equipment and cooking method (chosen before starting).
Common Hyperparameters by Model Type
Linear / Logistic Regression
| Hyperparameter | What It Controls | Typical Range |
|---|---|---|
| C (Regularization) | Inverse of regularization strength | 0.001 - 100 |
| Regularization Type | L1 (lasso) vs L2 (ridge) | Choice |
| Solver | Optimization algorithm | lbfgs, saga, liblinear |
| Max Iterations | Cap on solver iterations | 100 - 10000 |
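As a quick sketch (values are illustrative, not recommendations), these are the arguments where scikit-learn's LogisticRegression exposes the hyperparameters above:

```python
from sklearn.linear_model import LogisticRegression

# Each argument below is a hyperparameter: chosen up front, not learned from data.
model = LogisticRegression(
    C=1.0,           # inverse regularization strength
    penalty='l2',    # regularization type (L1 = lasso, L2 = ridge)
    solver='lbfgs',  # optimization algorithm
    max_iter=1000,   # cap on solver iterations
)
```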
Decision Trees / Random Forests
| Hyperparameter | What It Controls | Typical Range |
|---|---|---|
| Max Depth | Tree complexity | 3 - 20 (None for unlimited) |
| Min Samples Split | Minimum samples to split a node | 2 - 20 |
| Min Samples Leaf | Minimum samples at a leaf node | 1 - 20 |
| Max Features | Features considered for each split | √features, log2(features) |
| N Estimators (Random Forest) | Number of trees | 50 - 500 |
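A minimal sketch of where these appear in scikit-learn's RandomForestClassifier (values are illustrative; the defaults are often a fine starting point):

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,      # number of trees
    max_depth=10,          # tree complexity (None = grow until pure)
    min_samples_split=5,   # minimum samples required to split a node
    min_samples_leaf=2,    # minimum samples required at a leaf
    max_features='sqrt',   # features considered at each split
)
```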
Support Vector Machines
| Hyperparameter | What It Controls | Typical Range |
|---|---|---|
| C (Regularization) | Tradeoff margin vs. misclassification | 0.001 - 1000 |
| Kernel | Decision boundary shape | Linear, RBF, Polynomial |
| Gamma (RBF) | Influence of single training example | 0.001 - 10 |
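The same hyperparameters in scikit-learn's SVC, with illustrative values (C and gamma are usually searched on a log scale):

```python
from sklearn.svm import SVC

model = SVC(
    C=1.0,         # margin vs. misclassification tradeoff
    kernel='rbf',  # decision boundary shape (linear, rbf, poly)
    gamma=0.1,     # influence of a single training example (RBF kernel)
)
```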
Neural Networks
| Hyperparameter | What It Controls | Typical Range |
|---|---|---|
| Learning Rate | Step size for weight updates | 0.00001 - 0.1 |
| Batch Size | Samples per gradient update | 16 - 512 |
| Epochs | Number of passes through data | 10 - 1000+ |
| Hidden Layers | Network depth | 1 - 100+ |
| Units per Layer | Network width | 32 - 1024 |
| Activation Function | Non-linearity | ReLU, Tanh, Sigmoid |
| Dropout Rate | Regularization probability | 0.1 - 0.5 |
| Optimizer | Weight update algorithm | SGD, Adam, RMSprop |
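For a framework-agnostic sketch, scikit-learn's MLPClassifier exposes most of these knobs (dropout is not available there; deep-learning frameworks such as PyTorch or Keras expose the same ideas under different names). Values are illustrative:

```python
from sklearn.neural_network import MLPClassifier

model = MLPClassifier(
    hidden_layer_sizes=(128, 64),  # network depth and width
    activation='relu',             # non-linearity
    solver='adam',                 # optimizer
    learning_rate_init=0.001,      # learning rate
    batch_size=64,                 # samples per gradient update
    max_iter=200,                  # epochs (for the stochastic solvers)
)
```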
Gradient Boosting (XGBoost, LightGBM)
| Hyperparameter | What It Controls | Typical Range |
|---|---|---|
| Learning Rate | Shrinkage of each tree | 0.01 - 0.3 |
| N Estimators | Number of boosting rounds | 50 - 1000 |
| Max Depth | Tree depth | 3 - 10 |
| Subsample | Fraction of samples per tree | 0.5 - 1.0 |
| Colsample_bytree | Fraction of features per tree | 0.5 - 1.0 |
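A sketch using XGBoost's scikit-learn wrapper (assumes the xgboost package is installed; LightGBM's LGBMClassifier accepts very similar arguments). Values are illustrative:

```python
from xgboost import XGBClassifier

model = XGBClassifier(
    learning_rate=0.1,     # shrinkage applied to each tree
    n_estimators=300,      # number of boosting rounds
    max_depth=5,           # tree depth
    subsample=0.8,         # fraction of samples per tree
    colsample_bytree=0.8,  # fraction of features per tree
)
```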
Hyperparameter Tuning Methods
Hyperparameter tuning is the process of finding the best combination of hyperparameters for your model.
1. Grid Search
Exhaustively try all combinations from a predefined set of values.
| Pros | Cons |
|---|---|
| Guaranteed to find best in grid | Computationally expensive |
| Simple to implement | Curse of dimensionality |
| Reproducible | Inefficient for large search spaces |
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, None],
    'min_samples_split': [2, 5, 10],
}

grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
```

Use when: the search space is small and you have time for an exhaustive search.
2. Random Search
Randomly sample from the hyperparameter space.
| Pros | Cons |
|---|---|
| More efficient than grid search | No guarantee of finding optimal |
| Better for high-dimensional spaces | Results can vary between runs |
| Can explore larger spaces | Requires more iterations for coverage |
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    'n_estimators': [50, 100, 200, 500],
    'max_depth': [5, 10, 15, 20, None],
    'min_samples_split': [2, 5, 10, 15],
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions,
    n_iter=50,  # number of random combinations to try
    cv=5,
)
random_search.fit(X_train, y_train)
```

Use when: the search space is large and you want efficient exploration.
3. Bayesian Optimization
Uses past evaluation results to build a probabilistic model and choose the next hyperparameters intelligently.
| Pros | Cons |
|---|---|
| Sample-efficient | More complex to set up |
| Finds good results faster | Higher overhead per trial |
| Good for expensive evaluations | Sensitive to search space definition |
Popular libraries: Optuna, Hyperopt, Ray Tune
```python
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 50, 500)
    max_depth = trial.suggest_int('max_depth', 5, 20)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 15)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
    )

    score = cross_val_score(model, X_train, y_train, cv=5).mean()
    return score

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
```

Use when: model training is expensive and the search space is large.
Comparison: Grid vs. Random vs. Bayesian
| Method | Efficiency | Best For | Complexity |
|---|---|---|---|
| Grid Search | Low | Small search spaces | Low |
| Random Search | Medium | Large search spaces | Low |
| Bayesian Optimization | High | Expensive evaluations | Medium-High |
Hyperparameter Tuning Best Practices
Section titled “Hyperparameter Tuning Best Practices”1. Start with Defaults
Most libraries provide well-chosen defaults. Start here before extensive tuning.
```python
from sklearn.ensemble import RandomForestClassifier

# Start simple
model = RandomForestClassifier()  # use defaults
```

2. Use Cross-Validation
Never tune hyperparameters on the test set—that’s data leakage.
```python
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X_train, y_train, cv=5)
```

3. Coarse-to-Fine Search
- Coarse: Random search over wide ranges
- Narrow: Zoom in on promising regions
- Fine: Grid search within narrow ranges
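A minimal sketch of this workflow, reusing the random forest setup from above (X_train and y_train are assumed to be defined, and the narrowed ranges are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Coarse: random search over wide ranges
coarse = RandomizedSearchCV(
    RandomForestClassifier(),
    {'n_estimators': [50, 100, 200, 500], 'max_depth': [3, 5, 10, 20, None]},
    n_iter=20,
    cv=5,
)
coarse.fit(X_train, y_train)
print(coarse.best_params_)  # e.g. {'n_estimators': 200, 'max_depth': 10}

# Fine: grid search in a narrow band around the best coarse result
fine = GridSearchCV(
    RandomForestClassifier(),
    {'n_estimators': [150, 200, 250], 'max_depth': [8, 10, 12]},
    cv=5,
)
fine.fit(X_train, y_train)
```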
4. Log Scale for Certain Hyperparameters
Some hyperparameters span orders of magnitude:
| Hyperparameter | Search On | Why |
|---|---|---|
| Learning rate | Log scale (0.001, 0.01, 0.1) | Multiplicative effect |
| Regularization strength | Log scale (0.001, 0.01, 0.1, 1, 10, 100) | Multiplicative effect |
| Batch size | Powers of 2 (32, 64, 128, 256) | Memory alignment, practical reasons |
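One way to sample on a log scale is to pass a log-uniform distribution to RandomizedSearchCV, as sketched below (LogisticRegression and its C parameter stand in for any log-scaled hyperparameter; X_train and y_train are assumed to be defined):

```python
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Sample C uniformly in log space between 1e-3 and 1e2
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    {'C': loguniform(1e-3, 1e2)},
    n_iter=30,
    cv=5,
)
search.fit(X_train, y_train)
```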
5. Tune in Order of Impact
Not all hyperparameters are equally important. This is a rough guide—impact varies by problem:
| Often High Impact | Often Medium Impact | Often Lower Impact |
|---|---|---|
| Learning rate | Batch size | Weight initialization |
| Number of trees/estimators | Max depth | |
| Regularization strength | Optimizer type | Min samples split |
6. Consider Early Stopping
For iterative algorithms (neural networks, gradient boosting):
```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    n_iter_no_change=10,      # stop if no improvement for 10 rounds
    validation_fraction=0.2,  # hold out 20% of training data to check for improvement
)
```

7. Document Your Experiments
Keep track of what you tried:
```python
import mlflow  # or Weights & Biases, Neptune, etc.

with mlflow.start_run():
    mlflow.log_params({
        'n_estimators': 100,
        'max_depth': 10,
        'learning_rate': 0.01,
    })
    mlflow.log_metric('accuracy', 0.95)
```

Common Hyperparameter Tuning Mistakes
| Mistake | Why It’s Bad | Solution |
|---|---|---|
| Tuning on test data | Data leakage, overfitting | Use validation set or cross-validation |
| Not using defaults first | Wastes time on poor initial choices | Start with defaults, then tune |
| Tuning in isolation | Misses interactions | Tune important hyperparameters together (use random/Bayesian search) |
| Too small search space | Miss good configurations | Start broad, then narrow |
| Not documenting experiments | Can’t reproduce or learn | Track all runs |
| Ignoring computational cost | Some configs take much longer | Consider time/accuracy tradeoff |
- Hyperparameters: Configuration settings set before training (not learned from data)
- Model Parameters: Internal values learned during training (weights, coefficients)
- Tuning Methods:
- Grid search: Exhaustive, simple, slow
- Random search: Efficient, good for large spaces
- Bayesian optimization: Smart, sample-efficient
- Best Practices:
- Start with defaults
- Use cross-validation
- Search coarse-to-fine
- Log-scale for learning rate and regularization
- Tune in order of impact
- Document experiments
Hyperparameter tuning is essential for getting the best performance, but always balance improvement against computational cost.