
Hyperparameters

Hyperparameters are configuration settings that you choose before training a machine learning model; they are not learned from the data. They control how the model learns, whereas model parameters (such as weights) are learned during training.

Understanding hyperparameters matters because good choices can dramatically improve model performance, while poor ones lead to underfitting, overfitting, or wasted compute.


| Aspect | Model Parameters | Hyperparameters |
|---|---|---|
| Learned from data? | Yes | No (set before training) |
| When set? | During training | Before training |
| Purpose | Make predictions | Control the learning process |
| Example | Weights in a neural network | Learning rate, tree depth |

Analogy: Model parameters are the chef’s recipe adjustments (learned through cooking). Hyperparameters are the kitchen equipment and cooking method (chosen before starting).
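
A minimal sketch of the distinction in scikit-learn (the synthetic dataset here is purely illustrative): C is a hyperparameter you set before training, while the learned coefficients are model parameters.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C is a hyperparameter: chosen by you, fixed before training
model = LogisticRegression(C=0.1)
model.fit(X, y)

# coef_ and intercept_ are model parameters: learned from the data during fit()
print(model.coef_, model.intercept_)
```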


Common hyperparameters for widely used model families:

**Logistic Regression**

| Hyperparameter | What It Controls | Typical Range |
|---|---|---|
| C (regularization) | Inverse of regularization strength | 0.001 - 100 |
| Regularization type | L1 (lasso) vs. L2 (ridge) | Choice |
| Solver | Optimization algorithm | lbfgs, saga, liblinear |
| Max iterations | How long to train | 100 - 10000 |

**Decision Trees and Random Forests**

| Hyperparameter | What It Controls | Typical Range |
|---|---|---|
| Max depth | Tree complexity | 3 - 20 (None for unlimited) |
| Min samples split | Minimum samples to split a node | 2 - 20 |
| Min samples leaf | Minimum samples at a leaf node | 1 - 20 |
| Max features | Features considered at each split | √features, log2(features) |
| N estimators (Random Forest) | Number of trees | 50 - 500 |

**Support Vector Machines**

| Hyperparameter | What It Controls | Typical Range |
|---|---|---|
| C (regularization) | Trade-off between margin and misclassification | 0.001 - 1000 |
| Kernel | Decision boundary shape | Linear, RBF, Polynomial |
| Gamma (RBF) | Influence of a single training example | 0.001 - 10 |

**Neural Networks**

| Hyperparameter | What It Controls | Typical Range |
|---|---|---|
| Learning rate | Step size for weight updates | 0.00001 - 0.1 |
| Batch size | Samples per gradient update | 16 - 512 |
| Epochs | Number of passes through the data | 10 - 1000+ |
| Hidden layers | Network depth | 1 - 100+ |
| Units per layer | Network width | 32 - 1024 |
| Activation function | Non-linearity | ReLU, Tanh, Sigmoid |
| Dropout rate | Regularization probability | 0.1 - 0.5 |
| Optimizer | Weight update algorithm | SGD, Adam, RMSprop |

**Gradient Boosting (e.g., XGBoost)**

| Hyperparameter | What It Controls | Typical Range |
|---|---|---|
| Learning rate | Shrinkage of each tree's contribution | 0.01 - 0.3 |
| N estimators | Number of boosting rounds | 50 - 1000 |
| Max depth | Tree depth | 3 - 10 |
| Subsample | Fraction of samples per tree | 0.5 - 1.0 |
| Colsample_bytree | Fraction of features per tree | 0.5 - 1.0 |
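
As a quick illustration, hyperparameters like these are passed to the model constructor before training. The values below are arbitrary examples, not recommendations:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Tree-based model: ensemble-size, depth, and split hyperparameters
forest = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    min_samples_split=5,
    max_features='sqrt'
)

# SVM: regularization strength, kernel shape, and kernel width
svm = SVC(C=1.0, kernel='rbf', gamma=0.01)
```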

Hyperparameter tuning is the process of finding the best combination of hyperparameters for your model.

**Grid search.** Exhaustively try every combination from a predefined grid of values.

| Pros | Cons |
|---|---|
| Guaranteed to find the best combination in the grid | Computationally expensive |
| Simple to implement | Suffers from the curse of dimensionality |
| Reproducible | Inefficient for large search spaces |
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, None],
    'min_samples_split': [2, 5, 10]
}

grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
```
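
After fitting, the best combination and its cross-validated score can be read from the search object:

```python
print(grid_search.best_params_)            # Best hyperparameter combination found
print(grid_search.best_score_)             # Mean cross-validated score of that combination
best_model = grid_search.best_estimator_   # Model refit on all training data with the best combination
```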

Use when: the search space is small and you have time for an exhaustive search.


**Random search.** Randomly sample combinations from the hyperparameter space.

| Pros | Cons |
|---|---|
| More efficient than grid search | No guarantee of finding the optimum |
| Better for high-dimensional spaces | Results can vary between runs |
| Can explore larger spaces | Requires more iterations for thorough coverage |
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    'n_estimators': [50, 100, 200, 500],
    'max_depth': [5, 10, 15, 20, None],
    'min_samples_split': [2, 5, 10, 15]
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions,
    n_iter=50,  # Number of random combinations to try
    cv=5
)
random_search.fit(X_train, y_train)
```

Use when: the search space is large and you want efficient exploration.


**Bayesian optimization.** Use past evaluation results to build a probabilistic model of the objective and choose the next hyperparameters to try intelligently.

| Pros | Cons |
|---|---|
| Sample-efficient | More complex to set up |
| Finds good results faster | Higher overhead per trial |
| Good for expensive evaluations | Sensitive to how the search space is defined |

Popular libraries: Optuna, Hyperopt, Ray Tune

```python
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 50, 500)
    max_depth = trial.suggest_int('max_depth', 5, 20)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 15)
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split
    )
    score = cross_val_score(model, X_train, y_train, cv=5).mean()
    return score

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
```
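
Once the study finishes, the best trial is available on the study object:

```python
print(study.best_params)  # Best hyperparameter values found
print(study.best_value)   # Best cross-validated score achieved
```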

Use when: model training is expensive and the search space is large.


| Method | Efficiency | Best For | Complexity |
|---|---|---|---|
| Grid search | Low | Small search spaces | Low |
| Random search | Medium | Large search spaces | Low |
| Bayesian optimization | High | Expensive evaluations | Medium-High |

**Start with defaults.** Most libraries provide well-chosen default values. Try them before investing in extensive tuning.

```python
from sklearn.ensemble import RandomForestClassifier

# Start simple: the defaults are often a solid baseline
model = RandomForestClassifier()
```

**Use cross-validation.** Never tune hyperparameters on the test set; that is data leakage. Tune on a validation set or with cross-validation on the training data.

```python
from sklearn.model_selection import cross_val_score

# Evaluate candidate hyperparameters with 5-fold cross-validation on the training set
scores = cross_val_score(model, X_train, y_train, cv=5)
```
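
A minimal sketch of the full protocol, assuming a single dataset X, y: hold out a test set first, tune with cross-validation on the training portion only, and evaluate exactly once at the end.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hold out a test set that tuning never sees
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Tune with cross-validation on the training portion only
search = GridSearchCV(RandomForestClassifier(), {'max_depth': [5, 10, None]}, cv=5)
search.fit(X_train, y_train)

# Evaluate the chosen configuration exactly once on the untouched test set
test_score = search.score(X_test, y_test)
```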
**Search coarse-to-fine.** Work in stages (see the sketch after this list):

  1. Coarse: random search over wide ranges
  2. Narrow: zoom in on promising regions
  3. Fine: grid search within the narrowed ranges
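
A rough sketch of that staged workflow, reusing the RandomForestClassifier and the X_train, y_train split from the earlier examples (the ranges are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Stage 1 (coarse): random search over wide ranges
coarse = RandomizedSearchCV(
    RandomForestClassifier(),
    {'n_estimators': [50, 100, 200, 500, 1000], 'max_depth': [3, 5, 10, 20, None]},
    n_iter=20, cv=5
)
coarse.fit(X_train, y_train)

# Stages 2-3 (narrow, then fine): grid search around the best coarse values
best_n = coarse.best_params_['n_estimators']
fine = GridSearchCV(
    RandomForestClassifier(),
    {'n_estimators': [max(10, best_n - 50), best_n, best_n + 50],
     'max_depth': [coarse.best_params_['max_depth']]},
    cv=5
)
fine.fit(X_train, y_train)
```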

**Search on a log scale where appropriate.** Some hyperparameters span orders of magnitude:

| Hyperparameter | Search On | Why |
|---|---|---|
| Learning rate | Log scale (0.001, 0.01, 0.1) | Multiplicative effect |
| Regularization strength | Log scale (0.001, 0.01, 0.1, 1, 10, 100) | Multiplicative effect |
| Batch size | Powers of 2 (32, 64, 128, 256) | Memory alignment, practical convention |
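
One way to build log-scaled candidates is NumPy's logspace; for random search, SciPy's loguniform distribution samples uniformly on a log scale. A small sketch (the ranges are illustrative):

```python
import numpy as np
from scipy.stats import loguniform

# Evenly spaced on a log scale: 1e-4, 1e-3, 1e-2, 1e-1
learning_rates = np.logspace(-4, -1, num=4)

# For RandomizedSearchCV: sample C uniformly on a log scale between 1e-3 and 1e2
param_distributions = {'C': loguniform(1e-3, 1e2)}
```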

**Tune in order of impact.** Not all hyperparameters are equally important. This is a rough guide; actual impact varies by problem:

| Often High Impact | Often Medium Impact | Often Lower Impact |
|---|---|---|
| Learning rate | Batch size | Weight initialization |
| Number of trees/estimators | Max depth | |
| Regularization strength | Optimizer type, min samples split | |

**Use early stopping.** For iterative algorithms (neural networks, gradient boosting), stop training automatically once the validation score stops improving:

```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    n_iter_no_change=10,      # Stop if no improvement for 10 consecutive rounds
    validation_fraction=0.2   # Hold out 20% of training data to monitor improvement
)
```

**Document your experiments.** Keep track of what you tried so you can reproduce and learn from your runs:

```python
import mlflow  # Or Weights & Biases, Neptune, etc.

with mlflow.start_run():
    mlflow.log_params({
        'n_estimators': 100,
        'max_depth': 10,
        'learning_rate': 0.01
    })
    mlflow.log_metric('accuracy', 0.95)
```

Common mistakes to avoid:

| Mistake | Why It's Bad | Solution |
|---|---|---|
| Tuning on test data | Data leakage, overfitting | Use a validation set or cross-validation |
| Not trying defaults first | Wastes time on poor initial choices | Start with defaults, then tune |
| Tuning hyperparameters in isolation | Misses interactions | Tune important hyperparameters together (random/Bayesian search) |
| Too small a search space | Misses good configurations | Start broad, then narrow |
| Not documenting experiments | Can't reproduce or learn from runs | Track all runs |
| Ignoring computational cost | Some configurations take much longer | Consider the time/accuracy trade-off |

Key takeaways:

  • Hyperparameters: Configuration settings set before training (not learned from data)
  • Model Parameters: Internal values learned during training (weights, coefficients)
  • Tuning Methods:
    • Grid search: Exhaustive, simple, slow
    • Random search: Efficient, good for large spaces
    • Bayesian optimization: Smart, sample-efficient
  • Best Practices:
    • Start with defaults
    • Use cross-validation
    • Search coarse-to-fine
    • Log-scale for learning rate and regularization
    • Tune in order of impact
    • Document experiments

Hyperparameter tuning is essential for getting the best performance, but always balance improvement against computational cost.