
Machine Learning Paradigms

Machine learning algorithms learn from data in different ways. The learning paradigm defines how the algorithm receives feedback and improves its performance over time. Understanding these paradigms is fundamental to choosing the right approach for your problem.

The choice of learning paradigm is determined by the type of data you have and the problem you’re trying to solve. Labeled data? Use supervised learning. Unlabeled data? Consider unsupervised or self-supervised. Sequential decision-making? Reinforcement learning is your answer.


Supervised Learning

Supervised learning is the most common paradigm. The algorithm learns from labeled data—input data paired with the correct output. Think of it as learning with a teacher who provides the answers.

  1. Provide labeled examples: (input, label) pairs
  2. The model learns to map inputs to labels
  3. On new data, the model predicts the label
| Data Type | Example | Task Type |
| --- | --- | --- |
| Labeled images | (cat_image, "cat") | Classification |
| Historical prices | (house_features, price) | Regression |
| Email content | (email_text, "spam/not_spam") | Classification |

| Type | Predicts | Examples |
| --- | --- | --- |
| Classification | Categories/labels | Spam detection, image recognition, sentiment analysis |
| Regression | Continuous values | House prices, temperature forecasting, sales prediction |

| Algorithm | Typical Use |
| --- | --- |
| Linear Regression | Simple regression tasks |
| Logistic Regression | Binary classification |
| Decision Trees | Interpretable classification/regression |
| Random Forests | Robust, high-accuracy classification/regression |
| Support Vector Machines (SVM) | Complex classification boundaries |
| Neural Networks | Complex, high-dimensional data |
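The supervised recipe—learn a mapping from labeled (input, label) pairs, then predict labels for new inputs—can be made concrete with a toy 1-nearest-neighbor classifier. This is a minimal sketch with made-up data, not any particular library's API:

```python
def predict_1nn(train, x):
    """Predict the label of x by copying the label of the closest training example.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    """
    def dist(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(train, key=lambda pair: dist(pair[0], x))
    return label

# Labeled examples: (height_cm, weight_kg) -> species (hypothetical data)
train = [((30, 4), "cat"), ((28, 5), "cat"),
         ((60, 25), "dog"), ((55, 22), "dog")]

print(predict_1nn(train, (58, 24)))  # → dog
print(predict_1nn(train, (29, 4)))   # → cat
```

The "learning" here is trivially memorizing the training set; real algorithms like those in the table above fit a compact model instead, but the input/label contract is the same.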
  • Email filters: Learn to classify spam vs. not spam from labeled examples
  • Credit scoring: Predict loan default risk from historical customer data
  • Medical diagnosis: Classify tumors as benign/malignant from labeled images
  • Price prediction: Predict house prices from feature data

Limitation: Requires large amounts of labeled data, which can be expensive and time-consuming to create.


Unsupervised Learning

Unsupervised learning works with unlabeled data. The algorithm finds patterns, structures, or relationships in the data without being told what to look for.

  1. Provide unlabeled data
  2. The model discovers hidden patterns or structures
  3. Output reveals relationships, groupings, or reduced representations
| Type | Goal | Common Algorithms |
| --- | --- | --- |
| Clustering | Group similar items together | K-Means, Hierarchical Clustering, DBSCAN |
| Dimensionality Reduction | Reduce features while preserving information | PCA, t-SNE, UMAP |
| Association | Discover relationships between items | Apriori, FP-Growth |
| Anomaly Detection | Identify unusual patterns | Isolation Forest, One-Class SVM |
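Clustering is the easiest of these to sketch. The k-means loop below alternates between assigning points to their nearest center and moving each center to its cluster's mean—a simplified illustration with toy data (it initializes from the first k points for determinism, whereas real implementations use random or k-means++ initialization):

```python
def kmeans(points, k, iters=20):
    """Cluster 2-D points into k groups by alternating assign/update steps."""
    centers = list(points[:k])  # deterministic init for this demo
    clusters = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers, clusters

# Two visually obvious groups — note that no labels are provided.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (8.5, 9), (9, 8)]
centers, clusters = kmeans(points, k=2)
print([len(c) for c in clusters])  # → [3, 3]
```

The algorithm recovers the two groups purely from geometric structure, which is exactly the "discover hidden patterns" step in the workflow above.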
  • Customer segmentation: Group customers by purchasing behavior
  • Recommendation systems: Find similar items/users (collaborative filtering)
  • Anomaly detection: Detect fraudulent transactions, network intrusions
  • Data compression: Reduce dataset dimensions for visualization or storage

Advantage: Works with unlabeled data, which is more abundant than labeled data.

Challenge: Evaluating results is harder without ground truth labels.


Self-Supervised Learning

Self-supervised learning bridges the gap between supervised and unsupervised learning. It uses unlabeled data but creates its own "labels" from the data's structure.

  1. Take unlabeled data
  2. Create a “pretext task” that generates labels from the data itself
  3. Train the model on this self-generated task
  4. The learned representations transfer to downstream tasks
| Aspect | Unsupervised Learning | Self-Supervised Learning |
| --- | --- | --- |
| Goal | Discover hidden patterns/structures | Learn useful representations |
| Output | Clusters, reduced dimensions | Feature representations for downstream tasks |
| Task-driven | No specific task | Has a specific pretext task |

| Task Type | Description | Example |
| --- | --- | --- |
| Masked Language Modeling | Predict masked words in text | BERT, RoBERTa |
| Next Token Prediction | Predict the next token | GPT models |
| Contrastive Learning | Identify similar/dissimilar pairs | SimCLR, MoCo |
| Rotation Prediction | Predict image rotation angle | Self-supervised vision models |
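Next-token prediction shows why no human annotation is needed: the training target for each position is simply the token that follows it in the raw text. A toy bigram counter makes the idea concrete—an illustrative sketch with a made-up corpus, not how GPT-style models are actually built:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words tend to follow it.

    The 'label' for each word is just the next word in the raw text —
    the pretext task generates supervision from the data itself.
    """
    counts = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts, word):
    """Predict the continuation seen most often in training."""
    return counts[word].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # → cat  ("the" is followed by "cat" twice)
```

A large language model replaces the count table with a neural network, but the self-generated (context, next token) pairs play the same role.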

This paradigm has been crucial for training large language models like GPT-4 and Claude, as well as earlier models like BERT. It allows models to learn from massive amounts of unlabeled data (the entire internet) while still having a clear objective function.

Example: A model trained to predict the next word in a sentence (self-supervised) can later be fine-tuned for sentiment analysis, translation, or summarization (supervised transfer learning).


Reinforcement Learning

Reinforcement learning (RL) is about learning through interaction. An agent learns to make decisions by performing actions in an environment and receiving rewards or penalties.

| Component | Description | Example |
| --- | --- | --- |
| Agent | The learner/decision-maker | A robot, a game-playing AI |
| Environment | The world the agent interacts with | A maze, a game board, a simulation |
| State | Current situation of the agent | Position in maze, game configuration |
| Action | What the agent can do | Move left/right, jump, shoot |
| Reward | Feedback signal | +10 for winning, -1 for each step |
  1. Agent observes current state
  2. Agent selects an action
  3. Environment returns new state and reward
  4. Agent updates its policy based on reward
  5. Repeat to maximize cumulative reward
| Concept | Description |
| --- | --- |
| Policy | The strategy the agent uses (a mapping from state to action) |
| Value Function | Expected future reward from a state |
| Q-Function | Expected future reward from a state-action pair |
| Exploration vs. Exploitation | Balancing trying new actions vs. using known good strategies |

| Algorithm | Type | Notable Use |
| --- | --- | --- |
| Q-Learning | Value-based | Simple RL problems |
| DQN | Value-based (deep) | Atari games |
| PPO | Policy gradient | Robotics, game playing |
| A3C | Actor-critic | Complex environments |
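The observe → act → reward → update loop is easiest to see in tabular Q-learning on a tiny environment. The 5-state corridor, reward values, and hyperparameters below are made up for illustration; the update rule is the standard Q-learning one:

```python
import random

# A corridor of 5 states; reaching state 4 gives +10, every step costs -1.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left / move right

def step(state, action):
    """Environment: return (next_state, reward, done)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 10 if nxt == GOAL else -1
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action_index]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
            a = rng.randrange(2) if rng.random() < eps \
                else max(range(2), key=lambda i: Q[s][i])
            s2, r, done = step(s, ACTIONS[a])
            # Q-learning update: nudge Q toward reward + discounted best future value.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
policy = ["left" if q[0] > q[1] else "right" for q in Q]
print(policy[:4])  # the learned policy heads right, toward the goal
```

Nobody labeled any state with the correct action; the agent discovered the policy purely from the delayed reward signal, which is what distinguishes RL from supervised learning.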
  • Game playing: AlphaGo (Go), AlphaZero (Chess, Shogi, Go)
  • Robotics: Robots learning to walk, grasp objects
  • Autonomous driving: Learning to navigate traffic
  • Recommendation systems: Optimizing long-term user engagement

Challenge: Requires many interactions; can be sample-inefficient and unstable.


| Paradigm | Data Type | Feedback | Use When… | Complexity |
| --- | --- | --- | --- | --- |
| Supervised | Labeled | Direct answers | You have labeled data and need predictions | Low–Medium |
| Unsupervised | Unlabeled | None | You want to discover patterns in unlabeled data | Medium |
| Self-Supervised | Unlabeled | Self-generated | You have lots of unlabeled data and want representations | Medium–High |
| Reinforcement | Interactive | Rewards/penalties | The problem involves sequential decision-making | High |

  • Supervised Learning: Learn from labeled examples. Use when you have (input, label) pairs.
  • Unsupervised Learning: Discover patterns in unlabeled data. Use for clustering, dimensionality reduction, anomaly detection.
  • Self-Supervised Learning: Generate labels from data structure. Key to training Foundation Models on massive unlabeled datasets.
  • Reinforcement Learning: Learn through trial and error via rewards. Use for sequential decision-making problems (games, robotics).

The paradigm you choose depends on your data, your problem, and your resources.