Machine Learning Paradigms
Machine learning algorithms learn from data in different ways. The learning paradigm defines how the algorithm receives feedback and improves its performance over time. Understanding these paradigms is fundamental to choosing the right approach for your problem.
The choice of learning paradigm is determined by the type of data you have and the problem you’re trying to solve. Labeled data? Use supervised learning. Unlabeled data? Consider unsupervised or self-supervised. Sequential decision-making? Reinforcement learning is your answer.
Supervised Learning
Supervised learning is the most common paradigm. The algorithm learns from labeled data—input data paired with the correct output. Think of it as learning with a teacher who provides the answers.
How It Works
- Provide labeled examples: `(input, label)` pairs
- The model learns to map inputs to labels
- On new data, the model predicts the label
| Data Type | Example | Task Type |
|---|---|---|
| Labeled images | (cat_image, "cat") | Classification |
| Historical prices | (house_features, price) | Regression |
| Email content | (email_text, "spam/not_spam") | Classification |
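The workflow above can be sketched in a few lines of Python. This is a minimal illustration using a closed-form least-squares fit; the toy house-price numbers are invented for the example:

```python
# Minimal supervised-learning sketch: fit simple linear regression
# (one feature) to labeled (input, label) pairs, then predict on new data.

def fit_linear(xs, ys):
    """Closed-form least squares for y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

# Labeled examples: (house size in m^2, price in $1000s) — toy data
sizes = [50, 80, 120, 200]
prices = [150, 240, 360, 600]

w, b = fit_linear(sizes, prices)
print(round(w * 100 + b))  # predict a price for an unseen 100 m^2 house → 300
```

After fitting on the labeled pairs, the model generalizes: it predicts a price for a house size it never saw during training.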
Types of Supervised Learning
| Type | Predicts | Examples |
|---|---|---|
| Classification | Categories/labels | Spam detection, image recognition, sentiment analysis |
| Regression | Continuous values | House prices, temperature forecasting, sales prediction |
Common Algorithms
| Algorithm | Typical Use |
|---|---|
| Linear Regression | Simple regression tasks |
| Logistic Regression | Binary classification |
| Decision Trees | Interpretable classification/regression |
| Random Forests | High accuracy, robust |
| Support Vector Machines (SVM) | Complex classification boundaries |
| Neural Networks | Complex, high-dimensional data |
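For classification, one of the simplest algorithms to sketch is a nearest-neighbour classifier. The two-feature toy dataset below is invented for illustration:

```python
# Minimal classification sketch: a 1-nearest-neighbour classifier,
# one of the simplest supervised algorithms.

def predict_1nn(train, query):
    """Return the label of the training point closest to `query`."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: dist2(ex[0], query))
    return label

# Labeled examples: (features, label) pairs — toy data
train = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
         ((5.0, 5.0), "dog"), ((5.5, 4.5), "dog")]

print(predict_1nn(train, (1.1, 0.9)))  # nearest neighbours are "cat" examples
```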
Real-World Examples
- Email filters: Learn to classify spam vs. not spam from labeled examples
- Credit scoring: Predict loan default risk from historical customer data
- Medical diagnosis: Classify tumors as benign/malignant from labeled images
- Price prediction: Predict house prices from feature data
Limitation: Requires large amounts of labeled data, which can be expensive and time-consuming to create.
Unsupervised Learning
Unsupervised learning works with unlabeled data. The algorithm finds patterns, structures, or relationships in the data without being told what to look for.
How It Works
- Provide unlabeled data
- The model discovers hidden patterns or structures
- Output reveals relationships, groupings, or reduced representations
Types of Unsupervised Learning
| Type | Goal | Common Algorithms |
|---|---|---|
| Clustering | Group similar items together | K-Means, Hierarchical Clustering, DBSCAN |
| Dimensionality Reduction | Reduce features while preserving information | PCA, t-SNE, UMAP |
| Association | Discover relationships between items | Apriori, FP-Growth |
| Anomaly Detection | Identify unusual patterns | Isolation Forest, One-Class SVM |
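As a minimal illustration of clustering, here is a toy k-means in plain Python. The first-k initialization is a simplification; real implementations (e.g. scikit-learn's KMeans) use smarter schemes such as k-means++:

```python
def kmeans(points, k, iters=20):
    """Toy k-means on 2-D points: repeatedly assign each point to its
    nearest centroid, then recompute each centroid as the cluster mean."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Naive initialization: the first k points (real code uses k-means++)
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for c, cluster in zip(range(k), clusters):
            if cluster:
                new_centroids.append((sum(p[0] for p in cluster) / len(cluster),
                                      sum(p[1] for p in cluster) / len(cluster)))
            else:  # keep an empty cluster's old centroid
                new_centroids.append(centroids[c])
        centroids = new_centroids
    return centroids

# Six unlabeled points forming two obvious groups — no labels provided
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 9), (8, 9)]
centroids = kmeans(points, k=2)
print(sorted(round(x) for x, _ in centroids))  # one centroid per group → [2, 8]
```

Note that the algorithm recovers the two groups without ever being told what they are; evaluating whether those groups are *meaningful* is the harder part, as noted below.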
Real-World Examples
- Customer segmentation: Group customers by purchasing behavior
- Recommendation systems: Find similar items/users (collaborative filtering)
- Anomaly detection: Detect fraudulent transactions, network intrusions
- Data compression: Reduce dataset dimensions for visualization or storage
Advantage: Works with unlabeled data, which is more abundant than labeled data.
Challenge: Evaluating results is harder without ground truth labels.
Self-Supervised Learning
Self-supervised learning is a clever approach that bridges the gap between supervised and unsupervised learning. It uses unlabeled data but creates its own “labels” from the data structure.
How It Works
- Take unlabeled data
- Create a “pretext task” that generates labels from the data itself
- Train the model on this self-generated task
- The learned representations transfer to downstream tasks
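The pretext-task idea can be made concrete: the snippet below manufactures `(input, label)` pairs from raw text by masking one word at a time. This is a toy sketch; models like BERT mask subword tokens and train on billions of examples:

```python
# A minimal "masked word" pretext task: labels are generated from the
# raw text itself, so no human annotation is needed.

def make_masked_examples(sentence, mask_token="[MASK]"):
    """For each position, mask one word and use that word as the label."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

for inputs, label in make_masked_examples("the cat sat on the mat"):
    print(inputs, "->", label)
```

Every sentence of unlabeled text yields several self-generated training examples, which is what lets this paradigm scale to huge corpora.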
Key Difference from Unsupervised Learning
| Aspect | Unsupervised Learning | Self-Supervised Learning |
|---|---|---|
| Goal | Discover hidden patterns/structures | Learn useful representations |
| Output | Clusters, reduced dimensions | Feature representations for downstream tasks |
| Task-driven | No specific task | Has a specific pretext task |
Common Pretext Tasks
| Task Type | Description | Example |
|---|---|---|
| Masked Language Modeling | Predict masked words in text | BERT, RoBERTa |
| Next Token Prediction | Predict the next token | GPT models |
| Contrastive Learning | Identify similar/dissimilar pairs | SimCLR, MoCo |
| Rotation Prediction | Predict image rotation angle | Self-supervised vision models |
Why Self-Supervised Learning Matters
This paradigm has been crucial for training large language models like GPT-4 and Claude, as well as earlier models like BERT. It allows models to learn from massive amounts of unlabeled data (much of the public internet) while still having a clear objective function.
Example: A model trained to predict the next word in a sentence (self-supervised) can later be fine-tuned for sentiment analysis, translation, or summarization (supervised transfer learning).
Reinforcement Learning
Reinforcement learning (RL) is about learning through interaction. An agent learns to make decisions by performing actions in an environment and receiving rewards or penalties.
Core Components
| Component | Description | Example |
|---|---|---|
| Agent | The learner/decision-maker | A robot, a game-playing AI |
| Environment | The world the agent interacts with | A maze, a game board, a simulation |
| State | Current situation of the agent | Position in maze, game configuration |
| Action | What the agent can do | Move left/right, jump, shoot |
| Reward | Feedback signal | +10 for winning, -1 for each step |
How It Works
- Agent observes current state
- Agent selects an action
- Environment returns new state and reward
- Agent updates its policy based on reward
- Repeat to maximize cumulative reward
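This loop can be sketched as tabular Q-learning on a toy one-dimensional corridor. The environment, reward scheme, and hyperparameters below are invented for illustration:

```python
import random

# Toy RL environment: states 0..4 on a corridor; the agent moves
# left/right and reaching state 4 (the goal) gives reward +10.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                     # move left, move right
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # illustrative hyperparameters
Q = [[0.0, 0.0] for _ in range(N_STATES)]
rng = random.Random(0)

for episode in range(200):
    state = 0
    while state != GOAL:
        # Exploration vs. exploitation: epsilon-greedy action choice
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = 0 if Q[state][0] >= Q[state][1] else 1
        next_state = max(0, min(N_STATES - 1, state + ACTIONS[a]))
        reward = 10.0 if next_state == GOAL else -1.0
        # Q-learning update: move Q(s,a) toward reward + gamma * max Q(s',·)
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

# The learned greedy policy should move right in every non-goal state
print([("left", "right")[q.index(max(q))] for q in Q[:GOAL]])
```

The epsilon-greedy choice illustrates the exploration/exploitation trade-off described in the next table, and the update rule illustrates the Q-function: expected future reward for a state-action pair.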
Key Concepts
| Concept | Description |
|---|---|
| Policy | The strategy the agent uses (mapping from state to action) |
| Value Function | Expected future reward from a state |
| Q-Function | Expected future reward from a state-action pair |
| Exploration vs Exploitation | Balancing trying new things vs. using known good strategies |
Common Algorithms
| Algorithm | Type | Notable Use |
|---|---|---|
| Q-Learning | Value-based | Simple RL problems |
| DQN (Deep Q-Network) | Value-based | Atari games |
| PPO | Policy gradient | Robotics, game playing |
| A3C | Actor-Critic | Complex environments |
Real-World Examples
- Game playing: AlphaGo (Go), AlphaZero (Chess, Shogi, Go)
- Robotics: Robots learning to walk, grasp objects
- Autonomous driving: Learning to navigate traffic
- Recommendation systems: Optimizing long-term user engagement
Challenge: Requires many interactions; can be sample-inefficient and unstable.
Quick Comparison
| Paradigm | Data Type | Feedback | Use When… | Complexity |
|---|---|---|---|---|
| Supervised | Labeled | Direct answers | You have labeled data and need predictions | Low-Medium |
| Unsupervised | Unlabeled | None | You want to discover patterns in unlabeled data | Medium |
| Self-Supervised | Unlabeled | Self-generated | You have lots of unlabeled data and want representations | Medium-High |
| Reinforcement | Interactive | Rewards/Penalties | Problem involves sequential decision-making | High |
- Supervised Learning: Learn from labeled examples. Use when you have `(input, label)` pairs.
- Unsupervised Learning: Discover patterns in unlabeled data. Use for clustering, dimensionality reduction, anomaly detection.
- Self-Supervised Learning: Generate labels from data structure. Key to training Foundation Models on massive unlabeled datasets.
- Reinforcement Learning: Learn through trial and error via rewards. Use for sequential decision-making problems (games, robotics).
The paradigm you choose depends on your data, your problem, and your resources.