
NLP Embeddings

Embeddings are vector representations of tokens (words or subwords) so machines can work with text. They place tokens in a multi-dimensional space where similar meanings end up closer together.
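"Closer together" is usually measured with cosine similarity. A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions and are learned from a corpus, not hand-written):

```python
import numpy as np

# Toy 3-dimensional embeddings; the values below are invented for
# illustration, not taken from any trained model.
embeddings = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar meanings -> vectors point in similar directions.
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high (~0.99)
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low  (~0.30)
```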


The biggest evolution in NLP embeddings was moving from “one word, one vector” to “one word, many vectors, depending on context”.

Static Embeddings: Word2Vec, GloVe, FastText

In these models, a token has a fixed representation regardless of where it appears.

  • How it works: Assigns a single vector to each word in the vocabulary.
  • The Problem: It cannot handle polysemy (words with multiple meanings).
    • Example: The word “bank” has the same vector in “river bank” and “bank deposit”. The model can’t tell the difference.
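A static model is, at inference time, just a lookup table. The sketch below uses an invented table (a real Word2Vec/GloVe vocabulary is learned from a corpus) to show that the surrounding words never enter the lookup:

```python
import numpy as np

# Toy lookup table standing in for a trained static model;
# the vectors are made up for illustration.
static_embeddings = {
    "river":   np.array([0.9, 0.1, 0.0]),
    "bank":    np.array([0.5, -0.2, 0.7]),
    "deposit": np.array([0.1, 0.0, 0.8]),
}

def embed(word):
    """Static lookup: the context around the word is never consulted."""
    return static_embeddings[word]

# "bank" gets the exact same vector in both phrases.
v1 = embed("bank")  # as in "river bank"
v2 = embed("bank")  # as in "bank deposit"
print(np.array_equal(v1, v2))  # True
```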

Contextual Embeddings: BERT, RoBERTa, GPT (Transformer-based models)

In these models, the representation of a token changes dynamically based on surrounding tokens.

  • How it works: Uses attention to incorporate context. BERT uses bidirectional attention; GPT uses causal (left-to-right) attention.
  • The Solution: It understands context.
    • Example: The vector for “bank” in “river bank” differs from the vector for “bank” in “bank deposit”, because each is mixed with its surrounding words.
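The mechanism can be sketched with one round of unparameterized dot-product attention. The 4-dimensional vectors below are invented for illustration; a real Transformer adds learned query/key/value projections, multiple heads, and many stacked layers:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def contextualize(tokens, table):
    """One attention 'head' sketch: each output vector is a weighted mix
    of every token vector in the sentence (no learned weights here)."""
    X = np.stack([table[t] for t in tokens])    # (seq_len, dim)
    scores = X @ X.T / np.sqrt(X.shape[1])      # scaled dot-product scores
    weights = np.apply_along_axis(softmax, 1, scores)
    return weights @ X                          # context-mixed vectors

# Made-up static vectors to start from.
table = {
    "bank":    np.array([0.5, 0.5, 0.0, 0.0]),
    "river":   np.array([0.9, 0.1, 0.0, 0.0]),
    "deposit": np.array([0.0, 0.0, 0.9, 0.1]),
}

v_river_bank   = contextualize(["river", "bank"], table)[1]
v_deposit_bank = contextualize(["bank", "deposit"], table)[0]

# The same word now gets different vectors in different contexts.
print(np.allclose(v_river_bank, v_deposit_bank))  # False
```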

| Feature | Word2Vec / GloVe | BERT (Transformer) |
| --- | --- | --- |
| Type | Static embedding | Contextual embedding |
| Context awareness | ❌ None (context-independent) | ✅ High (context-dependent) |
| Handling polysemy | ❌ Fails (one vector per word) | ✅ Excellent (different vectors for the same word) |
| Computation | Fast, lightweight | Slower, computationally heavy |
| Best for | Simple analogies, keyword matching | Complex understanding, QA, sentiment analysis |

Exam Tip: Contextual embeddings (e.g., BERT, RoBERTa, GPT) differentiate contextual meanings of the same word in different phrases (polysemy).

Exam Tip: Word2Vec and GloVe are static and cannot distinguish context.