
NLP Embeddings

Embeddings are vector representations of tokens (words or subwords) so machines can work with text. They place tokens in a multi-dimensional space where similar meanings end up closer together.
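"Closer together" is usually measured with cosine similarity. A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions and are learned from a corpus, not hand-written):

```python
import numpy as np

# Toy 3-dimensional embeddings; the values below are invented for
# illustration, not taken from any trained model.
embeddings = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar meanings -> vectors point in similar directions.
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high (~0.99)
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low  (~0.30)
```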


The biggest evolution in NLP embeddings was moving from “one word, one vector” to “one word, many vectors, depending on context”.

Static Embeddings: Word2Vec, GloVe, FastText

In these models, a token has a fixed representation regardless of where it appears.

  • How it works: Assigns a single vector to each word in the vocabulary.
  • The Problem: It cannot handle polysemy (words with multiple meanings).
    • Example: The word “bank” has the same vector in “river bank” and “bank deposit”. The model can’t tell the difference.
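A static model is, at inference time, just a lookup table. The sketch below uses an invented table (a real Word2Vec/GloVe vocabulary is learned from a corpus) to show that the surrounding words never enter the lookup:

```python
import numpy as np

# Toy lookup table standing in for a trained static model;
# the vectors are made up for illustration.
static_embeddings = {
    "river":   np.array([0.9, 0.1, 0.0]),
    "bank":    np.array([0.5, -0.2, 0.7]),
    "deposit": np.array([0.1, 0.0, 0.8]),
}

def embed(word):
    """Static lookup: the context around the word is never consulted."""
    return static_embeddings[word]

# "bank" gets the exact same vector in both phrases.
v1 = embed("bank")  # as in "river bank"
v2 = embed("bank")  # as in "bank deposit"
print(np.array_equal(v1, v2))  # True
```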

Contextual Embeddings: BERT, RoBERTa, GPT (Transformer-based models)

In these models, the representation of a token changes dynamically based on surrounding tokens.

  • How it works: Uses attention to incorporate context. BERT uses bidirectional attention; GPT uses causal (left-to-right) attention.
  • The Solution: It understands context.
    • Example: The vector for “bank” in “river bank” differs from the vector for “bank” in “bank deposit”, because each is mixed with its surrounding words.
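The mechanism can be sketched with one round of unparameterized dot-product attention. The 4-dimensional vectors below are invented for illustration; a real Transformer adds learned query/key/value projections, multiple heads, and many stacked layers:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def contextualize(tokens, table):
    """One attention 'head' sketch: each output vector is a weighted mix
    of every token vector in the sentence (no learned weights here)."""
    X = np.stack([table[t] for t in tokens])    # (seq_len, dim)
    scores = X @ X.T / np.sqrt(X.shape[1])      # scaled dot-product scores
    weights = np.apply_along_axis(softmax, 1, scores)
    return weights @ X                          # context-mixed vectors

# Made-up static vectors to start from.
table = {
    "bank":    np.array([0.5, 0.5, 0.0, 0.0]),
    "river":   np.array([0.9, 0.1, 0.0, 0.0]),
    "deposit": np.array([0.0, 0.0, 0.9, 0.1]),
}

v_river_bank   = contextualize(["river", "bank"], table)[1]
v_deposit_bank = contextualize(["bank", "deposit"], table)[0]

# The same word now gets different vectors in different contexts.
print(np.allclose(v_river_bank, v_deposit_bank))  # False
```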

| Feature | Word2Vec / GloVe | BERT (Transformer) |
| --- | --- | --- |
| Type | Static embedding | Contextual embedding |
| Context awareness | ❌ None (context-independent) | ✅ High (context-dependent) |
| Handling polysemy | ❌ Fails (one vector per word) | ✅ Excellent (different vectors for the same word) |
| Computation | Fast, lightweight | Slower, computationally heavy |
| Best for | Simple analogies, keyword matching | Complex understanding, QA, sentiment analysis |

Exam Tip: Contextual embeddings (e.g., BERT, RoBERTa, GPT) differentiate contextual meanings of the same word in different phrases (polysemy).

Exam Tip: Word2Vec and GloVe are static and cannot distinguish context.