As a researcher at UCSF Health, I was investigating embedding models that could best capture the medical semantics of clinical notes. One of the candidates in my experiment was NVIDIA's NV-Embed-v2, which is currently ranked #2 on the Massive Text Embedding Benchmark (MTEB) (as of Feb 3, 2025).

In this article, I briefly introduce embedding models, cover some relevant technical details behind NVIDIA’s NV-Embed-v2, and share code that can help you run this embedding model yourself and visualize its embeddings with t-SNE, the right way.


Introduction to Embeddings

The Met Museum of Art, Embedded. 235k+ works of art from one of the best museums in the world. More on Nomic Atlas.

Embedding models transform unstructured text into dense numerical vectors that capture semantic meaning. These models enable machines to "understand" language by mapping words, sentences, or even entire documents into continuous vector spaces. We call these vectors "embeddings".

Think of embeddings as a way to turn words and sentences into coordinates on a map. Just as two cities that are close together on a map are likely to be related (like neighboring towns), words or documents with similar meanings end up close together in this mathematical space. For example, the words "doctor" and "nurse" might appear near each other, while "doctor" and "banana" would be far apart.
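
To make this concrete, here is a minimal sketch that embeds those three words with a small, general-purpose sentence-embedding model (all-MiniLM-L6-v2, chosen purely for illustration and not the model discussed in this article) and compares them with cosine similarity. The doctor/nurse pair should score noticeably higher than doctor/banana.

```python
# A minimal sketch of the "words as coordinates" idea. all-MiniLM-L6-v2 is a
# small general-purpose model used here only for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

words = ["doctor", "nurse", "banana"]
vectors = model.encode(words)  # one vector (row) per word

def cosine(a, b):
    # Cosine similarity: closer to 1.0 means more semantically similar.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("doctor vs nurse :", cosine(vectors[0], vectors[1]))  # relatively high
print("doctor vs banana:", cosine(vectors[0], vectors[2]))  # much lower
```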

Once text is embedded, tasks such as similarity search, clustering, and downstream classification become more effective. Today's state‑of‑the‑art approaches—powered by transformer architectures—have set new benchmarks in natural language understanding, and NV‑Embed‑v2 is among the leading models in this domain.
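
As a quick illustration of the clustering case, the toy sketch below runs KMeans on the embeddings of a few invented sentences (again using the small illustrative model, not NV-Embed-v2) and recovers the obvious clinical-versus-business grouping.

```python
# A toy sketch of clustering in embedding space. The sentences are invented
# and the model is the same small illustrative one, not NV-Embed-v2.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "The patient reports persistent headaches.",
    "MRI shows no acute intracranial abnormality.",
    "Quarterly revenue exceeded analyst expectations.",
    "The company announced a new product line.",
]
vectors = model.encode(texts)

# Two clusters should separate the clinical sentences from the business ones.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for label, text in zip(labels, texts):
    print(label, text)
```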

In the medical context, embeddings are particularly valuable for:

  1. Clinical similarity matching - finding similar patient cases based on notes (a short sketch follows this list)
  2. Automated coding - mapping clinical narratives to standardized medical codes
  3. Cohort identification - grouping patients with similar conditions for research
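
As a sketch of the first use case above, the snippet below embeds a handful of invented example notes (not real clinical data) with the same small illustrative model and retrieves the closest matches to a query note via a cosine-distance nearest-neighbor search. In practice you would swap in NV-Embed-v2 or another domain-appropriate model, and replace the brute-force index with an approximate nearest-neighbor library once the corpus grows; the retrieval logic stays the same.

```python
# A hedged sketch of clinical similarity matching: embed a few invented
# "notes" and look up the nearest neighbors of a query note by cosine
# distance. The notes are toy examples and the small model is a placeholder.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

model = SentenceTransformer("all-MiniLM-L6-v2")

notes = [
    "Patient presents with chest pain radiating to the left arm.",
    "Follow-up visit for type 2 diabetes, HbA1c improved.",
    "Acute myocardial infarction ruled out; troponin within normal limits.",
    "Routine well-child visit, vaccinations up to date.",
]
note_vectors = model.encode(notes)

# Index the note embeddings for cosine-distance nearest-neighbor lookup.
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(note_vectors)

query = "Suspected cardiac ischemia with elevated cardiac enzymes."
distances, indices = index.kneighbors(model.encode([query]))

for dist, idx in zip(distances[0], indices[0]):
    # Cosine similarity is 1 minus the cosine distance returned by sklearn.
    print(f"similarity={1 - dist:.2f}  {notes[idx]}")
```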

NV-Embed-v2 achieves an average score of 72.31 across 56 MTEB tasks, outperforming encoder-based models such as E5 and GTE. Let’s get to know this model a bit better.


Intro to NV‑Embed‑v2

Traditional text embedding models, such as those based on encoder architectures like BERT or T5, have long dominated semantic representation tasks. However, recent advancements in decoder-only large language models (LLMs) have shifted the paradigm, demonstrating superior performance in generating dense, context-aware embeddings for both retrieval and non-retrieval tasks.

Decoder-based models like NV-Embed-v2 offer practical advantages over traditional encoder models: