Text Embeddings

Embeddings: The Essence of Textual Meaning

Imagine a world where language can be mapped into a numerical landscape. That's the realm of embeddings: they transform words and phrases into vectors, lists of numbers that capture the essence of their meaning.

Why Embeddings Matter

  • Search Reimagined: Embeddings revolutionize search. Instead of simple keyword matching, they find results that are semantically related to your query, even if the exact phrasing is different.
  • Clustering Made Easy: Embeddings group similar text together, making it simple to organize vast amounts of text data. Think of them as an automatic sorting system for your content.
  • Recommendations with Insight: Recommend products, articles, or content that match a user's interests based on the similarity of their embeddings.
  • Anomaly Spotter: Embeddings can flag unusual text that stands out from the crowd, aiding in anomaly detection for applications like fraud prevention.
  • Understanding Diversity: Analyze the distribution of embeddings to measure the diversity of text within datasets or identify potential biases.
  • Smart Classification: Assign a label to a piece of text by finding the labeled example whose embedding is closest to it, automating tasks like document categorization.
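
As a rough illustration of the classification idea, the sketch below assigns a label by finding the labeled example whose vector is closest. The three-dimensional vectors, the labels, and the `classify` helper are invented for illustration; real embeddings would come from the API and have many more dimensions.

```python
import math

# Hypothetical labeled examples: toy 3-dimensional vectors standing in for real embeddings.
LABELED = {
    "billing": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.1],
    "technical": [0.0, 0.2, 0.9],
}


def classify(vector, labeled=LABELED):
    """Return the label whose example vector has the smallest Euclidean distance."""
    return min(labeled, key=lambda label: math.dist(vector, labeled[label]))


print(classify([0.8, 0.2, 0.1]))
```

With several examples per label, one simple variation is to compare against the average vector of each label instead of a single example.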

The Power of Distance

The secret of embeddings lies in the distances between these numerical vectors:

  • Close together: High similarity, closely related concepts.
  • Far apart: Low similarity, meaning the texts likely discuss very different topics.
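
The text above does not say which distance measure is used. Cosine similarity is one common choice, and the sketch below assumes it. The vectors are toy values made up for illustration, and `cosine_similarity` is a hypothetical helper, not part of any GenAI client.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
cat = [0.8, 0.6, 0.0]
kitten = [0.7, 0.7, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # related concepts: score close to 1
print(cosine_similarity(cat, invoice))  # unrelated concepts: score close to 0
```

A similarity near 1 corresponds to "close together", and a value near 0 to "far apart". Subtracting the similarity from 1 gives the matching cosine distance.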

Getting Your Hands on Embeddings

GenAI's embeddings API makes this powerful technology accessible:

  1. Send your text: Simply provide your text string to the API along with the desired embedding model (e.g., nomic-embed-text).
  2. Receive the vector: The API returns a response containing the embedding vector plus metadata such as the model name and token usage.
  3. Unlock insights: Store your embeddings in a vector database and analyze them for various tasks.
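
To make the storage step concrete, here is a minimal in-memory stand-in for a vector database. It is only a sketch: the `InMemoryVectorStore` class and the sample vectors are hypothetical, and a real vector database would add persistence and approximate nearest-neighbor indexing.

```python
import math


class InMemoryVectorStore:
    """Toy stand-in for a vector database: keeps unit-length vectors in a list."""

    def __init__(self):
        self._items = []  # list of (text, unit_vector) pairs

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    def add(self, text, vector):
        self._items.append((text, self._normalize(vector)))

    def search(self, query_vector, top_k=2):
        """Return the top_k stored texts ranked by dot product with the query."""
        query = self._normalize(query_vector)
        scored = [
            (sum(q * v for q, v in zip(query, vec)), text)
            for text, vec in self._items
        ]
        scored.sort(reverse=True)
        return [text for _, text in scored[:top_k]]


store = InMemoryVectorStore()
# Hypothetical vectors; in practice these would come from the embeddings API.
store.add("How do I reset my password?", [0.9, 0.1, 0.1])
store.add("Where is my order?", [0.1, 0.9, 0.2])
store.add("I forgot my login details", [0.8, 0.2, 0.1])

print(store.search([1.0, 0.0, 0.1], top_k=2))
```

Normalizing each vector once at insert time means a plain dot product at query time is equivalent to cosine similarity.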

Example: Let's see it in action!

API Request (using Python):

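from mcp import GenAI

# Create a client; replace the placeholder with your proxy API key.
client = GenAI(api_key="<proxy-api-key>", base_url="https://api-genai.app-nonprod.mcp.org/")

# Request an embedding for a single input string.
response = client.embeddings.create(
    input=["Hello from MCP GenAI"],
    model="nomic-embed-text",
)

print(response)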

API Request (using curl):

curl https://api.genai.mcp.org/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MCP_GENAI_API_KEY" \
  -d '{
    "input": "Hello from MCP GenAI",
    "model": "nomic-embed-text"
  }'

Example Response (JSON):

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.0069..., -0.0053..., ... (more numbers) ..., -0.0240...
      ]
    }
  ],
  "model": "nomic-embed-text",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
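
Once a response like this arrives, the vectors can be pulled out of the `data` list. The sketch below parses a sample payload with the same shape using only the standard library. The three numbers in it are placeholders rather than real model output, and `extract_vectors` is a hypothetical helper.

```python
import json

# Sample payload mirroring the response shape shown above; the three numbers
# are placeholders, not real model output.
raw = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]}
  ],
  "model": "nomic-embed-text",
  "usage": {"prompt_tokens": 5, "total_tokens": 5}
}
"""


def extract_vectors(payload):
    """Return embedding vectors ordered by each item's "index" field."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


response = json.loads(raw)
vectors = extract_vectors(response)
print(len(vectors), len(vectors[0]))
print(response["usage"]["total_tokens"])
```

Sorting by `index` keeps each vector aligned with its input if several strings are sent in one request.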