Retrieval-augmented generation, also known as RAG, is an NLP technique that can help improve the quality of responses from large language models (LLMs). It allows your AI agent to retrieve data from external sources and generate grounded responses. This helps prevent your agent from hallucinating and returning incorrect information.
There are multiple ways to retrieve this external data. In this post I want to show you how to generate vector embeddings that can be stored in and retrieved from a vector database. This means we first need to decide which database to use. The list of options keeps growing (even the new SQL Server version will support vector embeddings out of the box). As we want to demonstrate this feature using Semantic Kernel, we need to take a look at one of the available connectors.
Qdrant Vector Database
I decided to use Qdrant for this blog post.
Qdrant is an AI-native vector database and a semantic search engine. You can use it to extract meaningful information from unstructured data.
You can run Qdrant locally using Docker. I have a local /data/qdrant folder that Qdrant should use to store the data:
docker run --name qdrantdemo -p 6333:6333 -p 6334:6334 -v "/data/qdrant:/qdrant/storage" qdrant/qdrant
If you browse to http://localhost:6333/dashboard you get a web interface to explore the data in the vector store:
Remark: Different vector stores expect the vectors in different formats and sizes. So if you want to use the code I show you in this post with another vector database, you will probably need to make some changes.
Ollama Text Embeddings
To generate our embeddings, we need a text embedding generator. Ollama supports multiple embedding models; I decided to install the ‘nomic-embed-text’ model:
ollama pull nomic-embed-text
Remark: For a good introduction to different embedding models, check out this post.
Now we can move to the code and start by adding the following NuGet packages:
dotnet add package Microsoft.SemanticKernel.Connectors.Ollama
dotnet add package Microsoft.SemanticKernel.Connectors.Qdrant
Once these packages are installed, we can create a new OllamaApiClient instance referencing the text embedding model we installed above. We can also directly generate an ITextEmbeddingGenerationService:
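A sketch of what that setup could look like, assuming Ollama is running on its default port (11434) and based on the preview Ollama connector API at the time of writing:

```csharp
using Microsoft.SemanticKernel.Embeddings;
using OllamaSharp;

// Point the client at the local Ollama instance and the embedding model we pulled earlier
var ollamaClient = new OllamaApiClient(
    uriString: "http://localhost:11434",
    defaultModel: "nomic-embed-text");

// The Semantic Kernel Ollama connector exposes the client as an embedding service
ITextEmbeddingGenerationService embeddingService =
    ollamaClient.AsTextEmbeddingGenerationService();
```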
Now we can take some data and convert it to a vector by using the following code:
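As an illustration, embedding a single piece of text could look roughly like this (the example sentence is my own placeholder):

```csharp
// Turn a piece of text into a vector; nomic-embed-text returns a 768-dimensional embedding
string text = "Semantic Kernel is an SDK for building AI agents.";
ReadOnlyMemory<float> embedding = await embeddingService.GenerateEmbeddingAsync(text);

Console.WriteLine($"Generated a vector with {embedding.Length} dimensions.");
```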
Bringing it all together
Almost there! The last step is to get this generated embedding into our vector database.
To do so, we first need to create a Qdrant VectorStore object:
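With the preview Qdrant connector, this could be as simple as wrapping a QdrantClient that points at the local instance we started above:

```csharp
using Microsoft.SemanticKernel.Connectors.Qdrant;
using Qdrant.Client;

// The QdrantClient talks gRPC on port 6334, which we mapped in the docker run command
var vectorStore = new QdrantVectorStore(new QdrantClient("localhost"));
```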
We also need to create a model that describes what a vector store record in Qdrant should look like by annotating it correctly:
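A minimal example model, using the attributes from Microsoft.Extensions.VectorData (the class and property names here are my own; the vector dimension of 768 matches what nomic-embed-text produces):

```csharp
using Microsoft.Extensions.VectorData;

public class TextSnippet
{
    [VectorStoreRecordKey]
    public Guid Key { get; set; }

    [VectorStoreRecordData]
    public string Text { get; set; }

    // Dimensions must match the embedding model (768 for nomic-embed-text)
    [VectorStoreRecordVector(768, DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float> Embedding { get; set; }
}
```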
Once we have that model defined, we can create a new collection using this model. Notice that I’m specifying a Guid as the key (ulong values are also supported):
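Sticking with the hypothetical TextSnippet model above, creating the collection could look like this (the collection name "snippets" is my own choice):

```csharp
// Get a strongly typed collection and create it in Qdrant if it doesn't exist yet
var collection = vectorStore.GetCollection<Guid, TextSnippet>("snippets");
await collection.CreateCollectionIfNotExistsAsync();
```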
Now we can use this to upload the example data:
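Bringing the embedding service and the collection together, an upsert could look roughly like this sketch:

```csharp
// Generate an embedding for the text and store both in Qdrant
string text = "Semantic Kernel is an SDK for building AI agents.";
ReadOnlyMemory<float> embedding = await embeddingService.GenerateEmbeddingAsync(text);

await collection.UpsertAsync(new TextSnippet
{
    Key = Guid.NewGuid(),
    Text = text,
    Embedding = embedding
});
```

After running this, the record should show up in the Qdrant dashboard at http://localhost:6333/dashboard.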
That’s it for today! In a follow-up post I’ll show you how we can integrate this information in our Agent and have a complete RAG solution.
More information
wullemsb/SemanticKernel at RAG
Retrieve data from plugins for RAG | Microsoft Learn
Generating embeddings for Semantic Kernel Vector Store connectors | Microsoft Learn
Qdrant - Vector Database - Qdrant
Using the Semantic Kernel Qdrant Vector Store connector (Preview) | Microsoft Learn