
Qdrant Vector Database–What you need to know to get started in .NET

A key element in building a RAG (Retrieval Augmented Generation) solution is a vector database. The list of vector databases is growing every day, and even SQL Server now supports vectors. In this post I focus on Qdrant (pronounced "quadrant"), a vector similarity search engine and database written in Rust. It's specifically designed to handle vector embeddings and their payloads, which makes it a good fit for modern machine learning applications.

The goal of this post is not to make you a Qdrant expert (I'm not one either) but to give you enough information to start using it correctly.

What is Qdrant?

Qdrant is an AI-native vector database and semantic search engine designed to handle high-dimensional vectors. It provides a production-ready service with a convenient API to store, search, and manage vectors with additional payloads. 

A vector database is designed to store and query high-dimensional vectors efficiently. Unlike traditional databases that organize data in rows and columns, vector databases handle data represented as vectors in a high-dimensional space. These vectors are mathematical representations of objects or data points, where each element corresponds to a specific feature or attribute. Vector databases are optimized for applications like image recognition, natural language processing, and recommendation systems, and they enable fast similarity and semantic search by using specialized indexing techniques.

This makes a vector database a perfect fit in RAG scenarios. 

Installing Qdrant

Setting up Qdrant is straightforward. You can run it using Docker:

docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant

Or download and install it directly on your system using a release found here:

https://github.com/qdrant/qdrant/releases/

After deploying Qdrant locally, you can access the Web UI using the following URI: http://localhost:6333/dashboard

 


We'll come back to this Web UI later, but let me first explain the basic building blocks in Qdrant.

The basic building blocks

The main building blocks you need to understand are:

  • Collection: A named set of points that you can use for your search.
  • Point: A record which consists of a vector and an optional payload.
  • Distance Metrics: The method used to measure similarities among vectors.

Collections

A collection is a named set of points (vectors with a payload) among which you can search. You can compare a collection to a table in a relational database. The vectors of all points within the same collection must have the same dimensionality and are compared using a single metric.

Qdrant supports two modes for vector storage. The default mode is a single unnamed vector. In this case each point in the collection holds exactly one vector, which is stored without a name in Qdrant's storage model. The second mode is named vectors. In this mode a single point can hold multiple vectors, each with its own dimensionality and distance metric.

Remark: Semantic Kernel can handle both unnamed and named vector modes.
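To make the difference between the two modes concrete, here is a minimal Python sketch of the vector configuration that a collection definition carries in each mode, assuming the shapes used by the Qdrant REST API. The helper vector_params, the vector names "text" and "image", and the sizes 384 and 512 are hypothetical and only there for illustration.

```python
import json

# Distance names as the Qdrant REST API spells them.
ALLOWED_DISTANCES = {"Cosine", "Dot", "Euclid", "Manhattan"}


def vector_params(size, distance):
    """Describe one vector slot: its dimensionality and its metric."""
    if size <= 0:
        raise ValueError("vector size must be positive")
    if distance not in ALLOWED_DISTANCES:
        raise ValueError(f"unsupported distance: {distance}")
    return {"size": size, "distance": distance}


# Single unnamed vector: "vectors" holds one size/distance pair.
unnamed_collection = {"vectors": vector_params(4, "Dot")}

# Named vectors: "vectors" maps each name to its own size/distance pair.
# The names and sizes below are hypothetical.
named_collection = {
    "vectors": {
        "text": vector_params(384, "Cosine"),
        "image": vector_params(512, "Euclid"),
    }
}

print(json.dumps(unnamed_collection))
print(json.dumps(named_collection, indent=2))
```

In the named mode every point can carry one vector per name, so a text embedding and an image embedding of the same item can live side by side.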

Points

Points are the central entity that Qdrant operates on. Each point consists of an id, a vector and an optional payload.
  • Id: A unique identifier for the point.
  • Vector: A high-dimensional representation of data, for example an image, a sound, a document or a video.
  • Payload: A JSON object with additional data that you attach to a vector.

Every vector has a specific size that is defined at the collection level. The size typically depends on the embedding model you use, as different models generate vectors of different sizes.

Remark: Qdrant only supports ulong or Guid ids. Choosing a different id type in Semantic Kernel will result in errors.
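The structure of a point and the id restriction can be sketched in a few lines of Python. This is an illustration only, not Qdrant or Semantic Kernel code: the helpers is_valid_point_id and make_point are hypothetical, and the sample vector and payload are made up.

```python
import uuid


def is_valid_point_id(value):
    """Return True for an unsigned 64-bit integer or a UUID (object or string)."""
    if isinstance(value, bool):
        return False  # bool is a subclass of int in Python; reject it explicitly
    if isinstance(value, int):
        return 0 <= value < 2**64
    if isinstance(value, uuid.UUID):
        return True
    if isinstance(value, str):
        try:
            uuid.UUID(value)
            return True
        except ValueError:
            return False
    return False


def make_point(point_id, vector, payload=None):
    """Build a point as a plain dict: id, vector and an optional payload."""
    if not is_valid_point_id(point_id):
        raise ValueError(f"unsupported point id: {point_id!r}")
    if isinstance(point_id, uuid.UUID):
        point_id = str(point_id)  # JSON has no UUID type, so send it as text
    point = {"id": point_id, "vector": [float(x) for x in vector]}
    if payload is not None:
        point["payload"] = payload
    return point


print(make_point(1, [0.05, 0.61, 0.76, 0.74], {"colony": "Mars"}))
print(is_valid_point_id("doc-42"))  # an arbitrary string is not a valid id
```

If your source data uses string keys, one option is to map them to Guids up front and keep the original key in the payload.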

Distance Metrics

Distance metrics measure the similarity between vectors, and you must select one when you create a collection. The choice of metric depends on how the vectors were obtained and, in particular, on the neural network that will be used to encode new queries.

There are a lot of distance metrics, but Qdrant supports only some of the most popular ones:

  • CosineSimilarity
  • DotProductSimilarity
  • EuclideanDistance
  • ManhattanDistance

Here is a short explanation of each of them:

  • Cosine Similarity - Cosine similarity is a way to measure how similar two vectors are. To simplify, it reflects whether the vectors have the same direction (similar) or are poles apart. Cosine similarity is often used with text representations to compare how similar two documents or sentences are to each other.
  • Dot Product - The dot product similarity metric is another way of measuring how similar two vectors are. Unlike cosine similarity, it also considers the length of the vectors. This might be important when, for example, vector representations of your documents are built based on the term (word) frequencies.
  • Euclidean Distance - Euclidean distance is a way to measure the distance between two points in space, similar to how we measure the distance between two places on a map.
  • Manhattan Distance - Manhattan distance is similar to Euclidean distance in that it also measures the distance between two points in space. The difference is that it follows a grid-like path, summing the absolute differences of the corresponding coordinates.

Remark: When you use Semantic Kernel and don't explicitly specify a distance function, Cosine Similarity is used by default.
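To get a feel for how the four metrics behave, here is a small plain-Python sketch of their textbook definitions. It is not how Qdrant computes them internally, and the two sample vectors are made up.

```python
import math


def dot_product(a, b):
    """Sum of the element-wise products; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the normalized vectors; only the direction matters."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Grid-like distance: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, but twice as long

print("cosine   ", cosine_similarity(a, b))   # approximately 1.0: same direction
print("dot      ", dot_product(a, b))         # 28.0: grows with vector length
print("euclidean", euclidean_distance(a, b))  # square root of 14
print("manhattan", manhattan_distance(a, b))  # 6.0
```

Note that for the two similarity metrics a higher value means "more similar", while for the two distance metrics a lower value does.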

Creating and searching our first data in Qdrant

Let’s now apply the information above to create our first collection in Qdrant. To do so, let’s go back to the Web UI available at http://localhost:6333/dashboard.

Click on the Quickstart button to open the Quickstart tutorial and let us walk through the steps.

First we need to create a collection to store our data. In this example a collection called star_charts will be created. This collection will store location data where each location will be represented by a vector of four dimensions, and we'll use the Dot product as the distance metric for similarity search.

The Web UI has a built-in console allowing us to run commands directly in the web user interface. We’ll use this to create our collection with the specified vector size and distance function:
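As a rough sketch of what that console command amounts to, the Python snippet below builds the equivalent HTTP request, assuming the Qdrant REST API shape PUT /collections/{name} with the vector size and distance in the body. The helper create_collection_request is hypothetical, and the snippet only builds the request without sending it.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # the local instance started earlier


def create_collection_request(name, size, distance):
    """Build (but do not send) the PUT request that creates a collection."""
    body = {"vectors": {"size": size, "distance": distance}}
    return urllib.request.Request(
        url=f"{QDRANT_URL}/collections/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = create_collection_request("star_charts", size=4, distance="Dot")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With a running instance you could send it with urllib.request.urlopen(request). In the console itself you only type the method, the path and the JSON body.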

 

We can see the created collection by clicking on the Collections icon in the left toolbar:

 

Now we can upload some data. This can be done directly as shown in the tutorial:
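A minimal sketch of such an upload body, assuming the REST call PUT /collections/star_charts/points with a "points" array. The ids, vectors and payload values below are made-up sample data, and build_upsert_body is a hypothetical helper that checks every vector against the collection's vector size before the body is sent.

```python
import json

VECTOR_SIZE = 4  # must match the size the collection was created with

# Made-up sample data: each point has an id, a vector and a payload.
points = [
    {"id": 1, "vector": [0.05, 0.61, 0.76, 0.74], "payload": {"colony": "Mars"}},
    {"id": 2, "vector": [0.19, 0.81, 0.75, 0.11], "payload": {"colony": "Jupiter"}},
    {"id": 3, "vector": [0.36, 0.55, 0.47, 0.94], "payload": {"colony": "Venus"}},
]


def build_upsert_body(points, expected_size):
    """Wrap the points in a request body after checking their dimensionality."""
    for point in points:
        if len(point["vector"]) != expected_size:
            raise ValueError(
                f"point {point['id']} has {len(point['vector'])} dimensions, "
                f"expected {expected_size}"
            )
    return {"points": points}


body = build_upsert_body(points, VECTOR_SIZE)
print("PUT /collections/star_charts/points")
print(json.dumps(body, indent=2))
```

The dimensionality check mirrors the rule from the Collections section: a vector with the wrong size does not fit the collection.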

 

Another option is to upload a snapshot:

 

Or import a dataset (Qdrant comes with some example datasets out of the box):

 

We can browse the stored data by clicking on the specific Collection:

 

And even have a graph view to visualize the relative distance:

 

The last step is to run our first search. We specify a (calculated) vector, the number of vectors we want to get back and whether the payload should be returned:
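To illustrate what those three inputs do, here is a small sketch that builds a search body and then ranks a handful of made-up points in memory using the dot product. The body follows the shape of Qdrant's POST /collections/{name}/points/search call as I understand it, and the exact endpoint may differ between Qdrant versions. The brute-force search function is a hypothetical stand-in for illustration, not Qdrant's index-based search.

```python
import json

# Made-up sample data standing in for the stored points.
points = [
    {"id": 1, "vector": [0.05, 0.61, 0.76, 0.74], "payload": {"colony": "Mars"}},
    {"id": 2, "vector": [0.19, 0.81, 0.75, 0.11], "payload": {"colony": "Jupiter"}},
    {"id": 3, "vector": [0.36, 0.55, 0.47, 0.94], "payload": {"colony": "Venus"}},
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(points, vector, limit, with_payload):
    """Brute-force stand-in: score every point, return the best `limit` hits."""
    ranked = sorted(points, key=lambda p: dot(p["vector"], vector), reverse=True)
    hits = []
    for point in ranked[:limit]:
        hit = {"id": point["id"], "score": dot(point["vector"], vector)}
        if with_payload:
            hit["payload"] = point.get("payload")
        hits.append(hit)
    return hits


# The three inputs from the text: query vector, result count, payload flag.
search_body = {"vector": [0.2, 0.1, 0.9, 0.7], "limit": 2, "with_payload": True}
print("POST /collections/star_charts/points/search")
print(json.dumps(search_body))

for hit in search(points, **search_body):
    print(hit)  # with this sample data, point 1 ranks first and point 3 second
```

Setting with_payload to false returns only ids and scores, which keeps responses small when you only need to know which points matched.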

 

That’s it! With this information you know enough to start using Qdrant.

More information

An Introduction to Vector Databases – Qdrant

What is Qdrant? – Qdrant

Exploring Microsoft.Extensions.VectorData with Qdrant and Azure AI Search - .NET Blog

Using the Semantic Kernel Qdrant Vector Store connector (Preview) | Microsoft Learn
