Back from holiday with charged batteries, we continue our journey exploring the Microsoft.Extensions.AI library. Today we have a look at evaluating AI models.
This post is part of a blog series. Other posts so far:
- Part I – An introduction to Microsoft.Extensions.AI
- Part II – ASP.NET Core integration
- Part III – Tool calling
- Part IV – Telemetry integration
- Part V – Chat history
- Part VI – Structured output
- Part VII – MCP integration
- Part VIII – Evaluations
What is Microsoft.Extensions.AI.Evaluation?
Microsoft.Extensions.AI.Evaluation is a set of libraries with one common goal: simplifying the process of evaluating the quality and accuracy of responses generated by AI models. Measuring the quality of your AI apps is challenging because you need to evaluate metrics like:
- Relevance: How effective is the response for a given prompt?
- Truthfulness: Is the response factually correct?
- Coherence: Is the response logically structured and consistent?
- Completeness: Is the response a sufficient answer?
- And many more…
The evaluation libraries handle this for you through a set of built-in evaluators that can easily be integrated into your existing test infrastructure and framework.
But enough talking, let’s give it a try…
Integrate AI quality validation for our chat application
Start by adding a new test project to your solution using the framework of your choice. I'll be using xUnit in this post, but the library itself is completely test-framework agnostic.
Add a reference to the Microsoft.Extensions.AI.Evaluation.Quality library:
dotnet package add Microsoft.Extensions.AI.Evaluation.Quality
Now we first need to bootstrap our ChatConfiguration:
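As a minimal sketch of what this bootstrapping could look like: a ChatConfiguration wraps the IChatClient that the evaluators use to score a response. The choice of a local Ollama model through the OllamaSharp package, the endpoint, the model name and the helper class name are all assumptions for illustration; any IChatClient implementation should work here.

```csharp
using System;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using OllamaSharp;

// Hypothetical helper class; the name is only for illustration.
public static class EvaluationSetup
{
    public static ChatConfiguration CreateChatConfiguration()
    {
        // Assumption: a local Ollama instance on the default port serving "llama3.1".
        // OllamaApiClient (from the OllamaSharp package) implements IChatClient.
        IChatClient chatClient = new OllamaApiClient(
            new Uri("http://localhost:11434"), "llama3.1");

        // The evaluators call this client when they ask an LLM to judge a response.
        return new ChatConfiguration(chatClient);
    }
}
```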
And also build up our prompt:
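A prompt is just a list of ChatMessage instances. The sketch below assumes a simple holiday question; the wording of both messages and the helper class name are made up for illustration.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.AI;

// Hypothetical helper class; the name is only for illustration.
public static class TestPrompts
{
    public static List<ChatMessage> CreateHolidayPrompt() =>
    [
        // The system message sets the behaviour we expect from the assistant.
        new ChatMessage(ChatRole.System,
            "You are a travel assistant. Keep your answers short and factual."),

        // The user message is the question whose answer we want to evaluate.
        new ChatMessage(ChatRole.User,
            "Suggest a good holiday location for a family with two children.")
    ];
}
```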
Once these two things are in place, we can set up the evaluator(s) for our test, invoke our LLM and evaluate the results:
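Put together, a test could look roughly like the sketch below. It assumes the CoherenceEvaluator from the Quality package, a local Ollama model via OllamaSharp and an illustrative holiday prompt; the test name, model and prompt are assumptions, and you can swap in whichever evaluator fits your scenario.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;
using OllamaSharp;
using Xunit;

public class HolidayAdviceTests
{
    [Fact]
    public async Task Response_should_be_coherent()
    {
        // Assumption: a local Ollama instance serving "llama3.1".
        IChatClient chatClient = new OllamaApiClient(
            new Uri("http://localhost:11434"), "llama3.1");
        var chatConfiguration = new ChatConfiguration(chatClient);

        // Illustrative prompt.
        List<ChatMessage> messages =
        [
            new ChatMessage(ChatRole.User,
                "Suggest a good holiday location for a family with two children.")
        ];

        // Invoke the model under test.
        ChatResponse response = await chatClient.GetResponseAsync(messages);

        // Let the evaluator score the response, using the configured chat client as judge.
        IEvaluator evaluator = new CoherenceEvaluator();
        EvaluationResult result = await evaluator.EvaluateAsync(
            messages, response, chatConfiguration);

        // Each evaluator exposes its metric under a well-known name.
        NumericMetric coherence =
            result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);

        // Fail the test when the evaluator interprets the score as a failure.
        Assert.NotNull(coherence.Interpretation);
        Assert.False(coherence.Interpretation!.Failed, coherence.Reason);
    }
}
```

Here the same client both generates the response and judges it, which keeps the sketch short; in a real test suite you may prefer a separate, stronger model for the evaluation step.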
Of course, this test fails as the AI suggested the moon as a good holiday location.
Tomorrow we'll extend this example further and have a look at caching the model responses and reporting on the evaluations.
More information
The Microsoft.Extensions.AI.Evaluation libraries - .NET | Microsoft Learn
ai-samples/src/microsoft-extensions-ai-evaluation/api/README.md at main · dotnet/ai-samples
Exploring new Agent Quality and NLP evaluators for .NET AI applications - .NET Blog