
Part VIII – Evaluations (continued)

I promised to continue yesterday's blog post about Microsoft.Extensions.AI.Evaluation. Today we take a look at caching LLM responses and reporting.

This post is part of a blog series. Other posts so far:

The example I showed yesterday demonstrated how you can integrate LLM validation into your tests and check the relevance of the LLM response. However, relevance is only one of the many metrics you typically want to check.

A more realistic test scenario will evaluate a wide range of metrics, and because tests run quite frequently, caching the responses of our LLM models saves us both money and time (tests run faster).

Let’s update our previous example.

We start with the same ChatConfiguration:
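Something like this (a minimal sketch; the Azure OpenAI endpoint, credential and model deployment name are placeholders for your own setup):

// Requires the Azure.AI.OpenAI and Azure.Identity packages.
IChatClient chatClient =
    new AzureOpenAIClient(
            new Uri("https://<your-resource>.openai.azure.com"),
            new DefaultAzureCredential())
        .GetChatClient("gpt-4o")
        .AsIChatClient(); // adapter from Microsoft.Extensions.AI.OpenAI

ChatConfiguration chatConfiguration = new ChatConfiguration(chatClient);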

And the same prompt:
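For example (the question itself is just a placeholder; use the prompt from the previous post):

string prompt = "What is the most famous landmark in Belgium?";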

But now we also include a ReportingConfiguration instance. 

A ReportingConfiguration identifies:

  • The set of evaluators that should be invoked
  • The LLM endpoint that the evaluators should use (our ChatConfiguration)
  • How and where the results for the scenario runs should be stored
  • How LLM responses related to the scenario runs should be cached
  • The execution name that should be used when reporting results for the scenario runs

Let’s put it into practice:

We first need to add an extra NuGet package:

dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting

Now we can create our ReportingConfiguration object:
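Here is a sketch using the disk-based store that ships with the Reporting package (the storage path, evaluator list and execution name are assumptions; plug in the evaluators you used before):

ReportingConfiguration reportingConfiguration =
    DiskBasedReportingConfiguration.Create(
        storageRootPath: @"C:\TestReports",            // results and cached responses end up here
        evaluators: [new RelevanceTruthAndCompletenessEvaluator()],
        chatConfiguration: chatConfiguration,
        enableResponseCaching: true,                   // reuse LLM responses across test runs
        executionName: "Evaluation example");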

Inside our test method, we create a ScenarioRun instance. This class stores the results of our evaluation in the result store (on disk in this example).
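A sketch (the scenario name is a placeholder; the fully qualified test method name is a common choice):

await using ScenarioRun scenarioRun =
    await reportingConfiguration.CreateScenarioRunAsync("MyTests.RelevanceTest");

Because ScenarioRun is IAsyncDisposable, the evaluation results are written to the result store when the instance is disposed at the end of the test.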

The remaining part of our test is similar to before. An important difference is that we now use the ChatClient instance exposed through the ReportingConfiguration. Doing this enables response caching, allowing subsequent runs of our test to reuse a cached response.
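Putting it together, the rest of the test could look like this (a sketch following the pattern from the official samples):

// Use the chat client exposed by the scenario run so that responses
// are served from the cache on subsequent runs.
IChatClient chatClient = scenarioRun.ChatConfiguration!.ChatClient;

List<ChatMessage> messages = [new ChatMessage(ChatRole.User, prompt)];
ChatResponse response = await chatClient.GetResponseAsync(messages);

// Run the configured evaluators against the response.
EvaluationResult result = await scenarioRun.EvaluateAsync(messages, response);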

Visualize the evaluation results

The cool thing is that there is a .NET tool available that allows us to generate an HTML report from the evaluation results.

Install the tool using the following command:

dotnet tool install --local --create-manifest-if-needed Microsoft.Extensions.AI.Evaluation.Console

You can now generate a report from the test results in the configured storage directory:

dotnet tool run aieval report --path <path\to\your\cache\storage> --output report.html

Remark: an extension exists to visualize the report directly in Azure DevOps.

More information

The Microsoft.Extensions.AI.Evaluation libraries - .NET | Microsoft Learn

ai-samples/src/microsoft-extensions-ai-evaluation/api/README.md at main · dotnet/ai-samples

Azure DevOps AI Evaluation Report - Visual Studio Marketplace
