I promised to continue my blog post from yesterday about Microsoft.Extensions.AI.Evaluation. Today we take a look at caching responses and reporting.
This post is part of a blog series. Other posts so far:
- Part I – An introduction to Microsoft.Extensions.AI
- Part II – ASP.NET Core integration
- Part III – Tool calling
- Part IV – Telemetry integration
- Part V – Chat history
- Part VI – Structured output
- Part VII - MCP integration
- Part VIII – Evaluations
- Part VIII – Evaluations (continued)
The example I showed yesterday was a simple one: it demonstrated how you can integrate LLM validation into your tests and check the relevance of the LLM response. However, relevance is only one of the many metrics you typically want to check.
A more realistic test scenario will evaluate a larger range of metrics, and because tests are run quite frequently, caching the responses of our LLM models will save us both money and time (as tests can run faster).
Let’s update our previous example.
We start with the same ChatConfiguration:
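A minimal sketch of such a configuration, assuming an Azure OpenAI backed IChatClient; the endpoint, credential, and deployment name are placeholders, and the exact conversion extension method can differ between Microsoft.Extensions.AI package versions:

```csharp
// Minimal sketch: wrap an IChatClient in a ChatConfiguration for the evaluators.
// Endpoint, credential and deployment name are placeholders - use your own values.
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;

IChatClient chatClient =
    new AzureOpenAIClient(
            new Uri("https://your-resource.openai.azure.com"),
            new DefaultAzureCredential())
        .GetChatClient("gpt-4o")   // deployment name is an assumption
        .AsIChatClient();          // extension from Microsoft.Extensions.AI.OpenAI

ChatConfiguration chatConfiguration = new(chatClient);
```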
And prompt:
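Any chat conversation works here; the question below is just a stand-in for the prompt used in yesterday's post:

```csharp
// The prompt we want to evaluate - the question itself is only an example.
var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are a helpful assistant."),
    new(ChatRole.User, "What is Microsoft.Extensions.AI and when should I use it?")
};
```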
But now we also include a ReportingConfiguration instance.
A ReportingConfiguration identifies:
- The set of evaluators that should be invoked
- The LLM endpoint that the evaluators should use (our ChatConfiguration)
- How and where the results for the scenario runs should be stored.
- How LLM responses related to the scenario runs should be cached.
- The execution name that should be used when reporting results for the scenario runs.
Let’s put it into practice:
We first need to add an extra NuGet package:
dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting
Now we can create our ReportingConfiguration object:
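A sketch using the disk-based storage and response caching from the Reporting package; the storage path, the chosen evaluators, and the execution name are assumptions, and parameter names may vary slightly between package versions:

```csharp
using Microsoft.Extensions.AI.Evaluation.Quality;
using Microsoft.Extensions.AI.Evaluation.Reporting;
using Microsoft.Extensions.AI.Evaluation.Reporting.Storage;

// Disk-based storage: results and cached LLM responses end up under this folder.
ReportingConfiguration reportingConfiguration =
    DiskBasedReportingConfiguration.Create(
        storageRootPath: @"C:\TestReports",               // assumption: pick your own folder
        evaluators:
        [
            new RelevanceTruthAndCompletenessEvaluator(), // relevance, truth, completeness
            new CoherenceEvaluator(),                     // coherence of the answer
            new FluencyEvaluator()                        // fluency of the answer
        ],
        chatConfiguration: chatConfiguration,             // the LLM endpoint the evaluators use
        enableResponseCaching: true,                      // reuse LLM responses on later runs
        executionName: "LocalRun");                       // name used when reporting results
```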
Inside our test method, we create a ScenarioRun instance. This class stores the results of our evaluation in the result store (on disk in this example).
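A sketch of creating the run; the scenario name is an arbitrary choice here, often the fully qualified test name:

```csharp
// ScenarioRun is IAsyncDisposable; disposing it persists the results to the result store.
await using ScenarioRun scenarioRun =
    await reportingConfiguration.CreateScenarioRunAsync("EvaluationTests.RelevanceOfResponse");
```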
The remaining part of our test is similar to before. An important difference is that we now use the ChatClient instance exposed through the ReportingConfiguration (via the ScenarioRun). Doing this enables response caching, allowing a cached response to be reused on subsequent runs of our test.
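Putting it together, a sketch of the rest of the test; the metric threshold and the MSTest-style assert are assumptions, and the cache-enabled client is reached through the ScenarioRun's ChatConfiguration:

```csharp
// Use the cache-enabled chat client that comes with the scenario run.
IChatClient client = scenarioRun.ChatConfiguration!.ChatClient;

// First run: the response comes from the LLM. Subsequent runs: from the on-disk cache.
ChatResponse response = await client.GetResponseAsync(messages);

// Run all evaluators configured on the ReportingConfiguration and store the results.
EvaluationResult result = await scenarioRun.EvaluateAsync(messages, response);

// Example assertion on the relevance metric (scored 1-5 by the evaluator).
NumericMetric relevance =
    result.Get<NumericMetric>(RelevanceTruthAndCompletenessEvaluator.RelevanceMetricName);
Assert.IsTrue(relevance.Value >= 4);
```

When the ScenarioRun is disposed at the end of the test, the evaluation results are written to the configured result store on disk.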
Visualize the evaluation results
The cool thing is that there is a .NET tool available that allows us to generate an HTML report from the evaluation results.
Install the tool using the following command:
dotnet tool install --local --create-manifest-if-needed Microsoft.Extensions.AI.Evaluation.Console
You can now generate a report from the test results in the configured storage directory:
dotnet tool run aieval report --path <path\to\your\cache\storage> --output report.html
Remark: An extension exists to visualize the report directly in Azure DevOps.
More information
The Microsoft.Extensions.AI.Evaluation libraries - .NET | Microsoft Learn
ai-samples/src/microsoft-extensions-ai-evaluation/api/README.md at main · dotnet/ai-samples
Azure DevOps AI Evaluation Report - Visual Studio Marketplace