As Semantic Kernel can work with any OpenAI-compatible endpoint, and Ollama exposes its language models through an OpenAI-compatible API, combining the two was always possible. However, not all features of Ollama were accessible through Semantic Kernel.
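That original combination looked roughly like this; a minimal sketch, assuming the standard OpenAI connector and a locally running Ollama instance (the custom-endpoint overload is marked experimental, so the #pragma suppression id may differ per version, and the API key value is ignored by Ollama):

#pragma warning disable SKEXP0010 // custom OpenAI endpoints are an experimental feature
using Microsoft.SemanticKernel;

// Point the built-in OpenAI connector at Ollama's OpenAI-compatible endpoint
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(
        modelId: "phi3.5:latest",
        endpoint: new Uri("http://localhost:11434/v1"),
        apiKey: "ollama") // any non-empty value; Ollama does not check it
    .Build();

Console.WriteLine(await kernel.InvokePromptAsync("Give me one adventurous travel tip."));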
With the recent release of a dedicated Ollama connector for Semantic Kernel, we can start using some of the more advanced Semantic Kernel features while directly targeting Ollama-deployed models.
The new connector uses OllamaSharp (I talked about it in this post), so you can directly access the library if needed.
Giving the new connector a try…
- Create a new Console application and add the Microsoft.SemanticKernel.Connectors.Ollama NuGet package:
dotnet add package Microsoft.SemanticKernel.Connectors.Ollama --version 1.21.1-alpha
- Now instead of creating a Semantic Kernel instance, we can directly create an OllamaChatCompletionService instance:
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.Ollama;

var chatCompletionService = new OllamaChatCompletionService(
    endpoint: new Uri("http://localhost:11434"),
    modelId: "phi3.5:latest");
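If you prefer the kernel-builder style instead, the connector package also ships extension methods. A sketch, assuming the AddOllamaChatCompletion extension from the same package (the connector is experimental, so a #pragma suppression may be needed):

#pragma warning disable SKEXP0070 // non-OpenAI connectors are experimental
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Register the Ollama connector on a Kernel and resolve the chat service from it
var kernel = Kernel.CreateBuilder()
    .AddOllamaChatCompletion(
        modelId: "phi3.5:latest",
        endpoint: new Uri("http://localhost:11434"))
    .Build();

var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();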
- The remaining code is the same as with any other Semantic Kernel chat completion service (a non-streaming alternative is sketched after this block):
var chatMessages = new ChatHistory("You are a travel agent. You like to give adventurous travel advice.");

// Start the conversation
while (true)
{
    // Get user input
    System.Console.Write("User > ");
    chatMessages.AddUserMessage(Console.ReadLine()!);

    // Get the chat completions
    var result = chatCompletionService.GetStreamingChatMessageContentsAsync(
        chatMessages);

    // Stream the results
    string fullMessage = "";
    bool roleWritten = false;
    await foreach (var content in result)
    {
        if (content.Role.HasValue && !roleWritten)
        {
            System.Console.Write("Assistant > ");
            roleWritten = true;
        }
        System.Console.Write(content.Content);
        fullMessage += content.Content;
    }
    System.Console.WriteLine();

    // Add the message from the agent to the chat history
    chatMessages.AddAssistantMessage(fullMessage);
}
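If you don't need token-by-token streaming, the standard non-streaming call from the IChatCompletionService interface works as well:

// Non-streaming alternative: get the complete reply in a single call
var reply = await chatCompletionService.GetChatMessageContentAsync(chatMessages);
Console.WriteLine($"Assistant > {reply.Content}");
chatMessages.AddAssistantMessage(reply.Content ?? string.Empty);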
- What is different now is that we can access the underlying OllamaSharp objects if we want to:
// Requires: using OllamaSharp.Models.Chat;
await foreach (var content in result)
{
    if (content.Role.HasValue && !roleWritten)
    {
        System.Console.Write("Assistant > ");
        roleWritten = true;
    }
    System.Console.Write(content.Content);
    fullMessage += content.Content;

    // Cast the InnerContent to an OllamaSharp ChatResponseStream object
    var innerContent = content.InnerContent as ChatResponseStream;
    OutputInnerContent(innerContent!);
}
void OutputInnerContent(ChatResponseStream streamChunk)
{
    Console.WriteLine($"Model: {streamChunk.Model}");
    Console.WriteLine($"Message role: {streamChunk.Message.Role}");
    Console.WriteLine($"Message content: {streamChunk.Message.Content}");
    Console.WriteLine($"Created at: {streamChunk.CreatedAt}");
    Console.WriteLine($"Done: {streamChunk.Done}");

    // The last chunk in the stream is a ChatDoneResponseStream with additional metadata
    if (streamChunk is ChatDoneResponseStream doneStream)
    {
        Console.WriteLine($"Done Reason: {doneStream.DoneReason}");
        Console.WriteLine($"Eval count: {doneStream.EvalCount}");
        Console.WriteLine($"Eval duration: {doneStream.EvalDuration}");
        Console.WriteLine($"Load duration: {doneStream.LoadDuration}");
        Console.WriteLine($"Total duration: {doneStream.TotalDuration}");
        Console.WriteLine($"Prompt eval count: {doneStream.PromptEvalCount}");
        Console.WriteLine($"Prompt eval duration: {doneStream.PromptEvalDuration}");
    }
    Console.WriteLine("------------------------");
}
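You can also pass Ollama-specific execution settings per call. A small sketch, assuming the connector's OllamaPromptExecutionSettings type (the exact property set may vary between alpha versions):

// Tune Ollama sampling options for this request
var settings = new OllamaPromptExecutionSettings
{
    Temperature = 0.7f,
    TopP = 0.9f
};

var result = chatCompletionService.GetStreamingChatMessageContentsAsync(
    chatMessages,
    executionSettings: settings);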
Nice!
More information
Interact with Ollama through C# (bartwullems.blogspot.com)
awaescher/OllamaSharp: The easiest way to use the Ollama API in .NET (github.com)
Introducing new Ollama Connector for Local Models | Semantic Kernel (microsoft.com)