
Running large language models locally using Ollama

In this post I want to introduce you to Ollama, a streamlined tool for running open-source LLMs locally, including Mistral and Llama 2. It bundles model weights, configuration, and data into a single package managed by a Modelfile.

Ollama supports a variety of LLMs, including Llama 2, Llama 2 Uncensored, Code Llama, Falcon, Mistral, Vicuna, WizardCoder, and Wizard Uncensored.

Installation

To install Ollama on Windows, first download the executable available here: https://ollama.com/download/OllamaSetup.exe

Run the executable to start the installation wizard:

Click Install to start the installation process. After the installation has completed, Ollama will be running in the background:

We can now open a command prompt and call ollama:
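
Running ollama without any arguments prints an overview of the available commands (the exact output depends on your version):

> ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command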

Download a model

Before we can do anything useful, we first need to download a specific language model. The full list of models can be found at https://ollama.com/library.

Here are some examples:

Model         Parameters   Size     Download
Llama 2       7B           3.8GB    ollama run llama2
Mistral       7B           4.1GB    ollama run mistral
Llama 2 13B   13B          7.3GB    ollama run llama2:13b
Llama 2 70B   70B          39GB     ollama run llama2:70b

Remark: Make sure you have enough RAM before you try to run one of the larger models. As a rule of thumb, you need at least 8 GB of RAM for the 7B models, 16 GB for the 13B models, and 32 GB for the 33B models.

Let’s give Llama 2 a try. We execute the following command to download and run the language model:

ollama run llama2

Be patient. It can take a while to download the model.
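
When the download finishes, you land in an interactive prompt where you can start chatting with the model. Type /bye to end the session. The answer below is only illustrative; your output will differ:

>>> Why is the sky blue?
The sky appears blue because molecules in the atmosphere scatter the shorter
blue wavelengths of sunlight more strongly than the longer red wavelengths.

>>> /bye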

Remark: If you only want to download the model, you can use the pull command:

ollama pull llama2
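
Afterwards you can verify which models are available locally with the list command (the ID and timestamp below are just an example):

> ollama list
NAME             ID              SIZE      MODIFIED
llama2:latest    78e26419b446    3.8 GB    2 minutes ago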

Invoke the model

We can invoke the model directly from the command line using the run command, as we have seen above:
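
The run command also accepts the prompt as an argument, which is handy for one-off questions without starting an interactive session:

> ollama run llama2 "Why is the sky blue?"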

Ollama also has an API endpoint running at the following location: http://localhost:11434.

We can invoke it for example through Postman:

In the example above, I set stream to false. This means we have to wait until the LLM has generated the full response before anything is returned.
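
If you don't have Postman at hand, you can call the /api/generate endpoint with curl (this syntax assumes Git Bash or WSL; adjust the quoting for PowerShell or cmd):

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

With stream set to false, the endpoint returns a single JSON object. Slightly simplified, the response looks like this (it also contains some timing statistics):

{
  "model": "llama2",
  "created_at": "2024-03-01T08:52:19.385406Z",
  "response": "The sky appears blue because ...",
  "done": true
}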

Have a look here for the full API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md 

We'll use this API in our next post about .NET Smart Components. Stay tuned!

More information

https://ollama.com/blog/windows-preview

https://ollama.com/
