Running large language models locally using Ollama

In this post I want to introduce you to Ollama. Ollama is a streamlined tool for running open-source LLMs locally, including Mistral and Llama 2. It bundles model weights, configurations, and datasets into a unified package managed by a Modelfile.

Ollama supports a variety of LLMs, including Llama 2, uncensored Llama, Code Llama, Falcon, Mistral, Vicuna, WizardCoder, and Wizard uncensored.

Installation

To install Ollama on Windows, first download the executable available here: https://ollama.com/download/OllamaSetup.exe

Run the executable to start the installation wizard:

Click Install to start the installation process. After the installation has completed, Ollama will be running in the background:

We can now open a command prompt and call ollama:

Download a model

Before we can do anything useful, we first need to download a specific language model. The full list of models can be found at https://ollama.com/library.

Here are some examples:

Model        Parameters  Size   Download
Llama 2      7B          3.8GB  ollama run llama2
Mistral      7B          4.1GB  ollama run mistral
Llama 2 13B  13B         7.3GB  ollama run llama2:13b
Llama 2 70B  70B         39GB   ollama run llama2:70b

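Remark: Make sure you have enough RAM before you try to run one of the larger models. As a rule of thumb, the Ollama README recommends at least 8 GB of RAM for the 7B models and 16 GB for the 13B models.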

Let’s give Llama 2 a try. We execute the following command to download and run the language model:

ollama run llama2

Be patient. It can take a while to download the model.

Remark: If you only want to download the model, you can use the pull command:

ollama pull llama2

Invoke the model

We can invoke the model directly from the command line using the run command, as we have seen above:

Ollama also has an API endpoint running at the following location: http://localhost:11434.
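As a minimal sketch of what a call to this endpoint could look like from code, the Python snippet below posts a prompt to the /api/generate route described in the Ollama API documentation. The helper names build_generate_request and generate are made up for this illustration, and the snippet assumes Ollama is running locally with the llama2 model already pulled.

```python
import json
import urllib.request

# Base address from this post; the /api/generate route comes from the Ollama API docs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="llama2", stream=False):
    """Build a POST request for the generate endpoint (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": stream}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama2"):
    """Send the prompt and return the generated text.

    With stream set to False the call blocks until the whole answer is ready.
    Assumes a local Ollama instance is listening on port 11434.
    """
    request = build_generate_request(prompt, model=model, stream=False)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # The generated text is returned in the "response" field of the JSON body.
    return body["response"]
```

With Ollama running, generate("Why is the sky blue?") should return the model's answer as a single string. Only the standard library is used here, so nothing needs to be installed besides Python itself.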

We can invoke it for example through Postman:

In the example above, I set stream to false. This means we have to wait until the LLM has generated the full response before anything is returned.
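When stream is left at its default of true, the API documentation describes the reply as a series of JSON objects, one per line, each carrying a fragment of the answer in its response field and a done flag on the last one. The sketch below shows one way such a stream could be stitched back together. The function name collect_streamed_response is hypothetical, and the sample lines are hand-written for illustration rather than captured from a real model.

```python
import json


def collect_streamed_response(lines):
    """Join the text fragments of a streamed reply into one string.

    `lines` is any iterable of JSON lines (str or bytes), for example
    an open HTTP response object iterated line by line.
    """
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive or separator lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the answer
    return "".join(parts)


# Hand-written sample lines in the documented shape (not real model output).
sample_lines = [
    '{"response": "Hello", "done": false}',
    '{"response": " world", "done": false}',
    '{"response": "", "done": true}',
]
text = collect_streamed_response(sample_lines)
```

The advantage of streaming is that each fragment can be shown to the user as soon as it arrives, instead of waiting for the complete answer.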

Have a look here for the full API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md 

We'll use this API in our next post about .NET Smart Components. Stay tuned!

More information

https://ollama.com/blog/windows-preview

https://ollama.com/
