
Running large language models locally using Ollama

In this post I want to introduce you to Ollama, a streamlined tool for running open-source LLMs locally, including Mistral and Llama 2. It bundles model weights, configuration, and data into a single package managed by a Modelfile.

Ollama supports a variety of LLMs, including Llama 2, Llama 2 Uncensored, Code Llama, Falcon, Mistral, Vicuna, WizardCoder, and Wizard Uncensored.

Installation

To install Ollama on Windows, first download the executable available here: https://ollama.com/download/OllamaSetup.exe

Run the executable to start the installation wizard:

Click Install to start the installation process. After the installation has completed, Ollama will be running in the background:

We can now open a command prompt and call ollama:
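
Running ollama without any arguments prints an overview of the available commands (the exact output depends on your version):

> ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command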

Download a model

Before we can do anything useful, we first need to download a specific language model. The full list of models can be found at https://ollama.com/library.

Here are some examples:

Model         Parameters   Size     Download
Llama 2       7B           3.8GB    ollama run llama2
Mistral       7B           4.1GB    ollama run mistral
Llama 2 13B   13B          7.3GB    ollama run llama2:13b
Llama 2 70B   70B          39GB     ollama run llama2:70b

Remark: Make sure you have enough RAM before you try to run one of the larger models. As a rule of thumb, you need at least 8 GB of RAM for the 7B models, 16 GB for the 13B models, and 32 GB for the 33B models.

Let’s give Llama 2 a try. We execute the following command to download and run the language model:

ollama run llama2

Be patient. It can take a while to download the model.
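
When the download finishes, you land in an interactive prompt where you can start chatting with the model. Type /bye to end the session. The answer below is only illustrative; your output will differ:

>>> Why is the sky blue?
The sky appears blue because molecules in the atmosphere scatter the shorter
blue wavelengths of sunlight more strongly than the longer red wavelengths.

>>> /bye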

Remark: If you only want to download the model, you can use the pull command:

ollama pull llama2
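
Afterwards you can verify which models are available locally with the list command (the ID and timestamp below are just an example):

> ollama list
NAME             ID              SIZE      MODIFIED
llama2:latest    78e26419b446    3.8 GB    2 minutes ago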

Invoke the model

We can invoke the model directly from the command line using the run command, as we have seen above:
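
The run command also accepts the prompt as an argument, which is handy for one-off questions without starting an interactive session:

> ollama run llama2 "Why is the sky blue?"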

Ollama also has an API endpoint running at the following location: http://localhost:11434.

We can invoke it for example through Postman:

In the example above, I set stream to false. This means we have to wait until the LLM has generated the full response before anything is returned.
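
If you don't have Postman at hand, you can call the /api/generate endpoint with curl (this syntax assumes Git Bash or WSL; adjust the quoting for PowerShell or cmd):

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

With stream set to false, the endpoint returns a single JSON object. Slightly simplified, the response looks like this (it also contains some timing statistics):

{
  "model": "llama2",
  "created_at": "2024-03-01T08:52:19.385406Z",
  "response": "The sky appears blue because ...",
  "done": true
}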

Have a look here for the full API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md 

We'll use this API in our next post about .NET Smart Components. Stay tuned!

More information

https://ollama.com/blog/windows-preview

https://ollama.com/
