How to use Ollama?

Ollama is a tool designed to simplify the process of running and interacting with large language models (LLMs) locally on your machine. It allows you to download, manage, and run various open-source models (like Llama, Mistral, and others) without needing extensive knowledge of model deployment or infrastructure setup. Ollama is particularly useful for developers who want to experiment with LLMs, build applications around them, or integrate them into workflows without relying on cloud-based APIs.

Here’s a step-by-step guide on how to use Ollama:


1. Installation

Step 1: Download and Install Ollama

  • Download: Visit the Ollama website (https://ollama.com/download) and download the appropriate version for your operating system (Windows, macOS, or Linux).

  • Install:

    • macOS: You can install Ollama using Homebrew:
      brew install ollama
      
    • Linux: Install with the official one-line script shown on the website:
      curl -fsSL https://ollama.com/install.sh | sh
    • Windows: Download the installer and follow the on-screen instructions.

Step 2: Verify Installation

Once installed, you can verify that Ollama is working by running:

ollama --version

This should display the version of Ollama installed.


2. Downloading a Model

Ollama supports several open-source models like Llama, Mistral, Gemma, and more. You can download and run these models locally.

Step 3: Browse Available Models

The catalog of models you can download is published in the Ollama library at https://ollama.com/library. (The ollama list command, covered in Step 8, shows only the models already downloaded to your machine.)

Step 4: Pull a Model

To download a specific model, use the pull command. For example, to download the Llama 2 model:

ollama pull llama2

You can replace llama2 with any other model name (e.g., mistral, gemma, etc.).
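
Many models are also published under tags for different sizes. For example, to pull the 13-billion-parameter variant of Llama 2 instead of the default:

ollama pull llama2:13b

Available tags vary by model, so check the model's page in the Ollama library for the variants it offers.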


3. Running a Model

Once you’ve downloaded a model, you can start interacting with it.

Step 5: Start a Chat Session

To start a chat session with the model, use the run command. For example, to interact with the Llama 2 model:

ollama run llama2

This will open an interactive chat session where you can type prompts and receive responses from the model.

Example Interaction:

$ ollama run llama2
>>> Hello!
Hello! How can I assist you today?
>>> What is the capital of France?
The capital of France is Paris.

You can exit the chat session by typing /bye or pressing Ctrl+D. (Ctrl+C stops the current response without ending the session.)
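
Sessions also accept commands prefixed with a slash. A few useful ones:

  • /? — list all in-session commands
  • /show info — display details about the loaded model
  • /bye — exit the session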


4. Customizing Prompts

Ollama allows you to customize the behavior of the model by providing custom prompts or instructions.

Step 6: Provide System Prompts

You can provide a system prompt to guide the model's behavior. The run command has no --system flag; instead, set the prompt from inside an interactive session with the /set system command. For example, to make the model act as a code assistant:

ollama run llama2
>>> /set system You are a helpful code assistant.

For the rest of the session, the model will respond as a code assistant, giving programming-focused answers.
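
To make a system prompt persist across sessions, bake it into a new model with a Modelfile. As a minimal sketch (code-assistant is just an illustrative name), save the following in a file named Modelfile:

FROM llama2
SYSTEM You are a helpful code assistant.

Then build and run the customized model:

ollama create code-assistant -f Modelfile
ollama run code-assistant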

Step 7: Pass Custom Prompts

You can also pass custom prompts directly from the command line:

ollama run llama2 "Explain the concept of recursion in programming."

This prints the model's response to the prompt and exits, without opening an interactive session.
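
Because the prompt is an ordinary shell argument, you can also splice file contents into it with command substitution (notes.txt is a placeholder for your own file):

ollama run llama2 "Summarize this file: $(cat notes.txt)"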


5. Managing Models

Step 8: List Installed Models

To see which models are currently installed on your system, use:

ollama list

This displays each installed model's name, ID, size, and when it was last modified.

Step 9: Remove a Model

If you no longer need a model, you can remove it using the rm command:

ollama rm llama2

This will delete the Llama 2 model from your system.
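
Two related commands are worth knowing: ollama show prints details about an installed model, and ollama cp copies a model under a new name, which is a convenient starting point for customization:

ollama show llama2
ollama cp llama2 my-llama2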


6. Advanced Usage

Step 10: Run Models Programmatically

Ollama provides an API that allows you to interact with models programmatically. You can use this API to integrate Ollama into your applications or scripts.

Example: Using Python to Interact with Ollama

You can use the requests library to send HTTP requests to the Ollama API, which listens on http://localhost:11434 by default. First, make sure the server is running (the desktop app starts it automatically; otherwise, start it with):

ollama serve

Then, you can send a request from Python:

import requests

# Ask for a single JSON response; by default the API streams
# newline-delimited JSON chunks, which response.json() cannot parse.
response = requests.post('http://localhost:11434/api/generate', json={
    "model": "llama2",
    "prompt": "What is the capital of France?",
    "stream": False
})

# The generated text is in the "response" field.
print(response.json()["response"])
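
By default (without "stream": False), the API streams the response as newline-delimited JSON, one chunk per token. A minimal sketch of consuming that stream with requests:

import json
import requests

# Each streamed line is a JSON object carrying a "response" fragment;
# the final chunk has "done": true.
with requests.post('http://localhost:11434/api/generate', json={
    "model": "llama2",
    "prompt": "What is the capital of France?",
    "stream": True
}, stream=True) as response:
    for line in response.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
print()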

Step 11: Fine-Tuning Models

While Ollama doesn’t natively support fine-tuning models, you can use external tools to fine-tune models and then load them into Ollama for inference.
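
For example, a model fine-tuned elsewhere and exported to the GGUF format can be imported through a Modelfile (the file name below is a placeholder for your own weights):

FROM ./my-finetuned-model.gguf

Create and run it like any other model:

ollama create my-finetuned -f Modelfile
ollama run my-finetuned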


7. Use Cases for Ollama

Local Development and Testing

  • Experimentation: Developers can experiment with different models locally without relying on cloud-based APIs, which can be costly or have rate limits.
  • Prototyping: Quickly prototype applications that require natural language processing (NLP) capabilities, such as chatbots, question-answering systems, or content generation tools.

Privacy-Sensitive Applications

  • Data Privacy: Since Ollama runs models locally, it’s ideal for applications where data privacy is critical. No data leaves your machine, making it suitable for sensitive use cases like legal or healthcare applications.

Offline Use

  • No Internet Required: Ollama allows you to run models offline, which is useful in environments where internet access is limited or unavailable.

Custom AI Workflows

  • Custom Prompts: You can create custom workflows by chaining multiple prompts or integrating Ollama with other tools like LangChain or AutoGPT to build more complex AI-driven applications.
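
As a minimal sketch of chaining prompts, the snippet below uses the official ollama Python package (pip install ollama) to generate an outline and then expand it; the model choice and prompts are purely illustrative:

import ollama

# Step 1: ask the model for an outline.
outline = ollama.generate(model="llama2",
                          prompt="Write a three-point outline for a short post about recursion.")

# Step 2: feed the outline into a follow-up prompt.
draft = ollama.generate(model="llama2",
                        prompt="Expand this outline into a short post:\n" + outline["response"])

print(draft["response"])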

8. Pros and Cons of Using Ollama

Pros:

  • Ease of Use: Ollama simplifies the process of downloading, managing, and running large language models locally.
  • Privacy: Since the models run locally, your data never leaves your machine, ensuring privacy.
  • Cost-Effective: Running models locally can be more cost-effective than using cloud-based APIs, especially for high-volume tasks.
  • Open Source: Ollama supports a wide range of open-source models, giving you flexibility in choosing the right model for your needs.

Cons:

  • Hardware Requirements: Running large language models locally requires significant computational resources (e.g., GPU/CPU power and memory). Some models may not run efficiently on low-end machines.
  • Limited Fine-Tuning: Ollama doesn’t natively support fine-tuning models, so you’ll need to use external tools if you want to customize models further.
  • Model Size: Some models can be very large (several GBs), which may limit the number of models you can store and run on your machine.

Conclusion

Ollama is a powerful and user-friendly tool for running large language models locally. It abstracts away much of the complexity involved in setting up and managing models, making it accessible to developers and researchers who want to experiment with LLMs without relying on cloud services.

Whether you’re building a chatbot, automating tasks, or experimenting with NLP, Ollama provides a simple way to interact with state-of-the-art models on your own hardware. However, keep in mind that running these models locally requires sufficient computational resources, and some advanced features like fine-tuning may require additional tools.

If you’re looking for a lightweight, privacy-focused solution for running LLMs, Ollama is an excellent choice.