Demo: Essential Ollama CLI Commands

In this guide, we’ll walk through the most common Ollama CLI commands with practical examples. Whether you’re spinning up a model server, inspecting metadata, or managing local models, these commands form the foundation of any local LLM workflow.

Table of Contents

  1. List All Commands
  2. Run a Model Interactively
  3. Stop a Running Model
  4. View Local Models
  5. Remove a Model
  6. Download Models Without Running
  7. Show Model Details
  8. Monitor Active Models

1. List All Commands

To discover every available Ollama subcommand, simply run:

ollama

You’ll see output similar to:

Available Commands:
  serve      Start Ollama model server
  create     Create a model from a Modelfile
  show       Show information for a model
  run        Run a model
  stop       Stop a running model
  pull       Pull a model from a registry
  push       Push a model to a registry
  list       List local models
  ps         List running models
  cp         Copy a model
  rm         Remove a model
  help       Help about any command

Flags:
  -h, --help     Help for ollama
  -v, --version  Show version information

Tip

Append --help to any command to view detailed usage information, for example:

ollama run --help
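
The -v flag makes a quick health check in scripts: it prints the client version, and recent releases also warn if they can’t reach a running Ollama server.

ollama -v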

2. Run a Model Interactively

Launch an interactive chat session with a model (e.g., llama3.2):

ollama run llama3.2

Once running, type your queries:

>>> What is the time?
I'm not sure what time you are referring to, as I'm a large language model without real-time access...

To exit the chat, enter:

>>> /bye
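
Interactive chat isn’t the only mode: ollama run also accepts a one-shot prompt as an argument, which makes it easy to use from scripts. A minimal sketch (the model name, prompt, and notes.txt file are just examples):

# One-shot prompt: print the response and exit
ollama run llama3.2 "Explain what a context window is in one sentence."

# Feed a file's contents into the prompt via command substitution
ollama run llama3.2 "Summarize the following text: $(cat notes.txt)"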

3. Stop a Running Model

Exiting the chat doesn’t unload the model: Ollama keeps it in memory for a few minutes (five minutes by default) so follow-up requests start faster. To unload it immediately:

ollama stop llama3.2

Warning

Leaving unused models running can consume memory and GPU resources. Always stop models you’re no longer using.
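
If several models are loaded, you can combine ollama ps (covered below) with standard shell tools to stop everything at once. A minimal sketch, assuming the default ollama ps output format with the model name in the first column:

# Skip the header row, then stop each running model by name
ollama ps | awk 'NR>1 {print $1}' | while read -r name; do
  ollama stop "$name"
done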


4. View Local Models

List all models downloaded on your machine:

ollama list
NAME               ID              SIZE      MODIFIED
llava:latest       8dd30f6b0cb1    4.7 GB    48 minutes ago
llama3.2:latest    a80c4f17acd5    2.0 GB    About an hour ago
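
The tabular output is easy to post-process with standard tools. For example, a sketch that prints just the model names (assuming the NAME column comes first, as shown above):

# Print only model names, skipping the header row
ollama list | awk 'NR>1 {print $1}'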

5. Remove a Model

Free up disk space by deleting a model:

ollama rm llava

Confirm removal:

ollama list
NAME               ID              SIZE      MODIFIED
llama3.2:latest    a80c4f17acd5    2.0 GB    About an hour ago
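
In a cleanup script you may want to delete a model only if a local copy actually exists. A minimal sketch that builds on the ollama list parsing shown earlier (llava is just an example name):

# Remove llava only if it appears in the local model list
if ollama list | awk 'NR>1 {print $1}' | grep -q '^llava:'; then
  ollama rm llava
fi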

6. Download Models Without Running

Use ollama pull to fetch a model without immediately launching it. For example, to pull Mistral 7B:

(The Mistral 7B page in the Ollama model library shows the model’s versions, parameter counts, and license.)

ollama pull mistral

You’ll see progress bars like:

pulling manifest
pulling ff82381e2bea... 28%

Note

Press Ctrl+C at any point to abort the download.
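
pull also accepts an explicit tag when you want a specific variant rather than the default, and re-pulling an existing model downloads only the layers that changed. For example (the tags shown are examples from the Ollama library):

# Pull a specific tag instead of the default
ollama pull mistral:7b

# Re-pull to pick up any updates to an already-downloaded model
ollama pull llama3.2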


7. Show Model Details

Inspect model metadata, architecture, and licensing:

ollama show llama3.2
Model
  architecture     llama
  parameters       3.2B
  context length   131072
  embedding length 3072
  quantization     Q4_K_M

Parameters
  stop  "<|start_header_id|>"
  stop  "<|end_header_id|>"
  stop  "<|eot_id|>"

License
  LLAMA 3.2 COMMUNITY LICENSE AGREEMENT
  Release Date: September 25, 2024

Understanding the license ensures you comply with usage restrictions.
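
show also has flags to print individual pieces of this metadata, which pairs nicely with ollama create: you can dump a model’s Modelfile, tweak it, and build a variant. A sketch using flags available in recent Ollama releases:

# Print just the Modelfile that defines the model
ollama show --modelfile llama3.2

# Print only the license text
ollama show --license llama3.2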


8. Monitor Active Models

Similar to Docker’s ps, this command lists all currently running models:

ollama ps

After launching a model in one terminal:

ollama run llama3.2

Open another terminal and run:

ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    4.0 GB    100% GPU     4 minutes from now

To stop the model:

ollama stop llama3.2

Verify no active models remain:

ollama ps
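
Because the Ollama server (started with ollama serve, or by the desktop app) exposes an HTTP API on port 11434 by default, you can also query a model without the CLI. A minimal sketch using the documented /api/generate endpoint:

# Request a single, non-streamed completion from the local server
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'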
