Skip to main content
This guide shows how to set up a minimal Python project that communicates with a locally running Ollama server. You’ll create a virtual environment, install the official Ollama Python client, and run two example scripts:
  • A single-request text generation (one-shot prompt)
  • A context-aware chat loop that preserves conversation history
Environment note: this walkthrough was performed in Visual Studio Code on Windows using WSL. Your Ollama instance should be running locally and have at least one model available:
jeremy@LEGION:/mnt/c/Users/jerem/Projects/ollama-test$ ollama list
NAME           ID             SIZE     MODIFIED
gemma3:latest  a2af66c2b7f    3.3 GB   6 hours ago
If you need general documentation, see the Ollama docs and Python venv docs:

1) Create and activate a virtual environment

From your project directory create a virtual environment with Python 3:
python3 -m venv venv
Activate the virtual environment:
  • WSL / macOS / Linux:
source venv/bin/activate
  • Windows (PowerShell):
.\venv\Scripts\Activate.ps1
Quick reference commands:
PurposeCommand
Create venvpython3 -m venv venv
Activate (WSL/macOS/Linux)source venv/bin/activate
Activate (Windows PowerShell).\venv\Scripts\Activate.ps1
Deactivatedeactivate

2) Install the Ollama Python client

With the venv active, install the client:
pip install ollama
The Ollama Python package is a lightweight HTTP client that sends requests to your locally running Ollama background process. The package does not run models locally; it forwards requests to the Ollama server.

3) Simple generation example (single prompt)

Create a file named main.py and add:
import ollama

# Single generation request
result = ollama.generate(
    model='gemma3:latest',
    prompt='Tell me a funny joke about Python'
)

# The client returns a dictionary. Print the textual response.
print(result['response'])
Run it:
(venv) jeremy@LEGION:/mnt/c/Users/jerem/Projects/ollama-test$ python main.py
Okay, here's a joke about Python:
Why did the Python cross the road?
... To get to the other side... and then try to `import` it!
Why use this pattern? The generate call is ideal for one-shot prompts — you send a single prompt and receive a single response, useful for tasks like summarization, code generation, and short Q&A.

4) Context-aware chat loop

For multi-turn conversations you should accumulate the message history and use the chat API. Save this as chat_main.py (or merge into main.py).
import ollama

messages = []

while True:
    user_input = input("You: ")
    # Exit on '/exit' or an empty line
    if user_input.strip().lower() in ('/exit', ''):
        print("Exiting chat.")
        break

    # Append the user's message
    messages.append({'role': 'user', 'content': user_input})

    # Send the conversation history to the Ollama chat endpoint
    response = ollama.chat(model='gemma3:latest', messages=messages)

    # Extract assistant content and display it
    assistant_content = response['message']['content']
    print("Bot:", assistant_content)

    # Keep the assistant message in the history for context
    messages.append({'role': 'assistant', 'content': assistant_content})
Example usage:
  • Type a prompt when asked (for example: “Tell me a funny joke about Python”).
  • Continue the conversation; the messages list preserves the full exchange so the model can reference earlier turns.
  • Enter /exit or press Enter on an empty line to quit.
Why this structure?
  • The generate example demonstrates single-shot usage.
  • The chat loop shows how to preserve conversational context by appending {'role': 'user'|'assistant', 'content': ...} entries to a messages list and sending that full history each time.

File summary

FilenamePurpose
main.pySingle-shot generation example using ollama.generate()
chat_main.pyMulti-turn chat example using ollama.chat() and a messages list
Make sure your Ollama server is running locally before executing these scripts. The Python client communicates with the background Ollama process via HTTP and will fail if the server is not available. Also replace gemma3:latest with the model name you have installed.
Final reminders
  • Keep your virtual environment activated while installing and running the scripts.
  • Monitor available models with ollama list and update the model= parameter accordingly.
  • Use the chat pattern to maintain conversational state when building bots, assistants, or multi-turn tools.

Watch Video