Demo Project Structure and Development Environment

This guide shows how to set up a minimal Python project that communicates with a locally running Ollama server. You’ll create a virtual environment, install the official Ollama Python client, and run two example scripts:

A single-request text generation (one-shot prompt)
A context-aware chat loop that preserves conversation history

Environment note: this walkthrough was performed in Visual Studio Code on Windows using WSL. Your Ollama instance should be running locally and have at least one model available:

jeremy@LEGION:/mnt/c/Users/jerem/Projects/ollama-test$ ollama list
NAME           ID             SIZE     MODIFIED
gemma3:latest  a2af66c2b7f    3.3 GB   6 hours ago

If you need general documentation, see the Ollama docs and Python venv docs:

1) Create and activate a virtual environment

From your project directory create a virtual environment with Python 3:

python3 -m venv venv

Activate the virtual environment:

WSL / macOS / Linux:

source venv/bin/activate

Windows (PowerShell):

.\venv\Scripts\Activate.ps1

Quick reference commands:

Purpose	Command
Create venv	`python3 -m venv venv`
Activate (WSL/macOS/Linux)	`source venv/bin/activate`
Activate (Windows PowerShell)	`.\venv\Scripts\Activate.ps1`
Deactivate	`deactivate`

2) Install the Ollama Python client

With the venv active, install the client:

pip install ollama

The Ollama Python package is a lightweight HTTP client that sends requests to your locally running Ollama background process. The package does not run models locally; it forwards requests to the Ollama server.

3) Simple generation example (single prompt)

Create a file named main.py and add:

import ollama

# Single generation request
result = ollama.generate(
    model='gemma3:latest',
    prompt='Tell me a funny joke about Python'
)

# The client returns a dictionary. Print the textual response.
print(result['response'])

Run it:

(venv) jeremy@LEGION:/mnt/c/Users/jerem/Projects/ollama-test$ python main.py
Okay, here's a joke about Python:
Why did the Python cross the road?
... To get to the other side... and then try to `import` it!

Why use this pattern? The generate call is ideal for one-shot prompts — you send a single prompt and receive a single response, useful for tasks like summarization, code generation, and short Q&A.

4) Context-aware chat loop

For multi-turn conversations you should accumulate the message history and use the chat API. Save this as chat_main.py (or merge into main.py).

import ollama

messages = []

while True:
    user_input = input("You: ")
    # Exit on '/exit' or an empty line
    if user_input.strip().lower() in ('/exit', ''):
        print("Exiting chat.")
        break

    # Append the user's message
    messages.append({'role': 'user', 'content': user_input})

    # Send the conversation history to the Ollama chat endpoint
    response = ollama.chat(model='gemma3:latest', messages=messages)

    # Extract assistant content and display it
    assistant_content = response['message']['content']
    print("Bot:", assistant_content)

    # Keep the assistant message in the history for context
    messages.append({'role': 'assistant', 'content': assistant_content})

Example usage:

Type a prompt when asked (for example: “Tell me a funny joke about Python”).
Continue the conversation; the messages list preserves the full exchange so the model can reference earlier turns.
Enter /exit or press Enter on an empty line to quit.

Why this structure?

The generate example demonstrates single-shot usage.
The chat loop shows how to preserve conversational context by appending {'role': 'user'|'assistant', 'content': ...} entries to a messages list and sending that full history each time.

File summary

Filename	Purpose
`main.py`	Single-shot generation example using `ollama.generate()`
`chat_main.py`	Multi-turn chat example using `ollama.chat()` and a `messages` list

Make sure your Ollama server is running locally before executing these scripts. The Python client communicates with the background Ollama process via HTTP and will fail if the server is not available. Also replace gemma3:latest with the model name you have installed.

Final reminders

Keep your virtual environment activated while installing and running the scripts.
Monitor available models with ollama list and update the model= parameter accordingly.
Use the chat pattern to maintain conversational state when building bots, assistants, or multi-turn tools.

Introduction

Document Processing and Chunking

Keyword Search & Retrieval

Semantic Search & Embeddings

Vector Databases

Building the RAG Pipeline

Demo Project Structure and Development Environment

1) Create and activate a virtual environment

2) Install the Ollama Python client

3) Simple generation example (single prompt)

4) Context-aware chat loop

File summary

Watch Video

​1) Create and activate a virtual environment

​2) Install the Ollama Python client

​3) Simple generation example (single prompt)

​4) Context-aware chat loop

​File summary

Watch Video

1) Create and activate a virtual environment

2) Install the Ollama Python client

3) Simple generation example (single prompt)

4) Context-aware chat loop

File summary