This guide demonstrates how to structure a small web application that uses the OpenAI Python client to talk to a local Ollama backend (or switch later to OpenAI’s hosted API with minimal changes). We’ll build a simple Flask-based AI Poem Generator that sends user prompts to a chat completion endpoint and renders the model’s output. Using a client library (instead of curl) helps keep your app code clean and portable across local development and hosted providers.

What you’ll need
- Python 3.8+ installed
- Ollama running locally for local development (optional if you target OpenAI hosted APIs)
- Familiarity with virtual environments and Flask
- OpenAI Python client docs
- Ollama docs (for running a local model server)
- Flask documentation
Project setup
- Create a project folder and change into it.
- Create and activate a virtual environment.
- Install the dependencies.
- Open your editor and create `server.py` at the project root.

Representative commands for these steps follow.
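A minimal sketch, assuming a Unix-like shell; the folder name `ai-poem-generator` is arbitrary, and `python-dotenv` is one common choice for loading the `.env` file described below:

```bash
# Create the project folder
mkdir ai-poem-generator && cd ai-poem-generator

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Flask, the OpenAI client, and a .env loader
pip install flask openai python-dotenv

# Create the application file
touch server.py
```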

server.py — full implementation
This example exposes a single route (`/`): GET renders a small form; POST sends the prompt to the chat completions endpoint using the OpenAI Python client and displays the returned poem.
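Below is a minimal sketch of `server.py`, assuming the dependencies from the setup step; the inline HTML template and the system prompt wording are illustrative, not prescriptive:

```python
import os

from dotenv import load_dotenv
from flask import Flask, render_template_string, request
from openai import OpenAI

load_dotenv()  # read OPENAI_API_KEY, LLM_ENDPOINT, and MODEL from .env

# The OpenAI client works against any OpenAI-compatible endpoint,
# including a local Ollama server, via base_url.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ["LLM_ENDPOINT"],
)
MODEL = os.environ["MODEL"]

app = Flask(__name__)

# Illustrative inline template: a prompt form plus the rendered poem.
PAGE = """
<!doctype html>
<title>AI Poem Generator</title>
<h1>AI Poem Generator</h1>
<form method="post">
  <input name="prompt" placeholder="A poem about..." size="40" required>
  <button type="submit">Generate</button>
</form>
{% if poem %}<pre>{{ poem }}</pre>{% endif %}
"""


@app.route("/", methods=["GET", "POST"])
def index():
    poem = None
    if request.method == "POST":
        # System message defines the assistant's role; the user's
        # prompt is passed through from the form.
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": "You are a poet. Reply with a short poem."},
                {"role": "user", "content": request.form["prompt"]},
            ],
        )
        poem = response.choices[0].message.content
    return render_template_string(PAGE, poem=poem)


if __name__ == "__main__":
    app.run(debug=True)  # dev server only; see production notes below
```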

Configuration — environment variables
Store runtime configuration in a `.env` file at the project root. This keeps credentials and endpoints out of your code and makes switching between local Ollama and OpenAI hosted APIs straightforward.
| Variable | Purpose | Example |
|---|---|---|
| OPENAI_API_KEY | API key used by the OpenAI client. For local Ollama testing this can be any value. Replace with a real key when using OpenAI hosted APIs. | kodekloud |
| LLM_ENDPOINT | Base URL for the LLM endpoint. For Ollama local server use http://localhost:11434/v1 | http://localhost:11434/v1 |
| MODEL | Model identifier to call (configurable without code changes). | llama3.2 |
.env contents:
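Using the example values from the table above:

```
OPENAI_API_KEY=kodekloud
LLM_ENDPOINT=http://localhost:11434/v1
MODEL=llama3.2
```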
How the app works (high-level)
- The HTML form posts the user’s prompt to `/`.
- The Flask route builds a `messages` array: a system message to define the assistant’s role, plus the user’s message.
- The OpenAI Python client sends that to the chat completions endpoint (`client.chat.completions.create(...)`) and returns a response object.
- We extract the model output at `response.choices[0].message.content` and render it inside the page.
Start Ollama’s REST API (local)
Ensure the Ollama local service is running so the OpenAI client can reach it; by default the server listens on `localhost:11434`.
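One way to start it, assuming Ollama is installed and the model named in `.env` is available:

```bash
# Start the Ollama server (listens on localhost:11434 by default)
ollama serve

# In another terminal, pull the model referenced by MODEL in .env
ollama pull llama3.2
```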
Run the Flask app
With your virtualenv active and `.env` in place:
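Since the sketch above calls `app.run(...)` under `if __name__ == "__main__"`, the dev server can be started directly:

```bash
# Start Flask's development server (http://127.0.0.1:5000 by default)
python server.py
```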

Logs you will see
- Flask logs GET and POST requests to `/`.
- Ollama logs incoming requests to `/v1/chat/completions`.
Next steps and production considerations
This demo illustrates a minimal integration pattern. To prepare for production:
- Use a production WSGI server (gunicorn/uvicorn) instead of Flask’s dev server.
- Store secrets securely (e.g., a secret manager or environment variable service).
- Add robust error handling, rate limiting, and input validation/sanitization.
- Cache or paginate long-running requests and handle model timeouts gracefully.
- When switching to OpenAI hosted APIs, change `LLM_ENDPOINT` and set a valid `OPENAI_API_KEY` (see the example below).
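For example, a hosted configuration might look like the following; the model name is illustrative, so substitute any chat model available to your account:

```
OPENAI_API_KEY=sk-...your-key...
LLM_ENDPOINT=https://api.openai.com/v1
MODEL=gpt-4o-mini
```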