This guide demonstrates how to structure a small web application that uses the OpenAI Python client to talk to a local Ollama backend (or switch later to OpenAI’s hosted API with minimal changes). We’ll build a simple Flask-based AI Poem Generator that sends user prompts to a chat completion endpoint and renders the model’s output. Using a client library (instead of curl) helps keep your app code clean and portable across local development and hosted providers.
A slide titled "What to expect" showing bidirectional arrows between an Ollama icon and the OpenAI logo, paired with a Python logo. Two checked items on the right read "The way of Programming" and "Structuring your code."
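The "minimal changes" claim comes down to two settings: the base URL and the API key. A pure-Python sketch of that idea (the `backend_config` helper is illustrative, not part of any library):

```python
import os

def backend_config(env):
    """Return (base_url, api_key) for constructing the OpenAI client.
    With no LLM_ENDPOINT set, fall back to OpenAI's hosted API URL.
    (Illustrative helper, not part of any library.)"""
    return (
        env.get("LLM_ENDPOINT", "https://api.openai.com/v1"),
        env.get("OPENAI_API_KEY", ""),
    )

# Local Ollama development:
local = backend_config({"LLM_ENDPOINT": "http://localhost:11434/v1",
                        "OPENAI_API_KEY": "kodekloud"})
# Hosted OpenAI: only the two environment variables change, not the code.
hosted = backend_config({"OPENAI_API_KEY": "sk-your-real-key"})
print(local)
print(hosted)
```

The app below applies exactly this pattern by reading both values from the environment.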

What you’ll need

  • Python 3.8+ installed
  • Ollama running locally for local development (optional if you target OpenAI hosted APIs)
  • Familiarity with virtual environments and Flask
Project setup

  1. Create a project folder, for example:
mkdir -p ~/code/ollama-app
cd ~/code/ollama-app
  2. Create and activate a virtual environment:
python -m venv ollama-app
source ollama-app/bin/activate
  3. Install dependencies:
pip install openai flask python-dotenv
  4. Open your editor and create server.py at the project root.
A screenshot of Visual Studio Code's welcome page showing the "Visual Studio Code — Editing evolved" header with Start options (New File, Open, Clone Git Repository) and a Recent files list. The left sidebar shows an Explorer with a folder named "OLLAMA-APP" and the right side has a "Get Started with VS Code" walkthrough card.

server.py — full implementation

This example exposes a single route ("/"): GET renders a small form; POST sends the prompt to the chat completions endpoint using the OpenAI Python client and displays the returned poem.
# server.py
import os
from flask import Flask, request, render_template_string
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env

app = Flask(__name__)

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url=os.environ.get("LLM_ENDPOINT")
)

# Simple HTML template for the app UI
HTML_TEMPLATE = """
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <title>AI Poem Generator</title>
    <style>
        body { font-family: Arial, sans-serif; padding: 2rem; background: #f9f9f9; }
        .container { max-width: 800px; margin: 0 auto; background: #fff; padding: 1.5rem; border-radius: 8px; box-shadow: 0 2px 8px rgba(0,0,0,0.05); }
        textarea { width: 100%; height: 120px; padding: 0.5rem; margin-bottom: 0.75rem; }
        button { padding: 0.5rem 1rem; }
        pre { background-color: #f4f4f4; padding: 10px; border-radius: 5px; overflow-x: auto; }
    </style>
</head>
<body>
    <div class="container">
        <h1>AI Poem Generator</h1>
        <form method="POST">
            <textarea name="input" placeholder="Enter your prompt here..."></textarea>
            <br />
            <button type="submit">Generate Poem</button>
        </form>
        {% if poem %}
        <h2>Your AI-Generated Poem</h2>
        <pre>{{ poem }}</pre>
        {% endif %}
    </div>
</body>
</html>
"""

@app.route("/", methods=["GET", "POST"])
def index():
    poem = None
    if request.method == "POST":
        try:
            input_message = request.form["input"]

            response = client.chat.completions.create(
                model=os.environ.get("MODEL"),
                messages=[
                    {"role": "system", "content": "You are an AI assistant specialized in writing poems."},
                    {"role": "user", "content": input_message}
                ],
            )

            # Extract the generated text from the response
            # Structure: response.choices[0].message.content
            poem = response.choices[0].message.content
        except Exception as e:
            # log the exception, and show a friendly error to the user
            print("Error:", str(e))
            poem = "An error occurred when trying to fetch your poem."

    return render_template_string(HTML_TEMPLATE, poem=poem)

if __name__ == "__main__":
    port = int(os.getenv("PORT", 3000))
    # Development server — do not use this directly in production
    app.run(host="0.0.0.0", port=port)

Configuration — environment variables

Store runtime configuration in a .env file at the project root. This keeps credentials and endpoints out of your code and makes switching between local Ollama and OpenAI hosted APIs straightforward.
| Variable | Purpose | Example |
| --- | --- | --- |
| OPENAI_API_KEY | API key used by the OpenAI client. For local Ollama testing this can be any value; replace it with a real key when using OpenAI hosted APIs. | kodekloud |
| LLM_ENDPOINT | Base URL for the LLM endpoint. For a local Ollama server, use http://localhost:11434/v1. | http://localhost:11434/v1 |
| MODEL | Model identifier to call (configurable without code changes). | llama3.2 |
Example .env contents:
OPENAI_API_KEY=kodekloud
LLM_ENDPOINT=http://localhost:11434/v1
MODEL=llama3.2
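Conceptually, load_dotenv() just parses KEY=VALUE lines and puts them into the process environment. A minimal sketch of that parsing (the real python-dotenv also handles quoting, comments after values, `export` prefixes, and more):

```python
def parse_env(text):
    """Minimal sketch of how a .env file is parsed:
    split KEY=VALUE lines, skipping blanks and comment lines.
    (python-dotenv itself is more thorough; this is illustrative.)"""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

env = parse_env("""\
OPENAI_API_KEY=kodekloud
LLM_ENDPOINT=http://localhost:11434/v1
MODEL=llama3.2
""")
print(env["MODEL"])  # llama3.2
```

In the app, these values land in os.environ, which is why server.py reads them with os.environ.get(...).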

How the app works (high-level)

  • The HTML form posts the user's prompt to "/".
  • The Flask route builds a messages array: a system message that defines the assistant's role, plus the user's message.
  • The OpenAI Python client sends that to the chat completions endpoint (client.chat.completions.create(...)) and returns a response object.
  • We extract the model output at response.choices[0].message.content and render it inside the page.
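Under the hood, the chat completions response is JSON. Navigating a representative (abridged) body with plain dicts shows why the client-side path is response.choices[0].message.content (the poem text here is a made-up placeholder):

```python
import json

# Representative, abridged chat-completions response body.
# A real response also carries fields like id, model, and usage.
raw = json.loads("""
{
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "Soft paws at dawn..."}}
  ]
}
""")

# Raw-dict navigation mirrors the client's attribute access:
# response.choices[0].message.content
poem = raw["choices"][0]["message"]["content"]
print(poem)
```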

Start Ollama’s REST API (local)

Ensure the Ollama local service is running so the OpenAI client can reach it. Typical local start command:
ollama serve
By default Ollama exposes endpoints such as POST /v1/chat/completions and listens on localhost:11434. Representative Ollama server log when started:
time=2025-01-24T13:05:44.483+05:30 level=INFO msg="Listening on 127.0.0.1:11434 (version 0.5.7)"
time=2025-01-24T13:05:44.530+05:30 level=INFO msg="inference compute" id=0 library=metal total="10.7 GiB" available="10.7 GiB"
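Before involving Flask, you can sanity-check the endpoint by POSTing to /v1/chat/completions directly. A stdlib sketch (the actual send is commented out, since it only works against a live Ollama server):

```python
import json
import urllib.request

payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Write a one-line poem about cats"}],
}
req = urllib.request.Request(
    "http://localhost:11434/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With Ollama running locally, uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
print(req.get_method())  # POST (urllib infers POST when data is set)
```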

Run the Flask app

With your virtualenv active and .env in place:
python server.py
You should see the Flask development server start:
* Serving Flask app 'server'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:3000
Press CTRL+C to quit
Open http://127.0.0.1:3000, enter a prompt such as “Write a short poem about cats” and click “Generate Poem.” The app will POST the prompt to the model and display the returned poem.
A browser screenshot of a web app titled "AI Poem Generator" with a prompt input box, a "Generate Poem" button, and a panel labeled "Your AI-Generated Poem." The displayed poem is a block of text about felines.

Logs you will see

  • Flask logs GET and POST requests to /.
  • Ollama logs incoming requests to /v1/chat/completions.
Example Flask access log:
127.0.0.1 - - [24/Jan/2025 13:07:03] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [24/Jan/2025 13:07:43] "POST / HTTP/1.1" 200 -

Next steps and production considerations

  This demo illustrates a minimal integration pattern. To prepare for production:
    • Use a production WSGI server (gunicorn/uvicorn) instead of Flask’s dev server.
    • Store secrets securely (e.g., a secret manager or environment variable service).
    • Add robust error handling, rate limiting, and input validation/sanitization.
    • Cache or paginate long-running requests and handle model timeouts gracefully.
    • When switching to OpenAI hosted APIs, change the LLM_ENDPOINT and set a valid OPENAI_API_KEY.
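As one concrete example of the input-validation point above, a minimal guard you could apply before calling the model (a sketch; the length cap is arbitrary):

```python
MAX_PROMPT_LEN = 2000  # arbitrary cap chosen for this demo

def validate_prompt(text):
    """Reject missing, empty, or oversized prompts before calling the model."""
    if text is None:
        return False
    text = text.strip()
    return 0 < len(text) <= MAX_PROMPT_LEN

print(validate_prompt("Write a short poem about cats"))  # True
print(validate_prompt(""))                               # False
```

In server.py, this check would run on request.form["input"] before client.chat.completions.create(...) is called.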
