This guide demonstrates how to structure a small web application that uses the OpenAI Python client to talk to a local Ollama backend (or switch later to OpenAI’s hosted API with minimal changes). We’ll build a simple Flask-based AI Poem Generator that sends user prompts to a chat completion endpoint and renders the model’s output. Using a client library (instead of curl) helps keep your app code clean and portable across local development and hosted providers.

What you’ll need
- Python 3.8+ installed
- Ollama running locally for local development (optional if you target OpenAI hosted APIs)
- Familiarity with virtual environments and Flask
- OpenAI Python client docs
- Ollama docs (for running a local model server)
- Flask documentation
Project setup
- Create a project folder and change into it.
- Create and activate a virtual environment.
- Install the dependencies.
- Open your editor and create `server.py` at the project root.

Representative commands for these steps follow.
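A minimal sketch, assuming a Unix-like shell; the folder name `ai-poem-generator` is arbitrary, and `python-dotenv` is one common choice for loading the `.env` file described below:

```bash
# Create the project folder
mkdir ai-poem-generator && cd ai-poem-generator

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Flask, the OpenAI client, and a .env loader
pip install flask openai python-dotenv

# Create the application file
touch server.py
```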

server.py — full implementation
This example exposes a single route (`/`): GET renders a small form; POST sends the prompt to the chat completions endpoint using the OpenAI Python client and displays the returned poem.
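Below is a minimal sketch of `server.py`, assuming the dependencies from the setup step; the inline HTML template and the system prompt wording are illustrative, not prescriptive:

```python
import os

from dotenv import load_dotenv
from flask import Flask, render_template_string, request
from openai import OpenAI

load_dotenv()  # read OPENAI_API_KEY, LLM_ENDPOINT, and MODEL from .env

# The OpenAI client works against any OpenAI-compatible endpoint,
# including a local Ollama server, via base_url.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ["LLM_ENDPOINT"],
)
MODEL = os.environ["MODEL"]

app = Flask(__name__)

# Illustrative inline template: a prompt form plus the rendered poem.
PAGE = """
<!doctype html>
<title>AI Poem Generator</title>
<h1>AI Poem Generator</h1>
<form method="post">
  <input name="prompt" placeholder="A poem about..." size="40" required>
  <button type="submit">Generate</button>
</form>
{% if poem %}<pre>{{ poem }}</pre>{% endif %}
"""


@app.route("/", methods=["GET", "POST"])
def index():
    poem = None
    if request.method == "POST":
        # System message defines the assistant's role; the user's
        # prompt is passed through from the form.
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": "You are a poet. Reply with a short poem."},
                {"role": "user", "content": request.form["prompt"]},
            ],
        )
        poem = response.choices[0].message.content
    return render_template_string(PAGE, poem=poem)


if __name__ == "__main__":
    app.run(debug=True)  # dev server only; see production notes below
```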

Configuration — environment variables
Store runtime configuration in a `.env` file at the project root. This keeps credentials and endpoints out of your code and makes switching between local Ollama and OpenAI hosted APIs straightforward.
| Variable | Purpose | Example |
|---|---|---|
| OPENAI_API_KEY | API key used by the OpenAI client. For local Ollama testing this can be any value. Replace with a real key when using OpenAI hosted APIs. | kodekloud |
| LLM_ENDPOINT | Base URL for the LLM endpoint. For Ollama local server use http://localhost:11434/v1 | http://localhost:11434/v1 |
| MODEL | Model identifier to call (configurable without code changes). | llama3.2 |
.env contents:
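Using the example values from the table above:

```
OPENAI_API_KEY=kodekloud
LLM_ENDPOINT=http://localhost:11434/v1
MODEL=llama3.2
```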
How the app works (high-level)
- The HTML form posts the user’s prompt to `/`.
- The Flask route builds a `messages` array: a system message to define the assistant’s role, plus the user’s message.
- The OpenAI Python client sends that to the chat completions endpoint (`client.chat.completions.create(...)`) and returns a response object.
- We extract the model output at `response.choices[0].message.content` and render it inside the page.
Start Ollama’s REST API (local)
Ensure the Ollama local service is running so the OpenAI client can reach it; by default the server listens on `localhost:11434`.
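One way to start it, assuming Ollama is installed and the model named in `.env` is available:

```bash
# Start the Ollama server (listens on localhost:11434 by default)
ollama serve

# In another terminal, pull the model referenced by MODEL in .env
ollama pull llama3.2
```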
Run the Flask app
With your virtualenv active and `.env` in place:
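Since the sketch above calls `app.run(...)` under `if __name__ == "__main__"`, the dev server can be started directly:

```bash
# Start Flask's development server (http://127.0.0.1:5000 by default)
python server.py
```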

Logs you will see
- Flask logs GET and POST requests to `/`.
- Ollama logs incoming requests to `/v1/chat/completions`.
Next steps and production considerations
This demo illustrates a minimal integration pattern. To prepare for production:
- Use a production WSGI server (gunicorn/uvicorn) instead of Flask’s dev server.
- Store secrets securely (e.g., a secret manager or environment variable service).
- Add robust error handling, rate limiting, and input validation/sanitization.
- Cache or paginate long-running requests and handle model timeouts gracefully.
- When switching to OpenAI hosted APIs, change `LLM_ENDPOINT` and set a valid `OPENAI_API_KEY` (see the example below).
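For example, a hosted configuration might look like the following; the model name is illustrative, so substitute any chat model available to your account:

```
OPENAI_API_KEY=sk-...your-key...
LLM_ENDPOINT=https://api.openai.com/v1
MODEL=gpt-4o-mini
```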