Mastering Generative AI with OpenAI

Understanding Tokens and API Parameters

Understanding OpenAI API Parameters

Controlling the behavior of your chat completions starts with setting the right parameters. In this guide, you’ll learn how to influence creativity, length, repetition, and more—both programmatically and via the OpenAI Playground.

Quickstart Example

import os
import openai

# Uses the legacy openai Python SDK (v0.x); in v1+ the equivalent call is
# client.chat.completions.create(...).
openai.api_key = os.getenv("OPENAI_API_KEY")

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",   "content": "Hello!"}
    ]
)

# Print just the assistant's reply text
print(completion.choices[0].message.content)

This snippet covers:

  1. Importing libraries
  2. Loading your API key
  3. Calling the chat completion endpoint
  4. Printing the assistant’s reply

While model and messages are the only required fields, the full API offers many more knobs.

Core Parameters

1. model and messages

  • model: Choose an engine (e.g. gpt-3.5-turbo, gpt-4).
  • messages: An ordered list of {role, content} objects. Roles include:

The image illustrates the three roles of chat completion models: System, Assistant, and User, each represented by a distinct icon.

| Role | Purpose |
| --- | --- |
| system | Sets assistant behavior (e.g. "You are a tutor.") |
| user | Represents user questions or commands |
| assistant | Contains previous assistant responses |
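
For example, a multi-turn conversation carries its history forward using all three roles (a minimal sketch; the dialogue content is made up):

messages = [
    {"role": "system", "content": "You are a patient math tutor."},  # sets behavior
    {"role": "user", "content": "What is a prime number?"},          # earlier user turn
    {"role": "assistant", "content": "A prime number has exactly two divisors: 1 and itself."},  # prior reply
    {"role": "user", "content": "Is 51 prime?"}                      # follow-up that relies on context
]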

2. Randomness & Diversity: temperature vs. top_p

Both control how “creative” the output can be. Adjust one at a time for best results.

  • temperature (0–2, default 1):
    • Lower → more focused, deterministic (e.g. 0–0.2)
    • Higher → more creative, random (e.g. 0.8–2)
  • top_p (0–1, default 1):
    • Also called “nucleus sampling.”
    • The model samples only from the smallest set of most-likely tokens whose cumulative probability reaches top_p.

Note

If you need repeatable outputs, set temperature=0 and top_p=0; results will be near-deterministic, although the API does not guarantee bit-for-bit identical responses. For open-ended tasks, start with temperature=0.7 or top_p=0.9, adjusting one at a time.
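
To make the top_p cutoff concrete, here is a self-contained sketch of nucleus sampling over a toy distribution (pure Python, no API call; the probabilities are invented for illustration):

import random

# Toy next-token distribution (illustrative, not real model output)
probs = {"the": 0.45, "a": 0.25, "my": 0.15, "royal": 0.10, "zebra": 0.05}

def nucleus_sample(probs, top_p):
    """Sample from the smallest set of tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, p in ranked:
        nucleus.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # stop once the nucleus covers top_p of the probability mass
    tokens, weights = zip(*nucleus)
    return random.choices(tokens, weights=weights, k=1)[0]

print(nucleus_sample(probs, top_p=0.5))  # candidates: "the", "a"
print(nucleus_sample(probs, top_p=1.0))  # candidates: all five tokens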

3. n (number of completions)

Generate multiple replies in one API call. Default is n=1. Raising n can help you compare variations but increases token usage.
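
A minimal sketch (same legacy SDK as above) that requests three completions in one call and prints each candidate:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Give me a tagline for a coffee shop."}],
    n=3,              # three alternative completions in a single call
    temperature=0.9,  # enough randomness that the variants actually differ
)

for i, choice in enumerate(completion.choices, start=1):
    print(f"Option {i}: {choice.message.content}")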

4. max_tokens

Limits the length of the completion. Remember:

  • prompt_tokens + max_tokens ≤ model context window (e.g. 4096 for gpt-3.5-turbo).

Warning

If you exceed the context window, the API will return an error. Always calculate prompt size before setting max_tokens.
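
One way to check prompt size before setting max_tokens is the tiktoken library. Below is a rough sketch; the per-message overhead of 4 tokens is an approximation (it varies slightly by model), so treat the estimate as a guide rather than an exact count:

import tiktoken

def estimate_prompt_tokens(messages, model="gpt-3.5-turbo"):
    """Roughly estimate how many tokens a chat prompt will consume."""
    enc = tiktoken.encoding_for_model(model)
    total = 0
    for message in messages:
        total += 4  # approximate per-message framing overhead
        total += len(enc.encode(message["content"]))
    return total + 2  # approximate priming for the assistant's reply

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
]

prompt_tokens = estimate_prompt_tokens(messages)
context_window = 4096  # gpt-3.5-turbo
print(f"Prompt ≈ {prompt_tokens} tokens; max_tokens must stay ≤ {context_window - prompt_tokens}")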

5. presence_penalty vs. frequency_penalty

Prevent the model from repeating itself (the two penalties are compared in the sketch below):

  • presence_penalty (–2.0 to 2.0, default 0): Encourages new topics by applying a flat penalty to any token that has already appeared at least once.
  • frequency_penalty (–2.0 to 2.0, default 0): Penalizes tokens in proportion to how often they have already appeared.

The image is a table outlining key parameters of a Completion API, including columns for parameter names, possible/default values, ranges, whether they are required, and descriptions.
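
A minimal sketch that runs the same repetition-prone prompt with and without a frequency penalty so the outputs can be compared side by side (the prompt and penalty values are illustrative):

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

messages = [{"role": "user", "content": "List ten words that describe the ocean."}]

for penalty in (0.0, 1.5):
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        max_tokens=100,
        frequency_penalty=penalty,  # higher values discourage repeated tokens
    )
    print(f"frequency_penalty={penalty} →", completion.choices[0].message.content)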

Interactive Exploration with the Playground

Experiment with all of these settings in real time:

The image shows the OpenAI Playground interface, featuring sections for entering system and user messages, along with settings for model parameters like temperature and maximum length.

Head over to the OpenAI Playground to tweak parameters and see instant feedback.

Example: Deterministic vs. Creative Runs

Below is a side-by-side comparison showing how temperature and top_p affect output style:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

p_model = "gpt-3.5-turbo"
p_messages = [
    {"role": "system", "content": "You are an assistant to write poems"},
    {"role": "user",   "content": "Write a short poem on the Taj Mahal in one stanza."}
]
p_n = 1
p_max_tokens = 100
p_presence_penalty = 0.0
p_frequency_penalty = 0.0

# Deterministic output
completion = openai.ChatCompletion.create(
    model=p_model,
    messages=p_messages,
    n=p_n,
    max_tokens=p_max_tokens,
    temperature=0.0,
    top_p=0.0,
    presence_penalty=p_presence_penalty,
    frequency_penalty=p_frequency_penalty,
)
print("Deterministic →", completion.choices[0].message.content)

# Creative output
completion = openai.ChatCompletion.create(
    model=p_model,
    messages=p_messages,
    n=p_n,
    max_tokens=p_max_tokens,
    temperature=1.5,
    top_p=1.0,
    presence_penalty=p_presence_penalty,
    frequency_penalty=p_frequency_penalty,
)
print("Creative →", completion.choices[0].message.content)

By comparing these runs, you’ll see the poem’s structure and word choice change dramatically.

Summary Table of Key Parameters

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| model | string | — | Model to use (gpt-3.5-turbo, gpt-4, etc.) |
| messages | list | — | Conversation history as [{role, content}, …] |
| temperature | 0–2 | 1 | Controls randomness |
| top_p | 0–1 | 1 | Nucleus sampling threshold |
| n | int | 1 | Number of completions |
| max_tokens | int | (model) | Max tokens in the completion |
| presence_penalty | –2.0 to 2.0 | 0 | Penalize tokens based on prior presence |
| frequency_penalty | –2.0 to 2.0 | 0 | Penalize tokens based on prior frequency |

Next up: using these parameters to automate full blog-post generation—stay tuned!
