What are Tokens

In this lesson we’ll explain tokens, the atomic units that large language models (LLMs) use to represent text, and show how tokenization, context limits, and token costs affect your applications. Language models do not operate on characters, words, or sentences directly. Instead, text is split into tokens, and each token is mapped to an integer ID. When you send a prompt, it is first converted into tokens. The model processes those tokens and generates output tokens, which are then decoded back into text to form the response (for example, the output you see from ChatGPT). Put simply, tokens are the fundamental units of text for models like GPT‑3 and ChatGPT.

A colorful flowchart titled "How Do LLMs Function?" showing a prompt turned into tokens, processed by an AI model, and then converted back into tokens to generate a response.

How tokenization works

Tokenization is the process of splitting text into tokens that the model can process. A token might represent:

A whole word (typical for common English words, less so in many other languages)
A single character (more common in some languages and tokenizers)
A subword piece produced by byte-pair encoding (common for OpenAI models)

Most modern OpenAI tokenizers use a subword (BPE-like) scheme, so many tokens are partial words. Tokenization depends on the encoding used by the specific model.
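To make the subword idea concrete, here is a minimal sketch of how a BPE-style encoder turns a word into pieces. It is a toy: the merge table `TOY_MERGES` and the function `bpe_encode` are invented for illustration and are not the rules or API of any real OpenAI encoding. The encoder starts from single characters and repeatedly merges the adjacent pair that has the highest priority in the table.

```python
from typing import List, Tuple


def bpe_encode(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Split `word` into subword pieces using ranked merge rules (toy BPE)."""
    # Earlier entries in `merges` have higher priority (lower rank).
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    pieces = list(word)  # start from individual characters
    while len(pieces) > 1:
        # Collect every adjacent pair that has a merge rule, with its position.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(pieces, pieces[1:]))
            if (a, b) in ranks
        ]
        if not candidates:
            break  # no applicable merge rule is left
        _, i = min(candidates)  # best-ranked pair, leftmost on ties
        pieces[i:i + 2] = [pieces[i] + pieces[i + 1]]
    return pieces


# Hypothetical merge rules, listed from highest to lowest priority.
TOY_MERGES = [("t", "o"), ("k", "e"), ("to", "ke"), ("toke", "n"), ("i", "z"), ("iz", "e")]

print(bpe_encode("tokenize", TOY_MERGES))  # ['token', 'ize']
print(bpe_encode("tokens", TOY_MERGES))    # ['token', 's']
```

Real tokenizers learn tens of thousands of merge rules from large text corpora and operate on bytes rather than characters, but the merging loop follows the same idea.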

Token size: characters, words, and tokens (rough guidelines)

There is no fixed 1:1 mapping between tokens and words, but a useful approximation for English is:

Measure	Approximate mapping
1 token	~4 characters
1 token	~0.75 words
100 tokens	~75 words
~2,000 tokens	~1,500 words (roughly)

So 100 tokens is approximately 75 words, and a 1,500‑word document often maps to ~2,000 tokens. Exact counts vary by text and encoding.
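The rules of thumb above can be turned into a quick estimator. The sketch below is a heuristic for English text only; the function name `estimate_tokens` and the decision to average the character-based and word-based estimates are assumptions made for illustration. For exact counts, use a real tokenizer.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb: 1 token is about 4 characters
WORDS_PER_TOKEN = 0.75   # rule of thumb: 1 token is about 0.75 words


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of English text (heuristic only)."""
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    # Averaging the two estimates is an arbitrary choice for this sketch.
    return math.ceil((by_chars + by_words) / 2)


print(estimate_tokens("Tokens are the units that language models read and write."))
```

An estimate like this is useful for early budgeting, but it can be off by a wide margin for code, non-English text, or text with unusual punctuation.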

A slide titled "Tokenization" explaining how OpenAI models (e.g., GPT-3) break input into tokens, with examples: 1 token ≈ 4 characters, 100 tokens ≈ 75 words, and ~1,500 words ≈ 2,048 tokens. It also notes that the API breaks prompts into tokens before processing.

Tools to inspect tokenization

Two practical ways to explore how text maps to tokens:

The interactive OpenAI Tokenizer: https://platform.openai.com/tokenizer — paste text and see tokens/token IDs and counts.
The open-source library tiktoken: https://github.com/openai/tiktoken — programmatic encoding/decoding in Python.

A slide titled "Tools to Explore Tokens" showing two screenshots: OpenAI's online Tokenizer tool on the left (https://platform.openai.com/tokenizer) and the tiktoken GitHub repository on the right (https://github.com/openai/tiktoken). The images highlight tokenized text, token counts, and repository files.

Always use the tokenizer/encoding associated with the model you are calling (for example, use the encoding for “gpt-3.5-turbo” when calling that model). Token counts and token IDs differ across encodings.

Encoding and decoding with tiktoken (example)

Encoding converts text to token IDs; decoding converts token IDs back to text. tiktoken implements an efficient BPE-style tokenizer used by OpenAI models and lets you inspect tokens programmatically. Install tiktoken:

pip install tiktoken

Example Python usage:

import tiktoken

# Use the tokenizer corresponding to the model you will call
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

msg = "If it is 9AM in London, what time is it in Hyderabad. Be concise."

# Encode the message into token IDs
tokens = encoding.encode(msg)
print(tokens)

# Number of tokens
print(len(tokens))

# Decode back to text
print(encoding.decode(tokens))

Interactive example (output may vary with encoding version):

In [1]: import tiktoken

In [2]: encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [3]: msg = "If it is 9AM in London, what time is it in Hyderabad. Be concise."

In [4]: tokens = encoding.encode(msg)
In [5]: tokens
Out[5]: [2746, 433, 374, 220, 24, 1428, 304, 7295, 11, 1148, 892, 374, 433, 304, 69547, 13, 2893, 64694, 13]

In [6]: len(tokens)
Out[6]: 19

In [7]: print(encoding.decode(tokens))
If it is 9AM in London, what time is it in Hyderabad. Be concise.

This shows the round trip: text → token IDs → text, and the token count.

Context windows and token limits

Each model has a maximum context length (context window), measured in tokens. The context window includes both input tokens (your prompt) and output tokens (the model’s completion). When planning requests, add prompt tokens + expected output tokens and ensure they stay within the model’s context limit.

Topic	Guidance
Context length	Varies by model (e.g., historically ~4,097 tokens for some models).
Planning	If your prompt consumes N tokens and the context limit is L, then available completion tokens ≈ L − N.
Strategy	For large inputs, split text across multiple calls (chaining) or summarize to fit the window.

Example: if the model’s limit is 4,097 tokens and your prompt uses 3,900 tokens, you have ~197 tokens left for the model’s output. If your prompt uses the full limit, there is no room for completion.
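The arithmetic in this example can be wrapped in a small helper. This is a minimal sketch; the function name `available_completion_tokens` is hypothetical, and the 4,097 limit is the historical figure used in this lesson, not a property of every model.

```python
def available_completion_tokens(context_limit: int, prompt_tokens: int) -> int:
    """Return how many tokens remain for the completion (never negative)."""
    if context_limit <= 0 or prompt_tokens < 0:
        raise ValueError("context_limit must be positive and prompt_tokens non-negative")
    return max(context_limit - prompt_tokens, 0)


print(available_completion_tokens(4097, 3900))  # 197
print(available_completion_tokens(4097, 4097))  # 0
```

A result of 0 means the prompt already fills (or exceeds) the window, so the prompt has to be shortened before the model can produce any output.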

A slide titled "Token Limits" that states OpenAI models have a maximum token size of 4,097. It notes that a 4,000-token prompt leaves up to 97 tokens for the completion and suggests breaking prompts into parts to work around the limit.

Be mindful of the combined token usage (input + output). Exceeding a model’s context window will cause errors or truncated responses. If you need more context than the model allows, consider summarization, retrieval-augmented generation (RAG), or splitting inputs across multiple calls.
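One way to split a large input across multiple calls is to pack whole words into chunks that each stay under a token budget. The sketch below is an illustration under stated assumptions: `split_into_chunks` and `rough_token_count` are hypothetical helpers, and the default counter uses the 4-characters-per-token rule of thumb rather than a real tokenizer.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Heuristic: about 4 characters per token for English text."""
    return -(-len(text) // 4)  # ceiling division


def split_into_chunks(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Greedily pack whole words into chunks of at most `max_tokens` each.

    A single word that alone exceeds `max_tokens` becomes its own
    oversized chunk; this sketch does not split inside words.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this word would exceed the budget: close the chunk.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_into_chunks("alpha beta gamma delta epsilon zeta", max_tokens=3))
# ['alpha beta', 'gamma delta', 'epsilon zeta']
```

For accurate budgets, pass a counter backed by the model's own tokenizer as `count_tokens`, for example a function that returns the length of the token list produced by tiktoken's `encode`.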

Tokens and cost

OpenAI pricing is typically based on tokens processed (input + output). Keep a token budget for your application to:

Ensure the model has sufficient context to produce accurate outputs.
Control costs by reducing unnecessary tokens in prompts and responses.
Choose an appropriate model for your use case (higher-capacity models often cost more per token).

Example (historical, subject to change): GPT‑3.5 Turbo was priced near $0.002 per 1,000 tokens. Always check the official pricing page to estimate costs for your usage.
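A token budget translates directly into a cost estimate. The sketch below assumes a single flat rate for input and output tokens and uses the historical $0.002 per 1,000 tokens figure as a default; both are simplifying assumptions, and the function name `estimate_cost_usd` is hypothetical. Actual pricing may differ by model and by token type.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    price_per_1k_tokens: float = 0.002,  # historical example rate, subject to change
) -> float:
    """Estimate the cost of one request at a flat per-token rate."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * price_per_1k_tokens


# A 3,900-token prompt with a 197-token completion (4,097 tokens in total).
print(f"${estimate_cost_usd(3900, 197):.6f}")  # $0.008194
```

Multiplying the per-request estimate by the expected number of requests gives a rough monthly figure to compare against a usage limit.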

A presentation slide titled "Impact of Token Size on Cost" showing an example that GPT‑3.5 Turbo costs $0.002 per 1,000 tokens. It also notes you can optimize cost by selecting the right model, reducing token size, and setting usage limits.

Quick practical example: Try the tokenizer

Try the OpenAI Tokenizer: https://platform.openai.com/tokenizer

Paste this prompt into the tokenizer:

If it is 9AM in London, what time is it in Hyderabad. Be concise.

Character count: 65 characters
Token count (tool example): 19 tokens
Word count: 14 words

This illustrates that tokens and words do not map 1:1.
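The counts above can be checked against the rules of thumb with a few lines of Python. The token count of 19 is taken from the tokenizer result quoted in this lesson and is hard-coded here; only the character and word counts are computed.

```python
prompt = "If it is 9AM in London, what time is it in Hyderabad. Be concise."
token_count = 19  # from the tokenizer output quoted in this lesson

char_count = len(prompt)
word_count = len(prompt.split())

print(char_count, word_count)              # 65 14
print(round(char_count / token_count, 2))  # 3.42 characters per token
print(round(word_count / token_count, 2))  # 0.74 words per token
```

For this prompt the measured ratios land close to the approximate figures of 4 characters and 0.75 words per token.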

Summary

Tokens are the basic units LLMs use to represent text.
Tokenization splits text into tokens; encodings differ by model, so use the correct tokenizer.
Context windows are measured in tokens and include both input and output—plan accordingly.
Tokens determine cost—monitor and budget token usage.
Use the OpenAI Tokenizer (web) or tiktoken (programmatic) to inspect tokens for your prompts.

References and further reading:

OpenAI Tokenizer: https://platform.openai.com/tokenizer
tiktoken GitHub: https://github.com/openai/tiktoken
