Learn how to analyze CSV datasets with pandas and the OpenAI API to extract structured, point-form insights.
Prerequisites
Installation
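Install pandas and the OpenAI Python client. The examples here use the `OpenAI` client class, which requires version 1.0 or later of the `openai` package:

```bash
pip install pandas "openai>=1.0"
```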
If you already have these packages installed, pip will confirm that the requirements are satisfied.
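The `from openai import OpenAI` client and the `client.chat.completions.create` call used in this guide belong to the 1.x line of the `openai` package. Older 0.x releases exposed a different interface. The following is a minimal sketch for checking the installed major version using only the standard library. The helper name `installed_major_version` is hypothetical, and it assumes the version string starts with an integer (for example `1.30.1`):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_major_version(package: str) -> Optional[int]:
    """Return the major version of an installed package, or None if it is missing.

    Hypothetical helper; assumes a version string such as "1.30.1".
    """
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = installed_major_version("openai")
    if major is None:
        print("openai is not installed; run: pip install openai")
    elif major < 1:
        print("openai is older than 1.0; run: pip install --upgrade openai")
    else:
        print(f"openai {major}.x detected")
```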
Configuration
Import the necessary modules and initialize your OpenAI client.
Warning: Never commit your API key to version control.
```python
from openai import OpenAI
import pandas as pd

client = OpenAI(api_key="YOUR_API_KEY")
```
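Replace "YOUR_API_KEY" with your actual key. A safer option is to remove the `api_key` argument and set the `OPENAI_API_KEY` environment variable instead, because `OpenAI()` reads the key from that variable when `api_key` is omitted.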
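To keep the key out of source files, you can read it from an environment variable and fail early with a clear message if it is unset. This is a minimal sketch. The helper `load_api_key` is hypothetical, and the variable name defaults to `OPENAI_API_KEY`:

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment instead of hard-coding it.

    Hypothetical helper; raises a clear error when the variable is unset or empty.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running this script."
        )
    return key


# Usage (requires the openai package):
# from openai import OpenAI
# client = OpenAI(api_key=load_api_key())
```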
Loading Your CSV Dataset
Download a CSV (for example, from Kaggle) and load it into a pandas DataFrame:
```python
df = pd.read_csv("/path/to/your/user_behavior_dataset.csv")
```
Adjust the file path to match your local environment.
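Before you send the data anywhere, check that pandas parsed the file as expected. The sketch below reads a few made-up rows from an in-memory string so that it runs on its own. The rows and the `minutes_per_day` column are hypothetical, and the result is named `sample_df` so that it does not overwrite your real `df`:

```python
import io

import pandas as pd

# Made-up rows for illustration only; in practice, inspect the df loaded above.
sample_csv = io.StringIO(
    "age,gender,country,minutes_per_day\n"
    "34,Female,Canada,120\n"
    "27,Male,India,95\n"
    "45,Female,Germany,60\n"
)
sample_df = pd.read_csv(sample_csv)

# Quick sanity checks before anything is sent to the API.
print(sample_df.shape)                        # (rows, columns)
print(sample_df.columns.tolist())             # exact column names to reference later
print(sample_df.dtypes)                       # type pandas inferred for each column
print(sample_df.isna().sum())                 # missing values per column
print(len(sample_df.to_csv(index=False)))     # characters the CSV text would occupy
```

The last line gives a rough idea of how large the prompt will be. A full dataset can be far longer than a few rows.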
Defining the Analysis Function
This function converts the DataFrame to CSV text, sends it to the `gpt-4` model through the Chat Completions API, and returns the model's bullet-point insights as a string:
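```python
def analyze_data(df):
    # Serialize the whole DataFrame as CSV text (without the index column).
    csv_content = df.to_csv(index=False)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    "Provide key insights from the following dataset in bullet points:\n"
                    f"{csv_content}"
                ),
            }
        ],
        max_tokens=500,   # upper bound on the length of the reply
        temperature=0.2,  # low randomness for more consistent summaries
    )
    # The reply text is on the first (and here only) choice.
    return response.choices[0].message.content
```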
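Sending an entire CSV file works for small datasets, but a large file can exceed the model's context window and cause the request to fail. One option is to cap the payload before building the prompt. This is a minimal sketch. The helper `prepare_csv_payload` and its `max_rows` and `max_chars` defaults are hypothetical, illustrative values and not limits published by OpenAI:

```python
import pandas as pd


def prepare_csv_payload(
    df: pd.DataFrame, max_rows: int = 200, max_chars: int = 20_000
) -> str:
    """Convert a DataFrame to CSV text, capped to keep the prompt a manageable size.

    Hypothetical helper; tune max_rows and max_chars to your model's context window.
    """
    if len(df) > max_rows:
        # Random sample of rows; a fixed random_state makes runs reproducible.
        df = df.sample(n=max_rows, random_state=0)
    csv_content = df.to_csv(index=False)
    if len(csv_content) > max_chars:
        raise ValueError(
            f"CSV payload is {len(csv_content)} characters; "
            "reduce max_rows or select fewer columns."
        )
    return csv_content
```

Inside `analyze_data`, you could call `prepare_csv_payload(df)` in place of `df.to_csv(index=False)`. When sampling kicks in, the model only sees the sampled rows, so its insights describe that sample rather than the full dataset.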
API Call Parameters
| Parameter | Description | Example |
|---|---|---|
| model | OpenAI model to use | "gpt-4" |
| max_tokens | Maximum tokens in the response | 500 |
| temperature | Controls randomness; lower is more deterministic | 0.2 |
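The parameters in the table, together with `messages`, can be collected into one dictionary and unpacked into the API call. This makes them easy to adjust and to inspect without a network request. The sketch below is hypothetical. In particular, it moves the "research assistant" instruction into a `system` message, which is a design choice that differs from the single `user` message used in this guide:

```python
from typing import Any, Dict


def build_request(
    csv_content: str,
    instruction: str,
    model: str = "gpt-4",
    max_tokens: int = 500,
    temperature: float = 0.2,
) -> Dict[str, Any]:
    """Bundle the request parameters into keyword arguments.

    Hypothetical helper; use as client.chat.completions.create(**build_request(...)).
    """
    return {
        "model": model,
        "messages": [
            # The system message sets the role; the user message carries the data.
            {"role": "system", "content": "You are a research assistant."},
            {"role": "user", "content": f"{instruction}\n{csv_content}"},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
```

Because the function only builds a dictionary, you can print or unit-test a request before spending tokens on it.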
Running the Assistant
Invoke the function and print the summary:
```python
if __name__ == "__main__":
    summary = analyze_data(df)
    print(summary)
```
You should see a bullet-point list of insights drawn from your dataset. The exact wording varies between runs, even at a low temperature.
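If you want to process the insights further, for example by storing them or counting them, you can split the reply into individual items. This is a minimal sketch. The helper `parse_bullets` is hypothetical. The model is not guaranteed to use any particular bullet marker, so the function recognizes a few common ones (`-`, `*`, `•`, `1.`, `1)`) and skips every other line, including wrapped continuation lines:

```python
import re
from typing import List

# Common bullet markers: "-", "*", "•", or numbering such as "1." and "1)".
_BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def parse_bullets(text: str) -> List[str]:
    """Split a model response into individual bullet items (hypothetical helper)."""
    items = []
    for line in text.splitlines():
        match = _BULLET.match(line)
        if match:
            # Keep only the text after the marker.
            items.append(line[match.end():].strip())
    return items
```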
Focusing on Demographics
To target only demographic columns (e.g., age, gender, country), filter before sending:
```python
def analyze_demographics(df):
    # Keep only the demographic columns; pandas raises KeyError if a name is missing.
    demo_df = df[["age", "gender", "country"]]
    csv_content = demo_df.to_csv(index=False)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    "Provide key demographic insights from the following data in bullet points:\n"
                    f"{csv_content}"
                ),
            }
        ],
        max_tokens=300,  # a shorter cap than the full analysis
        temperature=0.2,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    demographics_summary = analyze_demographics(df)
    print(demographics_summary)
```
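The response should concentrate on age distribution, gender breakdown, and geographic spread. The column names `age`, `gender`, and `country` are examples, so change them to match the headers in your CSV.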
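`analyze_data` and `analyze_demographics` differ only in which columns they send, the wording of the request, and `max_tokens`. The sketch below folds both into one function and checks the requested columns up front, so a typo produces a readable error instead of a pandas `KeyError`. The name `analyze_columns` and its `focus` parameter are hypothetical. The client is passed in as an argument (any object with `chat.completions.create`, such as `openai.OpenAI()`) so that the function does not depend on a global:

```python
from typing import Any, Optional, Sequence

import pandas as pd


def analyze_columns(
    client: Any,
    df: pd.DataFrame,
    columns: Optional[Sequence[str]] = None,
    focus: str = "key insights",
    max_tokens: int = 500,
) -> str:
    """Generalized analyze_data / analyze_demographics (hypothetical helper)."""
    if columns is not None:
        missing = [c for c in columns if c not in df.columns]
        if missing:
            # Fail early with a readable message instead of a pandas KeyError.
            raise ValueError(f"Columns not found in dataset: {missing}")
        df = df[list(columns)]
    csv_content = df.to_csv(index=False)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    f"Provide {focus} from the following data in bullet points:\n"
                    f"{csv_content}"
                ),
            }
        ],
        max_tokens=max_tokens,
        temperature=0.2,
    )
    return response.choices[0].message.content
```

For the demographic case, a call might look like `analyze_columns(client, df, columns=["age", "gender", "country"], focus="key demographic insights", max_tokens=300)`.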
Conclusion
You've now built an AI research assistant. Along the way, you:
- Installed and imported pandas and the OpenAI SDK.
- Loaded a CSV file into a DataFrame.
- Sent the data to GPT-4.
- Received structured, point-form insights.
Feel free to tweak prompts, adjust parameters, or analyze other subsets of your data.
Links & References