Project 3: AI Research Assistant
Learn how to analyze CSV datasets with pandas and the OpenAI API to extract structured, point-form insights.
Prerequisites
Requirement | Version / Link |
---|---|
Python | 3.7+ |
OpenAI API Key | https://platform.openai.com/account/api-keys |
pandas | https://pandas.pydata.org/ |
OpenAI SDK | https://pypi.org/project/openai/ |
Installation
Install both pandas and the OpenAI Python client:
```bash
pip install pandas openai
```
Note
If you already have these packages installed, pip will confirm that the requirements are satisfied.
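To confirm which versions pip installed, you can query the package metadata from Python. The following is a small sketch that uses the standard library's importlib.metadata (available from Python 3.8). installed_version is a hypothetical helper name and is not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string for `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the two packages this project depends on
    for name in ("pandas", "openai"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```

Running the script prints one line per package, which is handy when comparing against the Prerequisites table.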
Configuration
Import the necessary modules and initialize your OpenAI client.
Warning: Never commit your API key to version control.
```python
from openai import OpenAI
import pandas as pd

client = OpenAI(api_key="YOUR_API_KEY")
```
Replace "YOUR_API_KEY" with your actual key, or load it from an environment variable.
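One way to keep the key out of your source code is to read it from an environment variable. When no api_key argument is given, the OpenAI client already looks for OPENAI_API_KEY. The sketch below adds an explicit check so that a missing key fails early with a clear message. The helper name load_api_key is hypothetical.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment variable and fail early if it is unset."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; export it before running the script."
        )
    return key
```

After exporting the variable in your shell, you would create the client with client = OpenAI(api_key=load_api_key()).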
Loading Your CSV Dataset
Download a CSV (for example, from Kaggle) and load it into a pandas DataFrame:
```python
df = pd.read_csv("/path/to/your/user_behavior_dataset.csv")
```
Adjust the file path to match your local environment.
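Before you send anything to the API, it helps to confirm that the file loaded as expected. To run without a download, the sketch below reads a few made-up rows from an in-memory string. These columns are hypothetical and do not come from any particular Kaggle dataset. With a real file, pass its path to pd.read_csv and inspect df.shape and df.columns in the same way.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the downloaded file; column names are illustrative only
SAMPLE_CSV = """age,gender,country,daily_minutes
24,Female,Canada,180
37,Male,India,95
29,Female,Brazil,240
52,Male,Germany,60
"""

# read_csv accepts any file-like object, so StringIO behaves like a file path here
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # exact, case-sensitive column names
print(df.head())            # first few rows
```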
Defining the Analysis Function
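This function converts the DataFrame to CSV text, sends it to the GPT-4 model in a single chat request, and returns the AI-generated insights. Because the entire CSV is embedded in the prompt, this approach suits small datasets; a large file can exceed the model's context window (8,192 tokens for "gpt-4").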
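```python
def analyze_data(df):
    # Serialize the whole DataFrame to CSV text (without the index column)
    csv_content = df.to_csv(index=False)

    # Send the instructions and the data together in one user message
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    "Provide key insights from the following dataset in bullet points:\n"
                    f"{csv_content}"
                ),
            }
        ],
        max_tokens=500,   # upper bound on the length of the reply
        temperature=0.2,  # low randomness for more consistent summaries
    )

    # The generated text is on the first choice
    return response.choices[0].message.content
```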
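To see exactly what analyze_data() sends, you can build the messages list in a separate function. The sketch below mirrors the prompt above. build_messages and its max_rows cap are hypothetical additions that are not part of the original tutorial. The cap embeds only the first rows, which is one simple way to keep a large dataset within the context window.

```python
import pandas as pd


def build_messages(df: pd.DataFrame, instruction: str, max_rows: int = 200) -> list:
    """Build a chat `messages` list like the one analyze_data() sends.

    max_rows is a hypothetical safeguard: only the first max_rows rows are embedded.
    """
    csv_content = df.head(max_rows).to_csv(index=False)
    return [
        {
            "role": "user",
            "content": (
                "You are a research assistant. "
                f"{instruction}\n"
                f"{csv_content}"
            ),
        }
    ]


if __name__ == "__main__":
    sample = pd.DataFrame({"age": [24, 37, 29], "country": ["Canada", "India", "Brazil"]})
    messages = build_messages(
        sample,
        "Provide key insights from the following dataset in bullet points:",
        max_rows=2,
    )
    print(messages[0]["content"])
```

You could then pass the result as the messages argument to client.chat.completions.create. Taking the first rows is the simplest cap. If the file is sorted, df.sample(n=..., random_state=...) would give a more representative subset.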
API Call Parameters
Parameter | Description | Example |
---|---|---|
model | OpenAI model to use | "gpt-4" |
max_tokens | Maximum tokens in the response | 500 |
temperature | Controls randomness; lower is more deterministic | 0.2 |
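max_tokens limits only the reply. The prompt, including the embedded CSV, also counts toward the model's context window. The sketch below makes a rough pre-flight check based on the common rule of thumb of about four characters per token for English text. CSV data may tokenize differently, and exact counts require a tokenizer such as OpenAI's tiktoken. The 8,192-token default assumes the original "gpt-4" model, and fits_context is a hypothetical helper.

```python
def fits_context(prompt: str, max_tokens: int, context_window: int = 8192) -> bool:
    """Roughly check whether the prompt plus the reserved reply tokens fit the window.

    Uses the ~4 characters per token heuristic; treat the result as an estimate.
    """
    estimated_prompt_tokens = len(prompt) / 4
    return estimated_prompt_tokens + max_tokens <= context_window


if __name__ == "__main__":
    prompt = "x" * 30_000  # stands in for a large CSV dump
    print(fits_context(prompt, max_tokens=500))  # estimated 7,500 + 500 = 8,000 tokens -> True
```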
Running the Assistant
Invoke the function and print the summary:
```python
if __name__ == "__main__":
    summary = analyze_data(df)
    print(summary)
```
You’ll see a concise, bullet-pointed list of insights extracted from your dataset.
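The reply arrives as a single string. If you want the insights as a Python list, for example to store or filter them, a hypothetical helper such as parse_bullets can extract the lines that start with a bullet marker or a number. This assumes the model formats its answer as a list. The prompt asks for one but cannot guarantee it, and the helper skips any line without a marker.

```python
import re
from typing import List

# Matches lines starting with "-", "*", "•" or a number such as "1." or "2)"
BULLET_PATTERN = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(.*\S)\s*$")


def parse_bullets(summary: str) -> List[str]:
    """Extract the text of each bullet or numbered item from the model's reply."""
    items = []
    for line in summary.splitlines():
        match = BULLET_PATTERN.match(line)
        if match:
            items.append(match.group(1))
    return items
```

For example, insights = parse_bullets(analyze_data(df)) returns one string per insight.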
Focusing on Demographics
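To target only demographic columns (e.g., age, gender, country), filter before sending. Column names are case-sensitive and must match your CSV header exactly, so check df.columns and adjust the list if needed: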
```python
def analyze_demographics(df):
    # Keep only the demographic columns before serializing
    demo_df = df[['age', 'gender', 'country']]
    csv_content = demo_df.to_csv(index=False)

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    "Provide key demographic insights from the following data in bullet points:\n"
                    f"{csv_content}"
                ),
            }
        ],
        max_tokens=300,   # a shorter reply for this narrower question
        temperature=0.2,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    demographics_summary = analyze_demographics(df)
    print(demographics_summary)
```
This returns focused insights on age distribution, gender breakdown, and geographic diversity.
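The model reads raw rows and can misstate figures, so it is worth cross-checking its demographic claims against numbers that pandas computes directly. The sketch below assumes the tutorial's example column names (age, gender, country), which may differ in your dataset. summarize_demographics is a hypothetical helper that reports describe() statistics for numeric columns and category shares for the others. It raises a KeyError listing the available columns if any name is missing.

```python
import pandas as pd


def summarize_demographics(df: pd.DataFrame, columns=("age", "gender", "country")) -> dict:
    """Compute simple pandas summaries to cross-check the model's demographic claims."""
    missing = [col for col in columns if col not in df.columns]
    if missing:
        # Column names are case-sensitive and must match the CSV header exactly
        raise KeyError(f"Missing columns {missing}; available: {list(df.columns)}")

    summary = {}
    for col in columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            summary[col] = df[col].describe()  # count, mean, std, min, quartiles, max
        else:
            summary[col] = df[col].value_counts(normalize=True)  # share of each category
    return summary
```

Printing each entry of summarize_demographics(df) puts the actual age spread and gender or country shares next to the model's bullets, which makes invented or rounded-off statistics easy to spot.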
Conclusion
You’ve now built an AI research assistant that:
- Uses pandas and the OpenAI SDK.
- Loads a CSV into a DataFrame.
- Sends your data to GPT-4.
- Returns structured, point-form insights.
Feel free to tweak prompts, adjust parameters, or analyze other subsets of your data.