Learn how to analyze CSV datasets with pandas and the OpenAI API to extract structured, point-form insights.

Prerequisites

Installation

Install both pandas and the OpenAI Python client:
pip install pandas openai
If you already have these packages installed, pip will confirm that the requirements are satisfied.

Configuration

Import the necessary modules and initialize your OpenAI client.
Warning: Never commit your API key to version control.
from openai import OpenAI
import pandas as pd

client = OpenAI(api_key="YOUR_API_KEY")
Replace "YOUR_API_KEY" with your actual key or, preferably, load it from an environment variable so the key never appears in your source code.
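One way to load the key from an environment variable is sketched below. The helper name get_api_key is hypothetical, and the variable name OPENAI_API_KEY is an assumption (it is also the name the OpenAI Python client looks up by default when no api_key argument is given).

```python
import os


def get_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Raises RuntimeError with a readable message if the variable is unset or empty.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Usage with the client from this guide (not executed here):
# client = OpenAI(api_key=get_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the API later on.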

Loading Your CSV Dataset

Download a CSV (for example, from Kaggle) and load it into a pandas DataFrame:
df = pd.read_csv("/path/to/your/user_behavior_dataset.csv")
Adjust the file path to match your local environment.
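Before sending anything to a model, it can help to confirm what was actually loaded. The sketch below uses a small inline CSV in place of a real file; its columns and values are made up for illustration and are not taken from any particular dataset.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the downloaded file; these columns are
# illustrative only.
sample_csv = io.StringIO(
    "age,gender,country,sessions\n"
    "34,F,DE,12\n"
    "27,M,US,5\n"
    "45,F,IN,9\n"
)
sample_df = pd.read_csv(sample_csv)

n_rows, n_cols = sample_df.shape             # (rows, columns)
column_names = list(sample_df.columns)       # header names as plain strings
missing_per_column = sample_df.isna().sum()  # missing values per column

print(f"{n_rows} rows x {n_cols} columns")
print(column_names)
print(sample_df.head())
```

The same three checks (shape, column names, missing values) apply unchanged to the df loaded from your own file.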

Defining the Analysis Function

This function converts the entire DataFrame to CSV text, sends it to the GPT-4 model in a single prompt, and returns the generated insights:
def analyze_data(df):
    csv_content = df.to_csv(index=False)
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    "Provide key insights from the following dataset in bullet points:\n"
                    f"{csv_content}"
                )
            }
        ],
        max_tokens=500,
        temperature=0.2
    )
    return response.choices[0].message.content
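The prompt above places the role instruction ("You are a research assistant.") inside the user message. The Chat Completions API also accepts a message with the "system" role, which is a common place for such instructions. A minimal sketch of that variant follows; the helper name build_messages is hypothetical, and the sample CSV text is made up.

```python
def build_messages(csv_content: str) -> list[dict[str, str]]:
    """Build a chat message list with the role instruction in a system message."""
    return [
        {"role": "system", "content": "You are a research assistant."},
        {
            "role": "user",
            "content": (
                "Provide key insights from the following dataset in bullet points:\n"
                f"{csv_content}"
            ),
        },
    ]


# Illustrative input only; in analyze_data this would be df.to_csv(index=False).
messages = build_messages("age,gender\n34,F\n")
```

The returned list could be passed as the messages argument in analyze_data in place of the inline list.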

API Call Parameters

| Parameter | Description | Example |
|---|---|---|
| model | OpenAI model to use | "gpt-4" |
| max_tokens | Maximum tokens in the response | 500 |
| temperature | Controls randomness; lower is more deterministic | 0.2 |
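Note that max_tokens caps only the response. The prompt, which here contains the full CSV, also counts against the model's context window, so a large DataFrame may be rejected by the API. One possible mitigation is to cap the number of rows before building the prompt. The helper to_limited_csv below is a hypothetical sketch, and the default of 200 rows is an arbitrary assumption rather than a recommended value.

```python
import pandas as pd


def to_limited_csv(df: pd.DataFrame, max_rows: int = 200, seed: int = 0) -> str:
    """Return CSV text for at most max_rows rows of df.

    If df is larger, a reproducible random sample is used instead of the
    first rows, so sorted files are not represented only by their top rows.
    """
    if len(df) > max_rows:
        df = df.sample(n=max_rows, random_state=seed)
    return df.to_csv(index=False)


# Illustrative data: 1000 rows reduced to a 50-row sample plus the header line.
demo = pd.DataFrame({"age": range(1000), "country": ["US"] * 1000})
csv_text = to_limited_csv(demo, max_rows=50)
```

Sampling trades completeness for size: insights then describe the sample, not necessarily the whole dataset.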

Running the Assistant

Invoke the function and print the summary:
if __name__ == "__main__":
    summary = analyze_data(df)
    print(summary)
The output should be a concise, bullet-point list of insights drawn from your dataset. The exact wording can vary between runs.
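Because the model returns plain text, downstream code may want the bullets as a Python list. The sketch below assumes each insight sits on its own line starting with "-", "*", "•", or a number such as "1."; the model's actual formatting is not guaranteed, and both the helper name extract_bullets and the example text are hypothetical.

```python
import re

# Matches a leading bullet marker ("-", "*", "•") or a numbered prefix like "1." or "2)".
_BULLET_PREFIX = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def extract_bullets(text: str) -> list[str]:
    """Return the bullet items found in text, with their markers removed."""
    items = []
    for line in text.splitlines():
        if _BULLET_PREFIX.match(line):
            items.append(_BULLET_PREFIX.sub("", line).strip())
    return items


# Made-up response text, used only to exercise the parser.
example = (
    "Key insights:\n"
    "- Most users are aged 25-34.\n"
    "* Usage peaks on weekends.\n"
    "1. Android dominates."
)
bullets = extract_bullets(example)
```

Lines without a recognized marker, such as the "Key insights:" heading, are skipped.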

Focusing on Demographics

To target only demographic columns (e.g., age, gender, country), select them before sending. The column names below are examples and must match those in your dataset:
def analyze_demographics(df):
    demo_df = df[['age', 'gender', 'country']]
    csv_content = demo_df.to_csv(index=False)
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    "Provide key demographic insights from the following data in bullet points:\n"
                    f"{csv_content}"
                )
            }
        ],
        max_tokens=300,
        temperature=0.2
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    demographics_summary = analyze_demographics(df)
    print(demographics_summary)
This returns focused insights on age distribution, gender breakdown, and geographic diversity.
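Model-generated summaries can misstate figures, so it may be worth computing a few reference statistics locally and comparing them with the reply. A minimal sketch, assuming the same age, gender, and country columns as above; the helper name demographic_stats and the sample rows are hypothetical.

```python
import pandas as pd


def demographic_stats(df: pd.DataFrame) -> dict:
    """Compute simple reference statistics for the age, gender, and country columns."""
    return {
        "age_mean": float(df["age"].mean()),
        "age_min": int(df["age"].min()),
        "age_max": int(df["age"].max()),
        # Share of each gender as a fraction of all rows that have a value.
        "gender_share": df["gender"].value_counts(normalize=True).to_dict(),
        "n_countries": int(df["country"].nunique()),
    }


# Made-up rows, used only to show the shape of the result.
sample = pd.DataFrame(
    {
        "age": [20, 30, 40, 50],
        "gender": ["F", "M", "F", "F"],
        "country": ["US", "DE", "US", "IN"],
    }
)
stats = demographic_stats(sample)
```

If the model's bullets disagree with these numbers, the locally computed values are the ones to trust.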

Conclusion

You’ve now built an AI research assistant. Along the way, you:
  1. Installed and imported pandas and the OpenAI SDK.
  2. Loaded a CSV into a DataFrame.
  3. Sent your data to GPT-4.
  4. Received structured, point-form insights.
Feel free to tweak prompts, adjust parameters, or analyze other subsets of your data.
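As a starting point for analyzing other subsets, the two functions in this guide could be folded into one that takes the column list as an argument. The sketch below is one possible shape, not the guide's own implementation: analyze_columns and its topic parameter are hypothetical names, and the client is passed in explicitly (expected to be an OpenAI instance) rather than read from a global.

```python
import pandas as pd


def analyze_columns(df, columns, client, topic="key", model="gpt-4", max_tokens=300):
    """Send only the given columns to the model and return its reply text.

    client is expected to be an openai.OpenAI instance, passed in explicitly
    so the function does not depend on a module-level global.
    """
    # Fail early with a clear message if a requested column is absent.
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")

    csv_content = df[list(columns)].to_csv(index=False)
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    f"Provide {topic} insights from the following data in bullet points:\n"
                    f"{csv_content}"
                ),
            }
        ],
        max_tokens=max_tokens,
        temperature=0.2,
    )
    return response.choices[0].message.content
```

With this in place, the demographic analysis becomes analyze_columns(df, ["age", "gender", "country"], client, topic="key demographic"), and any other subset only needs a different column list.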