Project 3: AI Research Assistant

Learn how to analyze CSV datasets with pandas and the OpenAI API to extract structured, point-form insights.

Prerequisites

Requirement	Version / Link
Python	3.7+
OpenAI API Key	https://platform.openai.com/account/api-keys
pandas	https://pandas.pydata.org/
OpenAI SDK	https://pypi.org/project/openai/

Installation

Install both pandas and the OpenAI Python client:

pip install pandas openai

Note

If you already have these packages installed, pip will confirm that the requirements are satisfied.

Configuration

Import the necessary modules and initialize your OpenAI client.
Warning: Never commit your API key to version control.

from openai import OpenAI
import pandas as pd

client = OpenAI(api_key="YOUR_API_KEY")

Replace "YOUR_API_KEY" with your actual key or load it from an environment variable.
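Here is a minimal sketch of the environment-variable approach. It assumes the key is stored in `OPENAI_API_KEY`, the variable the OpenAI Python SDK reads by default when `OpenAI()` is called without an `api_key` argument. The helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an unauthenticated request
        raise RuntimeError(f"Environment variable {var_name} is not set.")
    return key


# Pass the key to the client instead of a hard-coded string:
# from openai import OpenAI
# client = OpenAI(api_key=load_api_key())
```

Set the variable in your shell before running the script, for example with `export OPENAI_API_KEY=...` on macOS or Linux. This keeps the key out of your source files entirely.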

Loading Your CSV Dataset

Download a CSV (for example, from Kaggle) and load it into a pandas DataFrame:

df = pd.read_csv("/path/to/your/user_behavior_dataset.csv")

Adjust the file path to match your local environment.
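Before sending anything to the model, it helps to check what you actually loaded. The sketch below uses a tiny inline CSV with made-up columns as a stand-in for your file. The helper `describe_dataset` is hypothetical; call it on your own `df` instead of `sample_df`.

```python
import io

import pandas as pd

# Stand-in for your file: a tiny inline CSV with made-up columns and values
sample_csv = io.StringIO(
    "age,gender,country,daily_minutes\n"
    "25,Female,Canada,120\n"
    "34,Male,India,95\n"
    "41,Female,Brazil,60\n"
)
sample_df = pd.read_csv(sample_csv)


def describe_dataset(df: pd.DataFrame) -> dict:
    """Collect basic facts worth checking before sending data to the model."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        # Missing values per column; the model only sees what is in the CSV text
        "missing": df.isna().sum().to_dict(),
    }


print(describe_dataset(sample_df))
print(sample_df.head())
```

The row count matters most: the whole table is about to be pasted into a prompt, so a file with many thousands of rows is likely too large to send as-is.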

Defining the Analysis Function

This function converts the DataFrame to CSV text, invokes the GPT-4 model, and returns the AI-generated insights:

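def analyze_data(df):
    # Serialize the whole DataFrame as CSV text so it can be embedded in the prompt
    csv_content = df.to_csv(index=False)
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    "Provide key insights from the following dataset in bullet points:\n"
                    f"{csv_content}"
                )
            }
        ],
        max_tokens=500,   # upper limit on the length of the reply
        temperature=0.2   # low randomness for more consistent summaries
    )
    # The generated text is in the first (and only) choice
    return response.choices[0].message.content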
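`analyze_data` embeds every row of the CSV in a single message. A large file can therefore exceed GPT-4's context window, and the request will likely be rejected. One possible workaround is to cap the number of rows before building the prompt. In the sketch below, the helper `build_prompt`, the 200-row cap, and the random sampling are all assumptions, not part of the original tutorial. The prompt text itself is unchanged.

```python
import pandas as pd


def build_prompt(df: pd.DataFrame, max_rows: int = 200, seed: int = 0) -> str:
    """Build the analyze_data prompt, but embed at most max_rows rows.

    max_rows and seed are illustrative defaults, not values from the tutorial.
    """
    if len(df) > max_rows:
        # A reproducible random sample keeps the prompt small while drawing rows from the whole file
        df = df.sample(n=max_rows, random_state=seed)
    csv_content = df.to_csv(index=False)
    return (
        "You are a research assistant. "
        "Provide key insights from the following dataset in bullet points:\n"
        f"{csv_content}"
    )


# Example: a 1,000-row frame is reduced to a header plus 200 data rows
big_df = pd.DataFrame({"user_id": range(1000), "minutes": [i % 60 for i in range(1000)]})
prompt = build_prompt(big_df)
print(len(prompt.splitlines()))
```

To use it, pass `build_prompt(df)` as the `"content"` of the user message. Keep in mind that the model then sees only a sample, so its insights describe that sample rather than the full file.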

API Call Parameters

Parameter	Description	Example
model	OpenAI model to use	`"gpt-4"`
max_tokens	Maximum tokens in the response	`500`
temperature	Controls randomness; lower is more deterministic	`0.2`
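`max_tokens` caps the length of the reply, so with a value of 500 a long list of insights can stop mid-sentence. The Chat Completions API signals this by setting the choice's `finish_reason` to `"length"`; it is `"stop"` when the model finishes normally. The helper `extract_text` below is hypothetical, and the `SimpleNamespace` object only mimics the response attributes it reads, so the example runs offline.

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    """Return the reply text, flagging replies that were cut off by max_tokens."""
    choice = response.choices[0]
    text = choice.message.content or ""
    if choice.finish_reason == "length":
        # The model hit max_tokens before finishing its answer
        text += "\n[Output truncated: consider raising max_tokens]"
    return text


# Stand-in for a real API response, built only to exercise the helper offline
fake = SimpleNamespace(
    choices=[SimpleNamespace(
        finish_reason="length",
        message=SimpleNamespace(content="- Insight one\n- Insight tw"),
    )]
)
print(extract_text(fake))
```

In `analyze_data`, you could replace `return response.choices[0].message.content` with `return extract_text(response)`.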

Running the Assistant

Invoke the function and print the summary:

if __name__ == "__main__":
    summary = analyze_data(df)
    print(summary)

You should see a concise, bulleted list of insights drawn from your dataset. The exact wording varies from run to run, even at a low temperature.
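The summary comes back as a single block of text. If you want to reuse individual insights, for example to save them to a spreadsheet, you can split the reply into a list. The sketch below assumes the model uses common bullet markers (`-`, `*`, `•`, or numbered items such as `1.`); lines in any other format are dropped. The helper `parse_bullets` is hypothetical.

```python
import re
from typing import List

# Common bullet markers: "-", "*", "•", "1." or "1)" followed by whitespace
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+(.*\S)\s*$")


def parse_bullets(summary: str) -> List[str]:
    """Split the model's reply into a list of insight strings, dropping non-bullet lines."""
    insights = []
    for line in summary.splitlines():
        match = BULLET.match(line)
        if match:
            insights.append(match.group(1))
    return insights


example = "Key insights:\n- First insight\n* Second insight\n2. Third insight"
print(parse_bullets(example))
```

With a real reply, call `parse_bullets(summary)` after `analyze_data(df)`.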

Focusing on Demographics

To target only demographic columns (e.g., age, gender, country), filter before sending:

def analyze_demographics(df):
    # Keep only the demographic columns; the names must match your CSV headers exactly
    demo_df = df[['age', 'gender', 'country']]
    csv_content = demo_df.to_csv(index=False)
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    "Provide key demographic insights from the following data in bullet points:\n"
                    f"{csv_content}"
                )
            }
        ],
        max_tokens=300,   # shorter limit than analyze_data for a narrower question
        temperature=0.2
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    demographics_summary = analyze_demographics(df)
    print(demographics_summary)

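The model should then focus on age distribution, gender breakdown, and geographic diversity. Column names are case-sensitive, so if your dataset uses headers such as Age or Gender, update the list in analyze_demographics to match; otherwise pandas raises a KeyError.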
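`analyze_demographics` still sends every row of the three columns. For large datasets, one alternative is to aggregate with pandas first and send only the counts. The sketch below assumes a numeric `age` column and the same column names as above; the age bin edges and `top_n=10` are arbitrary choices, and `summarize_demographics` is a hypothetical helper. Ages outside the bins (0 or below, above 120) are left out of the age counts.

```python
import pandas as pd

REQUIRED = ["age", "gender", "country"]  # same column names as analyze_demographics


def summarize_demographics(df: pd.DataFrame, top_n: int = 10) -> str:
    """Aggregate demographic columns locally and return a compact text summary."""
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        # Listing the real headers makes case mismatches (e.g. "Age") easy to spot
        raise KeyError(f"Missing columns {missing}; available: {list(df.columns)}")
    # Bucket ages into ranges instead of sending every individual value
    age_bins = pd.cut(df["age"], bins=[0, 17, 24, 34, 44, 54, 64, 120])
    parts = [
        "Age groups:\n" + age_bins.value_counts(sort=False).to_string(),
        "Gender:\n" + df["gender"].value_counts().to_string(),
        f"Top {top_n} countries:\n" + df["country"].value_counts().head(top_n).to_string(),
    ]
    return "\n\n".join(parts)
```

Inside `analyze_demographics`, you could then use `csv_content = summarize_demographics(df)`. The trade-off is that the model sees only per-column totals, so it cannot spot cross-column patterns such as age differences between countries.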

Conclusion

You’ve now built an AI research assistant. Along the way, you:

Installed and imported pandas and the OpenAI SDK.
Loaded a CSV into a DataFrame.
Sent your data to GPT-4.
Received structured, point-form insights.

Feel free to tweak prompts, adjust parameters, or analyze other subsets of your data.
