Project 3: AI Research Assistant

Learn how to analyze CSV datasets with pandas and the OpenAI API to extract structured, point-form insights.

Prerequisites

Requirement      Version / Link
Python           3.7+
OpenAI API Key   https://platform.openai.com/account/api-keys
pandas           https://pandas.pydata.org/
OpenAI SDK       https://pypi.org/project/openai/

Installation

Install both pandas and the OpenAI Python client:

pip install pandas openai

Note: If you already have these packages installed, pip will report that the requirements are already satisfied.

Configuration

Import the necessary modules and initialize your OpenAI client.
Warning: Never commit your API key to version control.

from openai import OpenAI
import pandas as pd

client = OpenAI(api_key="YOUR_API_KEY")

Replace "YOUR_API_KEY" with your actual key or load it from an environment variable.
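One way to keep the key out of your source code is to read it from an environment variable. The sketch below assumes the variable is named OPENAI_API_KEY, which is also the name the OpenAI SDK looks for by default when no api_key argument is passed. The load_api_key helper is hypothetical and not part of the SDK.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper for illustration; not part of the OpenAI SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set. "
            "Export it in your shell before running the script."
        )
    return key


# Usage with the client from this section:
#   client = OpenAI(api_key=load_api_key())
```

Failing early with a clear message is a design choice here: it avoids a less obvious authentication error later, at the first API call.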

Loading Your CSV Dataset

Download a CSV (for example, from Kaggle) and load it into a pandas DataFrame:

df = pd.read_csv("/path/to/your/user_behavior_dataset.csv")

Adjust the file path to match your local environment.
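Before sending anything to the API, it can help to confirm that the file parsed as expected. The following sketch uses a tiny in-memory CSV as a stand-in for the downloaded file; its column names and values are made up for illustration and will differ from your dataset.

```python
import io

import pandas as pd

# A tiny in-memory CSV stands in for the downloaded file;
# the column names and values are made up for illustration.
sample_csv = io.StringIO(
    "user_id,age,gender,country\n"
    "1,34,F,Canada\n"
    "2,27,M,India\n"
    "3,45,F,Brazil\n"
)
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)          # (number of rows, number of columns)
print(list(sample_df.columns))  # column names as parsed from the header
print(sample_df.head())         # first rows, to confirm the parse looks right
```

The same three checks work on the df loaded from your own file.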

Defining the Analysis Function

This function converts the DataFrame to CSV text, invokes the GPT-4 model, and returns the AI-generated insights:

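def analyze_data(df):
    """Send the DataFrame to the model as CSV text and return its insights."""
    # Serialize without the index so only the dataset's own columns are sent.
    csv_content = df.to_csv(index=False)

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    "Provide key insights from the following dataset in bullet points:\n"
                    f"{csv_content}"
                )
            }
        ],
        max_tokens=500,   # upper bound on the length of the reply
        temperature=0.2   # low value for more consistent output
    )
    # The generated text is in the first choice's message.
    return response.choices[0].message.content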
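Because the whole DataFrame is serialized into the prompt, a large CSV can exceed the number of tokens the model accepts in one request. A minimal sketch of one way to cap the input is shown below, assuming that the first rows are representative enough for a summary. The build_prompt helper and its 100-row default are hypothetical choices, not limits defined by the API.

```python
import pandas as pd


def build_prompt(df, max_rows=100):
    """Build the prompt text from at most `max_rows` rows of `df`.

    Hypothetical helper; the 100-row default is an arbitrary assumption,
    not a limit defined by the API.
    """
    subset = df.head(max_rows)
    csv_content = subset.to_csv(index=False)

    # Tell the model when it is only seeing part of the data.
    note = ""
    if len(df) > max_rows:
        note = f"(Showing the first {max_rows} of {len(df)} rows.)\n"

    return (
        "You are a research assistant. "
        "Provide key insights from the following dataset in bullet points:\n"
        f"{note}{csv_content}"
    )
```

The returned string can be passed as the "content" of the user message. If row order carries meaning (for example, data sorted by date), a random sample via df.sample may be a better fit than df.head.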

API Call Parameters

Parameter     Description                                        Example
model         OpenAI model to use                                "gpt-4"
max_tokens    Maximum tokens in the response                     500
temperature   Controls randomness; lower is more deterministic   0.2
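When a reply reaches max_tokens, it is cut off mid-answer, and the chat completion reports this through the choice's finish_reason field with the value "length". The sketch below shows one way to detect that case. The was_truncated helper is hypothetical, and the fake_response object is only a stand-in shaped like a real response so the example runs without an API call.

```python
from types import SimpleNamespace


def was_truncated(response):
    """Return True if the first choice stopped because it hit max_tokens."""
    return response.choices[0].finish_reason == "length"


# Stand-in object shaped like a chat completion response, for illustration only.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="- Insight one\n- Insi"),
        )
    ]
)

print(was_truncated(fake_response))  # True
```

If this check returns True for a real response, raising max_tokens or sending less data are the usual adjustments to try.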

Running the Assistant

Invoke the function and print the summary:

if __name__ == "__main__":
    summary = analyze_data(df)
    print(summary)

The output should be a concise, bullet-pointed list of insights drawn from your dataset; the exact wording can vary between runs.
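If you want to work with the insights individually (for example, to store them one per row), the returned text can be split into a list. This is a rough sketch that assumes each insight starts with "-", "*", or "•". The model's formatting is not guaranteed, so split_bullets is a hypothetical helper that may need adjusting for your output.

```python
def split_bullets(text):
    """Split a bullet-point summary into a list of plain strings.

    Assumes each insight starts with "-", "*", or "•". Other non-empty
    lines are treated as continuations of the previous bullet, and any
    text before the first bullet is ignored.
    """
    bullets = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped[0] in "-*•":
            # Drop the marker and surrounding whitespace.
            bullets.append(stripped[1:].strip())
        elif bullets:
            bullets[-1] += " " + stripped
    return bullets


example = "Key insights:\n- A is high\n  across regions\n* B is low\n\n• C"
print(split_bullets(example))
# ['A is high across regions', 'B is low', 'C']
```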

Focusing on Demographics

To focus on demographic columns only (e.g., age, gender, country), select them before sending the data. The column names below are examples; use the names that appear in your CSV:

def analyze_demographics(df):
    """Send only the demographic columns to the model and return its insights."""
    # Raises KeyError if any of these columns is missing from df.
    demo_df = df[['age', 'gender', 'country']]
    csv_content = demo_df.to_csv(index=False)

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a research assistant. "
                    "Provide key demographic insights from the following data in bullet points:\n"
                    f"{csv_content}"
                )
            }
        ],
        max_tokens=300,
        temperature=0.2
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    demographics_summary = analyze_demographics(df)
    print(demographics_summary)

This returns focused insights on age distribution, gender breakdown, and geographic diversity.
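Selecting columns by name fails with a KeyError if the dataset uses different names (for example, "Country" instead of "country"). A small sketch of a check that reports what is missing before any API call is made follows; select_columns is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def select_columns(df, columns):
    """Return `df` restricted to `columns`, with a clear error if any are missing.

    Hypothetical helper for illustration; not part of pandas.
    """
    missing = [col for col in columns if col not in df.columns]
    if missing:
        raise ValueError(
            f"Missing columns: {missing}. Available: {list(df.columns)}"
        )
    return df[columns]
```

With this in place, the selection step could be written as select_columns(df, ['age', 'gender', 'country']), and a mismatch shows the available column names in the error message.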

Conclusion

You’ve now built an AI research assistant. Along the way, you:

  1. Installed and imported pandas and the OpenAI SDK.
  2. Loaded a CSV into a DataFrame.
  3. Sent your data to GPT-4.
  4. Received structured, point-form insights.

Feel free to tweak prompts, adjust parameters, or analyze other subsets of your data.
