We’re starting with a fresh Jupyter notebook to demonstrate common text-processing tasks using a chat-based LLM: summarization, bullets, sentiment analysis, translation, and converting plain text into structured formats. These small utilities show practical patterns for building reproducible prompts and integrating model outputs into downstream workflows.
[Slide: "Demo: Performing Text Processing and Analysis" (© Copyright KodeKloud)]
Overview: tasks and examples
Task                          | Goal                                               | Example prompt type
Summarization                 | Produce concise or length-limited summaries        | 500-word summary, bullet points
Sentiment analysis            | Classify text as Positive / Negative / Neutral     | Label customer reviews
Translation (tone-preserving) | Translate while keeping poetic or rhetorical tone  | French poem → English poetic rendering
Format conversion             | Convert semi-structured text to JSON / XML / JSONL | States → structured records
Setup
  • Load your API key from an environment variable and define a helper that wraps the ChatCompletion API. Keep API keys out of source code and follow your organization’s secret management policies.
Store sensitive credentials (like OPENAI_API_KEY) in environment variables or a secrets manager. Avoid hard-coding keys in notebooks.
# In [1]
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")


def get_word_completion(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """
    Send a chat-style prompt and return the assistant's content string.
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]
    response = openai.ChatCompletion.create(model=model, messages=messages)
    return response.choices[0].message.content
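The helper above uses the API's default sampling temperature. Since this demo emphasizes reproducible prompts, one option (a sketch, not part of the original notebook; `build_chat_request` is our own name) is to assemble the request payload in a separate, testable function and pin `temperature=0`:

```python
def build_chat_request(prompt: str, model: str = "gpt-3.5-turbo",
                       temperature: float = 0.0) -> dict:
    """Assemble a ChatCompletion payload. temperature=0 favors more
    reproducible outputs for text-processing utilities."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

# Usage: openai.ChatCompletion.create(**build_chat_request("Summarize ..."))
```

Building the payload separately also lets you unit-test prompt construction without making any API calls.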
Best practices
  • Separate context (source text) from instructions (the prompt). This makes prompts reusable and easier to test.
  • Delimit large contexts (e.g., using triple backticks) so the model can clearly distinguish input data from the instruction.
  • When requesting structured outputs (JSON, XML, CSV), be explicit about the required schema to minimize parsing errors.
When embedding large context into prompts, delimit it (for example with triple backticks) so the model can clearly distinguish the source content from the instruction.
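The context/instruction separation and delimiting pattern can be captured in a small template helper (a sketch; `make_prompt` is our own name, not something defined in the notebook):

```python
# Build the delimiter programmatically so this example stays self-contained.
DELIM = "`" * 3  # triple backticks

def make_prompt(instruction: str, context: str) -> str:
    """Keep the instruction and the delimited source text clearly
    separated, so one template serves summarization, labeling,
    translation, and format-conversion tasks alike."""
    return f"{instruction}\n\n{DELIM}{context}{DELIM}"
```

With a helper like this, only the instruction changes between tasks while the context-handling stays identical and easy to test.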
Summarization
  • Keep the source text and the instruction separate. Here’s an excerpt from Steve Jobs’ 2005 Stanford commencement address. We ask for a 500-word summary, then show how to request a bullet-point summary for scannability.
# In [2]: Context (excerpt)
context = '''
Steve Jobs' 2005 Stanford Commencement Address
I am honored to be with you today at your commencement from one of the finest universities in the world. I never graduated from college.

The first story is about connecting the dots.

I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit.
'''
500-word summary prompt and invocation:
# In [3]: Prompt asking for a 500-word summary
prompt = f"""
Create a summary capturing the main points and key details in 500 words based on the content delimited by triple backticks.

```{context}```
"""
response = get_word_completion(prompt)
print(response)
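Models often land near, but not exactly on, a requested word count. A quick check (with an assumed ±20% tolerance; both the function and the threshold are ours, for illustration) lets you decide whether to re-prompt:

```python
def within_word_limit(text: str, target: int, tolerance: float = 0.2) -> bool:
    """True if the word count is within `tolerance` (fraction) of `target`.
    Counting words by whitespace split is approximate but adequate here."""
    n = len(text.split())
    return abs(n - target) <= target * tolerance
```

If the check fails, you could re-run the prompt or explicitly ask the model to expand or trim its previous answer.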
Bullet summary (quick, scannable):
# In [4]: Bullet summary prompt
prompt_bullets = f"""
Create a summary capturing the main points and key details as bullets based on the content delimited by triple backticks.

```{context}```
"""
response_bullets = get_word_completion(prompt_bullets)
print(response_bullets)
Sample bullet-form output (example):
  • Steve Jobs delivered a commencement address at Stanford University in 2005 and shared three stories from his life.
  • First story: connecting the dots — dropping out led him to learn calligraphy, which later influenced Macintosh design.
  • Second story: love and loss — getting fired from Apple enabled him to start anew (NeXT, Pixar) and eventually return.
  • Third story: death — facing mortality focused his priorities; follow your intuition and live authentically.
  • Closing advice: “Stay Hungry. Stay Foolish.” — remain curious and brave in pursuing your work.
Sentiment analysis
  • Use the same structure: pass the text as context, then instruct the model how to label each item. This pattern is useful for generating labeled datasets for downstream model training or analysis.
# In [5]: Sentiment analysis context
context_reviews = '''
1. If you sometimes like to go to the movies to have fun, Wasabi is a good place to start.
2. An idealistic love story that brings out the latent 15-year-old romantic in everyone.
3. The story loses its bite in a last-minute happy ending that's even less plausible than the rest of the picture.
'''
prompt_sentiment = f"""
Analyze the sentiment of the reviews delimited in triple backticks.

First show the actual review and then add the sentiment - Positive, Negative, or Neutral.

```{context_reviews}```
"""
response_sentiment = get_word_completion(prompt_sentiment)
print(response_sentiment)
Expected output example:
  1. If you sometimes like to go to the movies to have fun, Wasabi is a good place to start. Sentiment: Positive
  2. An idealistic love story that brings out the latent 15-year-old romantic in everyone. Sentiment: Positive
  3. The story loses its bite in a last-minute happy ending that’s even less plausible than the rest of the picture. Sentiment: Negative
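Because the prompt asks for a fixed "Sentiment: <label>" suffix, the model's output can be parsed into labeled records with a small regex (a sketch that assumes the model followed the requested format; real outputs should be validated before use):

```python
import re

# Matches lines like: "1. Some review text. Sentiment: Positive"
_SENTIMENT_LINE = re.compile(
    r"^\s*\d+\.\s*(?P<text>.*?)\s*Sentiment:\s*"
    r"(?P<label>Positive|Negative|Neutral)\s*$"
)

def parse_sentiment_lines(text: str) -> list:
    """Extract {'text': ..., 'label': ...} records from labeled output.
    Lines that do not match the expected pattern are skipped."""
    records = []
    for line in text.splitlines():
        m = _SENTIMENT_LINE.match(line)
        if m:
            records.append({"text": m.group("text"),
                            "label": m.group("label")})
    return records
```

The resulting records can feed straight into a labeled dataset for the downstream analysis or model training mentioned below.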
Note: LLMs can be used to generate labeled data (for example, labeling customer reviews or social media posts) which you can then use for downstream analysis or to train supervised models.
Translation (poetic translation)
  • LLMs can translate and preserve tone. Provide the poem as context and request a tone-preserving English rendering.
# In [6]: Poem translation context
context_poem = """
Demain, dès l'aube, à l'heure où blanchit la campagne,
Je partirai. Vois-tu, je sais que tu m'attends.
J'irai par la forêt, j'irai par la montagne.
Je ne puis demeurer loin de toi plus longtemps.
Je marcherai les yeux fixés sur mes pensées,
Sans rien voir au dehors, sans entendre aucun bruit,
Seul, inconnu, le dos courbé, les mains croisées,
Triste, et le jour pour moi sera comme la nuit.
Je ne regarderai ni l'or du soir qui tombe,
Ni les voiles au loin descendant vers Harfleur,
Et quand j'arriverai, je mettrai sur ta tombe
Un bouquet de houx vert et de bruyère en fleur.
"""
prompt_translate = f"""
Write an English poem based on the French poem delimited in triple backticks.

```{context_poem}```
"""
response_translate = get_word_completion(prompt_translate)
print(response_translate)
Sample poetic translation (example):

Tomorrow, at dawn’s early light, I shall depart,
For I know you await.
Through forest and mountain, I’ll take flight,
For I cannot bear this distance, this weight.

With eyes fixed on my thoughts, I’ll tread,
Unseeing of the world, deaf to its sound.
Alone, unknown, stooped with a heavy head,
Gloomy, for me, day will be night unbound.

I’ll not gaze upon the evening’s golden hue,
Nor watch distant sails descend to Harfleur.
And when I arrive, a bouquet I’ll bestrew,
Of green holly and blooming heather pure.

And there, upon your grave, my tribute laid,
I’ll feel your presence, though you’ve been away.
Format conversion (plain text → JSON / XML / JSONL)
  • Convert semi-structured plain text into structured formats for ingestion into pipelines and databases. Be explicit about the desired output schema (keys, types) to reduce ambiguity.
# In [7]: States and capitals context
context_states = """
1. Alabama - Montgomery
2. California - Sacramento
3. Florida - Tallahassee
4. Georgia - Atlanta
5. Illinois - Springfield
6. Massachusetts - Boston
7. New York - Albany
8. Texas - Austin
9. Pennsylvania - Harrisburg
10. Washington - Olympia
"""
prompt_formats = f"""
From the content delimited in triple backticks, format it in JSON, XML, and JSONL.
```{context_states}```
"""
response_formats = get_word_completion(prompt_formats)
print(response_formats)
Example model outputs:
JSON:
[
  {
    "state": "Alabama",
    "capital": "Montgomery"
  },
  {
    "state": "California",
    "capital": "Sacramento"
  },
  ...
]
XML:
<root>
  <state>
    <name>Alabama</name>
    <capital>Montgomery</capital>
  </state>
  <state>
    <name>California</name>
    <capital>Sacramento</capital>
  </state>
  ...
</root>
JSONL (JSON Lines):
{"state": "Alabama", "capital": "Montgomery"}
{"state": "California", "capital": "Sacramento"}
{"state": "Florida", "capital": "Tallahassee"}
{"state": "Georgia", "capital": "Atlanta"}
{"state": "Illinois", "capital": "Springfield"}
{"state": "Massachusetts", "capital": "Boston"}
{"state": "New York", "capital": "Albany"}
{"state": "Texas", "capital": "Austin"}
{"state": "Pennsylvania", "capital": "Harrisburg"}
{"state": "Washington", "capital": "Olympia"}
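Model-produced structured output should be validated before ingestion into a pipeline. For JSONL, each non-empty line must parse as a standalone JSON object; a minimal sketch using only the standard library:

```python
import json

def parse_jsonl(text: str) -> list:
    """Parse JSONL/NDJSON: one JSON value per non-empty line.
    Raises json.JSONDecodeError if any line is malformed, which
    surfaces model formatting mistakes early."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Failing fast on a malformed line is usually preferable to silently ingesting partial records; you can then re-prompt the model with an explicit schema reminder.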
Note: JSONL (also called “newline-delimited JSON” or “NDJSON”) is not the same as JSON-LD. In JSONL, each line is a separate valid JSON object, which makes it convenient for streaming and line-by-line processing.
Summary
  • What we covered:
    • Summarization: fixed-length summaries and bullet-style output, emphasizing separation of context and prompt.
    • Sentiment analysis: labeling text as Positive / Negative / Neutral for downstream use.
    • Translation: preserving tone (poetic translation example).
    • Format conversion: converting semi-structured text into JSON, XML, and JSONL for pipelines.
Further reading and references
Embeddings and similarity search are core techniques used for retrieval-augmented generation and semantic search, topics for a follow-up lesson.
