Welcome back. In this guide we’ll build a smart resume-screener agent in Python that:
  • Extracts the most relevant keywords from a job description using an OpenAI GPT model,
  • Scans all PDF resumes in a folder for those keywords, and
  • Reports which resumes mention those keywords (including file name, matched keyword, matching line, and page number).
This workflow is useful for recruiters, hiring managers, and automated HR screening pipelines. Below is a cleaned, consolidated, and better-organized version of the code with step-by-step explanation.
Before running the code, install PyMuPDF (imported as fitz), python-dotenv, the OpenAI Python client, and the OpenAI Agents SDK (the openai-agents package, which provides the agents module imported in step 1):
pip install pymupdf python-dotenv openai openai-agents
Also set your OpenAI API key in an environment variable (for example, in a .env file): OPENAI_API_KEY=your_key_here. See OpenAI’s API key docs: https://platform.openai.com/docs/api-keys.
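Dependencies at a glance:

| Package | Purpose | Install |
| --- | --- | --- |
| PyMuPDF (fitz) | Read and extract text from PDF files | pip install pymupdf |
| python-dotenv | Load environment variables from a .env file | pip install python-dotenv |
| OpenAI Python client | Call OpenAI APIs for keyword extraction | pip install openai |
| OpenAI Agents SDK (agents) | Define the agent and its tool, and run it | pip install openai-agents |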

1) Load environment variables and imports

Start by loading environment variables and importing required modules. This section sets up dotenv and common utilities.
from dotenv import load_dotenv
import os
import re
import asyncio
from pathlib import Path

load_dotenv()  # Loads environment variables from .env into os.environ
Then import the agent and OpenAI client libraries and PyMuPDF. These imports provide the agent runtime, the OpenAI client, and PDF parsing.
from agents import Agent, Runner, ModelSettings
from agents.tool import function_tool
from openai import OpenAI
import fitz  # PyMuPDF
Create the OpenAI client and set the directory that contains your resumes. Replace the path with your local folder of PDFs:
client = OpenAI()
RESUME_DIR = Path("/Users/your_user/Path/To/Resumes")  # <-- change this to your folder
Security tip: Keep your OPENAI_API_KEY and any other secrets out of your repository (use .env or your platform’s secret manager).
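A quick pre-flight check can catch a missing key or a wrong folder before any API call is made. The sketch below is not part of the original script; check_setup is a hypothetical helper name, and it only assumes the OPENAI_API_KEY variable and the resume folder described above.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional


def check_setup(resume_dir: Path, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return a list of human-readable problems; an empty list means the setup looks usable."""
    # Default to the real process environment; a dict can be passed in for testing
    env = os.environ if env is None else env
    problems: list[str] = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (check your .env file)")
    if not resume_dir.is_dir():
        problems.append(f"Resume folder not found: {resume_dir}")
    elif not any(resume_dir.glob("*.pdf")):
        problems.append(f"No PDF files found in: {resume_dir}")
    return problems
```

Calling check_setup(RESUME_DIR) right after load_dotenv() and printing any returned messages is one way to fail early with a readable error.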

2) Define the PDF resume scanning tool

We expose a tool the agent can call: it opens each PDF in the folder, iterates pages and lines, and records matches for any of the provided keywords.
@function_tool(name_override="scan_resumes_for_keywords")
def scan_resumes_for_keywords(keywords: list[str]) -> list[dict]:
    """
    Scans all PDF files in RESUME_DIR for occurrences of any keyword in `keywords`.
    Returns a list of dicts with keys: filename, keyword, line, page.
    """
    results: list[dict] = []

    # Lower-case keywords for case-insensitive matching
    lowered_keywords = [kw.lower() for kw in keywords]

    for file in RESUME_DIR.glob("*.pdf"):
        try:
            doc = fitz.open(file)
        except Exception as e:
            # If a PDF cannot be opened, report it and move on to the next file
            print(f"Skipping {file.name}: {e}")
            continue

        for page in doc:
            text = page.get_text() or ""
            lines = text.splitlines()
            for line in lines:
                line_lower = line.lower()
                for kw in lowered_keywords:
                    if kw and kw in line_lower:
                        results.append({
                            "filename": file.name,
                            "keyword": kw,
                            "line": line.strip(),
                            "page": page.number + 1
                        })

        doc.close()
    return results
Notes:
  • Only *.pdf files are scanned.
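  • Matching is case-insensitive and performed per line as a plain substring test. This keeps context (the line and page) for every hit, but a short keyword can also match inside a longer word (for example, “git” inside “digital”).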
  • Each result contains filename, keyword, line, and a 1-based page number.
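If substring hits inside longer words are a problem, one option is to match on word boundaries instead. The following is a minimal sketch of that alternative, not the tool's current behavior; compile_keyword_patterns and keywords_in_line are hypothetical helper names.

```python
import re


def compile_keyword_patterns(keywords: list[str]) -> dict[str, re.Pattern]:
    """Build one case-insensitive pattern per keyword that only matches whole terms."""
    patterns: dict[str, re.Pattern] = {}
    for kw in keywords:
        kw = kw.strip()
        if kw:
            # (?<!\w) and (?!\w) reject matches glued to other word characters;
            # re.escape keeps symbols such as "+" in "C++" literal
            patterns[kw] = re.compile(rf"(?<!\w){re.escape(kw)}(?!\w)", re.IGNORECASE)
    return patterns


def keywords_in_line(line: str, patterns: dict[str, re.Pattern]) -> list[str]:
    """Return the keywords whose pattern occurs in the given line."""
    return [kw for kw, pattern in patterns.items() if pattern.search(line)]


patterns = compile_keyword_patterns(["git", "python", "C++"])
print(keywords_in_line("Tools: Git, Docker, Kubernetes", patterns))  # ['git']
print(keywords_in_line("Digital marketing lead", patterns))          # []
```

Inside the scanning loop, the `kw in line_lower` test could be swapped for a call like keywords_in_line(line, patterns), with the patterns compiled once before iterating over the files.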

3) Extract keywords from a job description using OpenAI

Define a function that calls the OpenAI chat completion endpoint to extract a list of the most important skills/technologies from the job description. The function normalizes and cleans numbered or bulleted lists returned by the model.
def extract_keywords_from_job_description(job_text: str, n_keywords: int = 15) -> list[str]:
    """
    Uses the OpenAI chat completions API (https://platform.openai.com/docs/guides/chat) to extract important skills/keywords
    from the provided job_text. Returns up to n_keywords items.
    """
    system_msg = (
        "You are a job recruiter. Extract the 10–15 most important skills, "
        "technologies, and keywords from the job description. Output them as a "
        "simple list (one per line or comma-separated)."
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": job_text}
        ],
        temperature=0.3
    )

    # Parse raw content and clean list items
    raw_output = response.choices[0].message.content or ""
    keywords: list[str] = []
    for line in raw_output.splitlines():
        # Remove list prefixes like "1.", "2)", "-", "*", "•" without touching
        # keywords that themselves start with a digit or a dot ("3D modeling", ".NET")
        cleaned = re.sub(r'^\s*(?:[-*•]+|\d+[.)])\s*', '', line).strip()
        # A line may hold a single item or several comma-separated items;
        # this also covers a model reply that is one comma-separated line
        for part in cleaned.split(","):
            part_clean = part.strip()
            if part_clean:
                keywords.append(part_clean)

    return keywords[:n_keywords]
Tips:
  • Keep temperature low (e.g., 0.2–0.4) to improve determinism for extraction tasks.
  • Consider adjusting the system prompt to include domain-specific terms or required exclusions.
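One way to act on the second tip is to assemble the system prompt from parameters rather than hard-coding it. This is only a sketch: build_system_prompt and its exclude parameter are hypothetical and not part of the original function.

```python
from typing import Optional


def build_system_prompt(n_keywords: int = 15, exclude: Optional[list[str]] = None) -> str:
    """Assemble the extraction instructions; `exclude` lists terms the model should leave out."""
    parts = [
        "You are a job recruiter. "
        f"Extract up to {n_keywords} of the most important skills, technologies, "
        "and keywords from the job description. "
        "Output them as a simple list, one per line."
    ]
    if exclude:
        # Optional exclusions, e.g. generic soft skills you do not want to match on
        parts.append("Do not include these terms: " + ", ".join(exclude) + ".")
    return " ".join(parts)


print(build_system_prompt(8, exclude=["communication", "teamwork"]))
```

The returned string could be used in place of system_msg, which would also keep the number requested in the prompt in step with the n_keywords limit applied to the result.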

4) Job description and running the agent

Use a job description (replace with one from your interface or user input), extract keywords, then create the agent and run the scan.
JOB_DESCRIPTION = """We're looking for a data scientist with experience in Python, machine learning,
data visualization, working with large datasets, SQL, version control (Git), and production deployment.
Experience with NumPy, pandas, and deploying models to cloud platforms is a plus."""

# Extract keywords from the job description
extracted_keywords = extract_keywords_from_job_description(JOB_DESCRIPTION)
print("\nExtracted Keywords:\n", extracted_keywords)
Create the agent and run it. The agent uses the scan_resumes_for_keywords tool defined earlier.
agent = Agent(
    name="Resume Matcher",
    instructions=(
        "You are a resume scanner. The user will give you a job description. "
        "First extract the keywords, then use the tool `scan_resumes_for_keywords` "
        "to scan the resumes and report which resumes mention which keywords "
        "(include filename, matched keyword, matching line, and page number)."
    ),
    tools=[scan_resumes_for_keywords],
    model="gpt-4",
    model_settings=ModelSettings(truncation="auto"),
)

prompt = (
    f"Scan the resumes for keywords that match this posting:\n\n{JOB_DESCRIPTION}\n\n"
    f"The extracted keywords are: {', '.join(extracted_keywords)}"
)

# Run the agent using Runner. Runner.run is async, so use asyncio.run to execute it.
if __name__ == "__main__":
    result = asyncio.run(Runner.run(agent, prompt))
    print("\nResume Scan Results:\n")
    print(result)
Operational notes:
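  • Runner.run is asynchronous; the asyncio.run(...) wrapper is appropriate for simple scripts.
  • print(result) prints a summary of the whole run result; to print only the agent’s answer, use result.final_output.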
  • For long-running or production use, integrate into an async event loop or background worker.
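As a rough illustration of the last note, several postings can be screened from one event loop while capping how many runs are in flight at once. The sketch below is an assumption-laden outline: screen_many and fake_run are hypothetical names, and fake_run stands in for a real awaited agent call so that the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable


async def screen_many(
    postings: list[str],
    run_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 3,
) -> list[str]:
    """Run `run_one` for every posting, with at most `max_concurrent` running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(posting: str) -> str:
        async with semaphore:
            return await run_one(posting)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(p) for p in postings))


async def fake_run(posting: str) -> str:
    # Stand-in for an awaited agent run (hypothetical placeholder)
    await asyncio.sleep(0)
    return f"screened: {posting}"


if __name__ == "__main__":
    print(asyncio.run(screen_many(["data scientist", "backend engineer"], fake_run)))
```

In this guide's script, run_one could be a small coroutine that awaits Runner.run(agent, prompt) and returns result.final_output; the concurrency limit is a guess and should be tuned to your API rate limits.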

5) Example output

A sample (cleaned) output might look like:
Resume Scan Results:
RunResult:
- Last agent: Agent(name="Resume Matcher", ...)
- Final output (str):
Here are the results of scanning the resumes for the extracted keywords:

### fake_resume_john_doe.pdf
- Python: Programming Languages: Python, Java, C++ (Page 1)
- Python: Implemented scalable Python microservices for data ingestion pipelines (Page 1)
- Machine Learning: Designed and deployed ML models to improve recommendations (Page 1)
- Git: Tools: Git, Docker, Kubernetes (Page 1)

### resume_1.pdf
- Python: Python, JavaScript, React, Node.js, Docker (Page 1)

### resume_2.pdf
- Python: Python, R, Machine Learning, TensorFlow, SQL (Page 1)
- SQL: Experience with SQL for analytics and pipelines (Page 2)

### resume_3.pdf
- AWS: AWS, Docker, Kubernetes, CI/CD (Page 1)

The identified resumes contain keywords related to Python, Machine Learning, Git, SQL, and cloud tooling.
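
The grouping in the sample above is produced by the model, so its layout can vary between runs. If you want a fixed layout, the raw match records can be formatted directly. This is a sketch under the assumption that you have the list of dicts (filename, keyword, line, page) from a plain, undecorated copy of the scan function; format_report is a hypothetical helper.

```python
from collections import defaultdict


def format_report(matches: list[dict]) -> str:
    """Group match records by file and render them in the layout of the sample output."""
    by_file: dict[str, list[dict]] = defaultdict(list)
    for match in matches:
        by_file[match["filename"]].append(match)

    sections = []
    for filename in sorted(by_file):
        lines = [f"### {filename}"]
        for match in by_file[filename]:
            lines.append(f"- {match['keyword']}: {match['line']} (Page {match['page']})")
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


sample = [
    {"filename": "resume_2.pdf", "keyword": "sql", "line": "Experience with SQL", "page": 2},
    {"filename": "resume_1.pdf", "keyword": "python", "line": "Python, JavaScript", "page": 1},
]
print(format_report(sample))
```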

Next steps / improvements

  • Rank resumes by the number of matched keywords or weighted importance.
  • Export results to CSV, JSON, or store in a database for later analysis.
  • Add a web UI or email summary for recruiters.
  • Improve keyword extraction with domain-specific prompts, stop-words, or normalization (e.g., treat “ML” and “Machine Learning” as synonyms).
  • Add fuzzy matching or synonym expansion using libraries like fuzzywuzzy/rapidfuzz or embedding similarity.
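
The first two ideas can be sketched with the standard library alone. The example below assumes match records shaped like the tool's output (filename, keyword, line, page); rank_resumes, ranking_to_csv, and the weights mapping are hypothetical names introduced for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Optional


def rank_resumes(
    matches: list[dict], weights: Optional[dict[str, float]] = None
) -> list[tuple[str, float]]:
    """Score each file by its distinct matched keywords (weight 1.0 unless overridden)."""
    weights = weights or {}
    seen: dict[str, set[str]] = defaultdict(set)
    for match in matches:
        # A keyword counts once per file, however many lines mention it
        seen[match["filename"]].add(match["keyword"].lower())
    scores = {
        filename: sum(weights.get(kw, 1.0) for kw in kws)
        for filename, kws in seen.items()
    }
    # Highest score first; ties are broken alphabetically by filename
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def ranking_to_csv(ranking: list[tuple[str, float]]) -> str:
    """Serialize the ranking as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["filename", "score"])
    writer.writerows(ranking)
    return buffer.getvalue()
```

Passing something like weights={"sql": 3.0} makes a must-have skill count more than the rest; the CSV text can then be written to a file or attached to a summary email.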
You now have a working agent-based resume screener that extracts relevant keywords from a job posting and scans PDF resumes for those keywords. Adjust the prompt, keyword limits, and matching logic to fit your organization’s hiring criteria.
