- Extracts the most relevant keywords from a job description using an OpenAI GPT model,
- Scans all PDF resumes in a folder for those keywords, and
- Reports which resumes mention those keywords (including file name, matched keyword, matching line, and page number).
Before running the code, install PyMuPDF (fitz) and any other dependencies you need. For example:See the packages: PyMuPDF, python-dotenv, and the OpenAI Python client. Also set your OpenAI API key in an environment variable (for example, in a
.env file): OPENAI_API_KEY=your_key_here. See OpenAI’s API key docs: https://platform.openai.com/docs/api-keys.| Package | Purpose | Install |
|---|---|---|
PyMuPDF (fitz) | Read and extract text from PDF files | pip install pymupdf |
| python-dotenv | Load environment variables from a .env file | pip install python-dotenv |
| OpenAI Python client | Call OpenAI APIs for keyword extraction | pip install openai |
1) Load environment variables and imports
Start by loading environment variables and importing required modules. This section sets up dotenv and common utilities.OPENAI_API_KEY and any other secrets out of your repository (use .env or your platform’s secret manager).
2) Define the PDF resume scanning tool
We expose a tool the agent can call: it opens each PDF in the folder, iterates pages and lines, and records matches for any of the provided keywords.- Only
*.pdffiles are scanned. - Matching is case-insensitive and performed per line. This keeps context (the line and page) for every hit.
- Each result contains
filename,keyword,line, and a 1-basedpagenumber.
3) Extract keywords from a job description using OpenAI
Define a function that calls the OpenAI chat completion endpoint to extract a list of the most important skills/technologies from the job description. The function normalizes and cleans numbered or bulleted lists returned by the model.- Keep
temperaturelow (e.g., 0.2–0.4) to improve determinism for extraction tasks. - Consider adjusting the system prompt to include domain-specific terms or required exclusions.
4) Job description and running the agent
Use a job description (replace with one from your interface or user input), extract keywords, then create the agent and run the scan.scan_resumes_for_keywords tool defined earlier.
Runner.runis asynchronous; theasyncio.run(...)wrapper is appropriate for simple scripts.- For long-running or production use, integrate into an async event loop or background worker.
5) Example output
A sample (cleaned) output might look like:Next steps / improvements
- Rank resumes by the number of matched keywords or weighted importance.
- Export results to CSV, JSON, or store in a database for later analysis.
- Add a web UI or email summary for recruiters.
- Improve keyword extraction with domain-specific prompts, stop-words, or normalization (e.g., treat “ML” and “Machine Learning” as synonyms).
- Add fuzzy matching or synonym expansion using libraries like
fuzzywuzzy/rapidfuzzor embedding similarity.
Links and references
- OpenAI Chat Completions guide: https://platform.openai.com/docs/guides/chat
- OpenAI API keys: https://platform.openai.com/docs/api-keys
- PyMuPDF documentation: https://pymupdf.readthedocs.io/
- python-dotenv: https://pypi.org/project/python-dotenv/
- OpenAI Python client: https://github.com/openai/openai-python