
- Review exercises to reinforce concepts.
- Multiple-choice quizzes for quick knowledge checks.
- Drag-and-drop games to practice system design and troubleshooting steps.
- Hands-on labs that involve debugging, instrumenting, and applying SRE practices to a running system.

Before running the app locally, read the repository README for prerequisites (such as Python, Docker, Docker Compose, and any other services it lists) and setup steps. Follow its instructions to reproduce the lab environment used in the course.

- The app models a realistic microservice-style application with synchronous endpoints and asynchronous background processing.
- Labs target operational concerns: performance, reliability, monitoring, alerting, scaling, and incident response.
- Source code and infrastructure examples let you practice development, deployment, and infrastructure-as-code workflows end to end.

| Component | Purpose | SRE-focused examples |
|---|---|---|
| Core App | Routes for products, orders, and processing triggers | Functional testing, endpoint-level observability, load testing |
| Async Processing | Background workers and message broker for long-running tasks | Queueing behavior, worker autoscaling, backpressure handling |
| Persistent Storage | Relational DB for product/order data | Backups, migrations, storage tuning, consistency concerns |
| Observability Stack | Monitoring, logging, and tracing tools integrated with the app | Dashboards, alerting rules, distributed traces for root-cause analysis |
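
The Async Processing row mentions backpressure handling. The sketch below illustrates the idea using Python's standard-library `queue.Queue` as a hypothetical stand-in for the app's real message broker (which this page does not name). The functions `submit` and `drain` and the limit `MAX_PENDING` are illustrative names, not part of the course code. A bounded queue rejects new work when it is full instead of growing without limit, so the caller gets an immediate signal it could surface to clients as, for example, an HTTP 429 or 503.

```python
import queue

# Hypothetical limit on pending background jobs; a real broker would
# expose its own queue-length or prefetch settings instead.
MAX_PENDING = 5


def submit(jobs: queue.Queue, job_id: int) -> bool:
    """Try to enqueue a job without blocking.

    Returns False when the queue is full: that rejection is the
    backpressure signal the caller can turn into a 429/503 response.
    """
    try:
        jobs.put_nowait(job_id)
        return True
    except queue.Full:
        return False


def drain(jobs: queue.Queue) -> list[int]:
    """Simulate a worker processing every pending job in FIFO order."""
    processed = []
    while True:
        try:
            job_id = jobs.get_nowait()
        except queue.Empty:
            break
        processed.append(job_id)  # real work (e.g. processing an order) goes here
        jobs.task_done()
    return processed


if __name__ == "__main__":
    jobs = queue.Queue(maxsize=MAX_PENDING)
    accepted = [submit(jobs, i) for i in range(8)]
    print(accepted.count(True), accepted.count(False))  # 5 3
    print(drain(jobs))  # [0, 1, 2, 3, 4]
```

Rejecting early keeps latency bounded for accepted work; the trade-off is that clients must retry, which is why queue depth and rejection counts are natural candidates for dashboards and worker-autoscaling signals.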


Running Terraform or deploying the example infrastructure to the cloud may create billable cloud resources. Review the Terraform README and your cloud provider’s free-tier or cost controls before applying changes.

Skills practiced in the labs include:

- Defining SLIs and SLOs and calculating error budgets
- Building observability dashboards and defining alerts
- Provisioning infrastructure with Terraform (infrastructure as code)
- Designing and testing reliable release pipelines
- Performing incident investigation using logs and distributed traces
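
Error-budget math is mostly simple arithmetic. The sketch below shows it for a request-based availability SLI under stated assumptions: the `Window` shape, the function names, and the 99.9% target are hypothetical, chosen only for illustration, and a "bad" request could be a 5xx response or one slower than a latency threshold, depending on how you define the SLI.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed over one measurement window (hypothetical shape)."""
    total_requests: int
    bad_requests: int  # e.g. 5xx responses, or requests slower than a latency threshold


def _check_target(slo_target: float) -> None:
    """Reject targets like 1.0 that leave no error budget at all."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1, e.g. 0.999")


def error_rate(w: Window) -> float:
    """Fraction of bad requests; 0.0 when the window saw no traffic."""
    return w.bad_requests / w.total_requests if w.total_requests else 0.0


def availability_sli(w: Window) -> float:
    """Availability SLI: fraction of good requests in the window."""
    return 1.0 - error_rate(w)


def burn_rate(w: Window, slo_target: float) -> float:
    """Budget consumption speed: 1.0 means the budget lasts exactly one SLO period."""
    _check_target(slo_target)
    return error_rate(w) / (1.0 - slo_target)


def budget_remaining(w: Window, slo_target: float) -> float:
    """Unspent fraction of the error budget, assuming `w` covers the whole SLO period.

    Goes negative once the SLO has been missed.
    """
    return 1.0 - burn_rate(w, slo_target)


def allowed_bad_minutes(slo_target: float, period_days: int = 30) -> float:
    """Time-based view of the budget: minutes of full outage the SLO tolerates."""
    _check_target(slo_target)
    return (1.0 - slo_target) * period_days * 24 * 60


if __name__ == "__main__":
    month = Window(total_requests=1_000_000, bad_requests=400)
    slo = 0.999  # hypothetical 99.9% availability target
    print(f"SLI:              {availability_sli(month):.4f}")      # 0.9996
    print(f"Budget remaining: {budget_remaining(month, slo):.3f}")  # 0.600
    print(f"Allowed downtime: {allowed_bad_minutes(slo):.1f} min")  # 43.2
```

Alerting rules commonly fire on burn rate over a short window rather than on raw error rate. For instance, the Google SRE Workbook suggests paging when the one-hour burn rate for a 30-day SLO exceeds 14.4, which corresponds to spending 2% of the monthly budget in a single hour.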

- Clone the repositories listed below and follow each README's setup steps to start the labs.
- Begin with the basic observability labs (metrics, logging, tracing) before moving on to scaling and incident-simulation exercises.
- Use the app and infrastructure code as a sandbox to experiment with SRE practices in a controlled environment.

Course repositories:

- KodeKloud Record Store app repo: https://github.com/jakepage91/kodekloud-records-store-web-app
- KodeKloud Terraform infra repo: https://github.com/jakepage91/kodekloud-records-terraform-infrastructure