How to write a Site Reliability Engineer or Platform Engineer resume that demonstrates reliability ownership, SLO design, and infrastructure at scale.
Check which title the company uses, Site Reliability Engineer or Platform Engineer, and mirror it on your resume.
"Owned reliability for 12 production services across 3 AWS regions -- maintained 99.98% uptime SLO with P99 latency < 120ms; on-call rotation with avg MTTR of 14 minutes."
"Eliminated 6 recurring toil items through automation (Python and Kubernetes operators) -- reduced on-call burden by 8 hours/week per engineer."
"Led post-mortem program across 40-engineer engineering org -- introduced blameless RCA template, published 24 post-mortems in 12 months, and tracked remediation closure rate to 89%."
"Built internal developer platform on Kubernetes -- reduced new service time-to-deploy from 2 weeks to 4 hours; adopted by 45 engineering teams."
"Designed and maintained CI/CD pipelines (GitHub Actions and ArgoCD) serving 180+ repositories -- achieved 99.6% pipeline reliability and reduced build time by 40% through caching."
```
Infrastructure: Kubernetes, Helm, Terraform, AWS/GCP/Azure
Observability: Prometheus, Grafana, Datadog, Jaeger, OpenTelemetry
CI/CD: GitHub Actions, ArgoCD, Flux, Jenkins, Spinnaker
Languages: Go, Python, Bash
Reliability: SLOs, SLAs, error budgets, incident response, chaos engineering
```
site reliability engineering, SLO, SLA, error budget, incident response, Kubernetes, Terraform, observability, Prometheus, Grafana, platform engineering, CI/CD, GitOps, chaos engineering, MTTR, availability.