Staff Cyber Site Reliability Engineer (SRE)
Seattle
Wednesday, 03 June 2026
Own Reliability Engineering: Define and drive reliability standards for cybersecurity platforms. Establish SLIs, SLOs, and error budgets; identify systemic weaknesses; and engineer solutions that improve uptime, latency, and fault tolerance.

Write Code and Build Automation: Develop production-quality software in Python (required) and Golang (preferred) to automate operational workflows, build internal tooling, eliminate toil, and improve the day-to-day velocity of security engineering teams.

Partner with Developers and Infrastructure Engineers: Work closely with software engineers and infrastructure teams to review system designs for reliability, provide feedback on deployability and operability, and ensure that what gets built can be confidently operated and maintained in production.

Drive Observability: Instrument security platforms and pipelines with meaningful metrics, logs, and traces; build dashboards and alerting that give the team real operational visibility, using tools such as Grafana, Prometheus, and similar observability stacks.

Lead Incident Response and Post-Mortems: Be a first responder for production issues affecting security systems; drive structured incident response, coordinate resolution, and produce blameless post-mortems with actionable follow-through to prevent recurrence.

Build and Maintain CI/CD & Infrastructure as Code: Develop and own deployment pipelines (GitHub Actions, Jenkins) and infrastructure automation (Terraform, Ansible) that enable safe, repeatable, and fast delivery of security platform changes.

Improve Security Platform Performance: Profile, benchmark, and tune security services, detection pipelines, and data ingestion workflows, identifying bottlenecks and shipping targeted improvements that matter.

Contribute Actively in Agile: Be a high-output contributor in a fast-moving agile squad: write code every sprint, engage in design and architecture reviews, participate in code reviews, and help the team maintain quality and momentum.

Apply Object-Oriented Engineering Fundamentals: Write clean, testable, and maintainable code using strong OOP principles and SOLID patterns, because operability starts with code quality.

Explore AI/ML & LLMs (Plus): Apply knowledge of AI/ML development, large language models, or generative AI to identify practical opportunities in anomaly detection, alert triage automation, or operational intelligence.

Share Knowledge: Contribute to technical discussions, participate in code reviews, and share operational insights with developers and infrastructure partners, not as a formal mandate but as a natural part of working on a great engineering team.

Qualifications

Python Expertise (Required): Demonstrated production-level Python development for automation, tooling, and operational software. This is a non-negotiable requirement for consideration.

Golang Proficiency (Preferred): Hands-on Golang experience, especially in systems tooling, infrastructure software, or performance-sensitive services.

SRE / Platform Engineering Foundation: Proven background in site reliability engineering, platform engineering, or DevOps with a strong software development component, not purely operations.

Object-Oriented Design: Applied knowledge of OOP design patterns and SOLID principles, demonstrated through production code and tooling.

Observability & Monitoring: Hands-on experience with Grafana, Prometheus, or equivalent; able to design meaningful SLIs/SLOs, build useful dashboards, and write alerts that reduce noise rather than add to it.

Incident Response: Experience leading structured incident response, conducting blameless post-mortems, and driving systemic follow-through on reliability improvements.

CI/CD & Infrastructure as Code: Proficiency with CI/CD pipelines (GitHub Actions, Jenkins) and IaC tooling (Terraform, Ansible); experience enabling fast, safe, and repeatable deployments.

Cloud Proficiency: Hands-on experience with AWS, Azure, or GCP; familiarity with cloud-native reliability and infrastructure patterns.

Agile Team Contributor: Comfortable delivering consistently within a high-velocity agile team; strong bias toward iterative delivery and fast feedback.

Security Domain Familiarity (Preferred): Exposure to security platforms, SIEMs, EDRs, detection pipelines, or vulnerability management tooling; DevSecOps experience is a strong plus.

AI/ML & LLM Experience (Plus): Working knowledge of AI/ML development or applied experience with LLMs and generative AI, particularly for operational intelligence or anomaly detection use cases.

Communication: Able to communicate clearly with both developers and infrastructure engineers; bridges technical disciplines without jargon overload.

Experience

8 years of professional engineering experience spanning software development and site reliability/platform engineering.
… years in SRE, DevOps, or platform engineering roles with a strong software development component.
… years working in cloud-native environments (AWS, Azure, or GCP).
3 years delivering within agile teams in a high-velocity environment.
Production Python development is required; Golang experience is a strong differentiator.
Experience with AI/ML development, LLMs, or generative AI tooling is a meaningful plus.
Cybersecurity platform experience, security engineering, or DevSecOps background is a plus.
Experience working with audit or compliance teams is a plus.

Education

Bachelor's degree in Computer Science, Software Engineering, Cybersecurity, or a related field (or equivalent practical experience).

Annual Salary: $110,000.00 - $230,000.00