Sr Staff Engineer - Availability and Incident Management

Austin

Friday, 29 May 2026

Lead the strategy and execution for incident retrospective and correction of error (COE) processes across the engineering organization.
Help conduct deep technical root cause analysis and incident forensics across distributed systems using observability data, logs, metrics, and traces.
Establish continuous improvement loops through automated trend analysis, pattern recognition algorithms, and predictive analytics.
Design, code, and deploy automation platforms and self-service tools using Python, Go, Java, or C# that scale incident retrospective workflows and eliminate manual tracking.
Build production-grade data pipelines, analytics systems, and real-time dashboards to measure incident trends, COE effectiveness, and action item completion rates.
Write code for workflow automation, integrations with observability platforms, and APIs that connect incident management tools across the engineering ecosystem.
Leverage SQL and NoSQL databases to store, query, and analyze incident data at scale using Azure tools and cloud-native services.
Develop and maintain systems that ensure rigorous follow-through on action items, remediation plans, and preventive measures with automated tracking.
Partner with service engineering teams to implement preventive measures and architectural improvements based on incident patterns.
Present data-driven insights and incident trend analysis to leadership and engineering teams to drive preventive action.
Influence and educate leadership on incident patterns, prevention strategies, and reliability best practices.
Mentor engineers on coding best practices and automation techniques, and strengthen technical expertise across the engineering community.
Stay current with industry advances in SRE, observability, incident management, and automation; educate teams on emerging practices.

Qualifications
Experience building automation platforms and self-service tools for workflow management, analytics, or engineering productivity.
Fluency in at least two modern languages such as Python, Go, Java, C++, or C#, including object-oriented design.
Experience building microservices architectures, REST APIs, and distributed systems.
Experience with data pipelines, analytics platforms, and visualization tools for operational metrics and KPIs.
Experience with SQL and NoSQL databases (e.g., PostgreSQL, MongoDB, Cassandra, Cosmos DB) for data storage and analytics.
Experience with observability platforms (Prometheus, Grafana, Datadog, Splunk, ELK) and distributed systems monitoring, logging, and tracing.
Experience with cloud providers (Azure, AWS, or GCP) and cloud-native architectures.
Experience with CI/CD pipelines, infrastructure as code, and container orchestration (Kubernetes, Docker).
Experience writing workflow automation code (YAML pipelines, GitHub Actions, Azure DevOps pipelines).
Strong understanding of distributed systems architecture, design patterns, reliability, and scaling.
Knowledge of retrospective facilitation, continuous improvement processes, and blameless culture principles.
Strong architecture and design skills with ability to influence engineering direction and technical roadmap.
Experience solving complex analytical problems with data-driven approaches.
Proven ability to partner with cross-functional engineering teams and drive systemic improvements.
Excellent communication skills with ability to present technical insights to leadership and influence decision-making.
Experience leveraging GenAI or LLMs is a plus.

Experience
10 years of professional platform development or general development experience
8 years of experience with architecture and design
6 years of experience in open-source frameworks
4 years of experience with AWS, GCP, Azure, or another cloud service

Education
Bachelor’s degree in Computer Science, Information Systems, or equivalent education or work experience

Annual Salary: $110,000.00 - $260,000.00
