Staff Infrastructure Reliability Engineer - Database & Storage

Detroit

Saturday, 02 May 2026

You will help lead the database and storage strategy at Redfin, including architecture, management, and access patterns.
You will lead complex technical discussions with a variety of audiences, including software and systems engineers and business leaders.
You will architect and lead the implementation of cloud database and storage systems with a focus on reliability, observability, scalability, and security.
You will support large-scale, high-volume databases, both self-managed and as specialized AWS managed offerings, including management activities such as upgrade, backup, recovery, and migration.
You will use and evangelize approved AI code generation tools to document, architect, and create code.
You will plan and participate in high availability and disaster recovery planning and drills.
You will lead incident resolution, including performing root cause analyses.
You will use your systems knowledge to promote scaling and performance for services across Redfin and some partner companies.
You will participate in an on-call rotation for about one week per month.

About You

7 years of experience managing systems in AWS or a similar cloud environment, including compute and storage, with an emphasis on solution development and execution.
5 years of experience with at least one, but preferably more, of the following: PostgreSQL or a similar RDBMS; AWS Aurora/RDS; AWS S3; ElastiCache; OpenSearch; DynamoDB.
You have a proven history of architecting, building, scaling, and supporting cloud infrastructure technologies, specializing in database and storage services, and you can communicate the direct business impact of this work.
You have extensive experience with Linux administration and Linux scripting, including Python script development.
You are an experienced mentor of other engineers, with the ability to guide a team of engineers to identify and implement solutions to difficult problems.
You’re committed to best practices that set your team up for long-term success, including infrastructure as code, configuration management tooling, and security practices.
You have deep knowledge and professional use of at least one AI code generation tool, such as Anthropic Claude Code, GitHub Copilot, Cursor, or similar, to implement key efficiencies for cloud infrastructure.
You have excellent communication skills that allow you to connect with and influence everyone from your immediate team up through senior leadership.
You understand and can implement core reliability principles, including monitoring, alerting, and incident management.

What you’ll get.
