Senior Staff Machine Learning Engineer, Data & Eval

San Francisco

Friday, 24 April 2026

Define evaluation strategy and success metrics for Gen AI systems, aligning offline evaluation with online business and customer experience outcomes.

Build and scale evaluation frameworks (golden sets, synthetic data, automated regressions, rubric-based grading, LLM-as-judge where appropriate) with strong controls for bias, drift, and reliability.

Design the data flywheel: instrumentation, feedback collection, data quality checks, labeling strategy, dataset versioning, and governance to support continuous improvement.

Lead cross-functional quality initiatives across product, ops, and engineering, driving clarity on what “good” looks like and how teams act on evaluation results.

Develop and productionize pipelines for dataset creation, model monitoring, evaluation-at-scale, and continuous testing (pre-deploy and post-deploy).

Drive technical decisions and architecture for evaluation and data infrastructure, balancing speed, rigor, cost, and safety.

Minimum Qualifications:

Educational Background: PhD in Computer Science, Mathematics, Statistics, or related technical field (or equivalent practical experience).

Industry Experience: 10 years building, testing, and shipping ML/AI systems end-to-end, including 2 years of experience with Gen AI/LLM systems in production.

Leadership Experience: 5 years leading large, ambiguous technical initiatives as a senior IC, influencing roadmap and engineering/science direction across teams.

Technical Proficiency: Deep expertise in evaluation methodology (offline/online alignment, metric design, human-in-the-loop evaluation, A/B testing, power analysis, regression testing). Hands-on experience with Gen AI systems, including orchestration, retrieval, tool calling, memory, etc. Experience building data pipelines and quality systems (labeling workflows, dataset curation, versioning, monitoring, and governance). Solid ML fundamentals and best practices (model selection, training/serving, monitoring, reliability, and model lifecycle management).

Preferred Qualifications:

Customer Support Systems: Experience applying ML/AI to customer support workflows (e.g., agent assist, classification/routing, resolution recommendation, QA).

Infrastructure & Quality at Scale: Experience building robust evaluation platforms for agent behavior validation, safety/guardrails, and continuous improvement.

Agile Practice for Applied AI: Proven ability to take evaluation and data flywheel work from incubation to production, iterating quickly while maintaining scientific rigor.

Your Location:

This position is US - Remote Eligible. The role may include occasional work at an Airbnb office or attendance at offsites, as agreed to with your manager. While the position is Remote Eligible, you must live in a state where Airbnb, Inc. has a registered entity. Click here for the up-to-date list of excluded states. This list is continuously evolving, so please check back with us if the state you live in is on the exclusion list. If your position is employed by another Airbnb entity, your recruiter will inform you what states you are eligible to work from.

How We'll Take Care of You:

Our job titles may span more than one career level. The actual base pay is dependent upon many factors, such as: training, transferable skills, work experience, business needs and market demands. The base pay range is subject to change and may be modified in the future. This role may also be eligible for bonus, equity, benefits, and Employee Travel Credits.

Pay Range: $244,000 - $305,000 USD
