AI Operations Engineering Technical Leader
Milpitas
Saturday, 02 May 2026
Cisco is looking for a highly experienced and innovative ML Operations Engineer to join our global DevOps team. In this critical role, you will drive the production readiness, deployment, and maintenance of scalable machine learning systems. You will work closely with a cross-functional team of data scientists, software development engineers, information security professionals, and DevOps engineers to create secure and resilient ML pipelines.

Key Responsibilities
Design, build, and manage robust ML pipelines for training, validation, and deployment.
Build and maintain scalable infrastructure using Kubeflow for ML experiments and inference across multiple public clouds.
Implement CI/CD in GitHub for ML systems, ensuring reproducibility and traceability.
Drive the implementation of LLM evaluation and observability solutions.
Advocate automation at every layer of the infrastructure stack using Infrastructure as Code (IaC) principles and tools such as Terraform, Helm, and GitOps frameworks.
Monitor models in production for performance degradation, drift, and fairness.
Participate in the on-call rotation for ML Operations.
Work closely with data scientists, engineers, and product managers to understand requirements and integrate models into applications.

Minimum Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field/industry with 8 years of DevOps experience; a Master's degree with 6 years of related experience; or a PhD with 3 years of related experience.
Understanding of CI/CD pipelines and automation tools.
Knowledge of cloud platforms: AWS at minimum, with Azure and GCP as a bonus.
Proficiency in Python and familiarity with ML libraries (e.g., scikit-learn, PyTorch, TensorFlow).
Strong understanding of ML lifecycle management and model versioning.

Preferred Qualifications
Experience deploying large language models (LLMs) or generative AI systems.
Familiarity with feature stores, vector databases, or data observability platforms.
Excellent communication, collaboration, and mentoring skills.
Deep expertise in CI/CD tooling and practices, including hands-on experience with systems such as Jenkins, GitLab, Argo CD, or similar.
Strong proficiency in Kubernetes, Docker, and cloud-native patterns in AWS, Azure, or GCP.