CareerRiver

Senior MLOps & AI Infrastructure Engineer

Altera · San Francisco Bay Area

📍 San Jose, California, United States · 💰 $149,100 - $215,925 · via Workday
CareerRiver pulls this listing straight from the employer's hiring system — no recruiter middleman, no reposts. Applying takes you directly to Altera.
Job Details

Job Description

About Altera
At Altera™, our independence as the world's largest pure-play FPGA solutions provider gives us the focus, speed, and agility to innovate without compromise. With more than four decades of industry-leading FPGA expertise, our singular mission is to deliver the programmable technologies that help customers differentiate, innovate, and scale across rapidly evolving markets like AI, cloud, networking, and edge. As an independent company, we move faster, invest deeper, and partner more closely, empowering our teams to drive breakthrough innovation and shape the future of the FPGA industry.

About the Role
We are looking for a Senior MLOps & AI Infrastructure Engineer to architect, build, and operationalize machine learning systems at scale. This role sits at the intersection of data science, software engineering, and infrastructure, combining deep ML expertise with the DevOps/MLOps discipline required to ship models reliably into production. You will partner closely with software, data, and infrastructure teams to design end-to-end ML pipelines, automate model lifecycle management, and deliver AI-powered capabilities across our EDA, HPC, and cloud environments.
Key Responsibilities

ML Platform & Pipeline Engineering
• Design, build, and maintain scalable ML pipelines for training, evaluation, and deployment across cloud and on-prem HPC environments
• Build MLOps infrastructure, including experiment tracking, a model registry, feature stores, and automated retraining workflows
• Implement CI/CD/CT (Continuous Training) pipelines for ML models using tools such as Kubeflow, MLflow, Airflow, or similar
• Containerize ML workloads with Docker and orchestrate them at scale using Kubernetes and GPU node pools

Model Development & Optimization
• Develop, fine-tune, and deploy large-scale models, including LLMs, GNNs, and reinforcement learning agents, for EDA and chip design applications
• Apply advanced techniques (transfer learning, quantization, pruning, distillation, and RLHF) to achieve production-grade model efficiency
• Implement A/B testing frameworks and shadow deployments for safe model rollout
• Benchmark and optimize model inference performance on GPU/TPU clusters

Data Engineering & Feature Management
• Build and maintain data pipelines for large-scale (terabyte-scale) structured and unstructured datasets
• Collaborate with data teams to design feature engineering systems and maintain data quality for ML training
• Implement data versioning and lineage tracking (DVC, Delta Lake, or similar)

Infrastructure & Operations
• Manage cloud ML infrastructure on AWS (SageMaker), Azure (AML), or GCP (Vertex AI), optimizing for cost and performance
• Automate infrastructure provisioning for GPU-backed ML environments using Terraform or CloudFormation
• Build monitoring, alerting, and observability systems for model performance drift, data quality, and system health
• Support HPC schedulers (LSF, Slurm) for large-scale distributed training jobs

Collaboration & Leadership
• Partner with research scientists to productionize experimental models with engineering rigor
• Mentor junior engineers and define ML engineering best practices across the organization
• Drive adoption of AI/ML solutions within semiconductor, EDA, and simulation workflows

Technology Stack
ML Frameworks: PyTorch • TensorFlow • JAX • Hugging Face • scikit-learn • XGBoost
MLOps & Pipelines: MLflow • Kubeflow • Airflow • Weights & Biases • DVC • Feast
Infrastructure & Cloud: AWS SageMaker / GCP Vertex AI / Azure ML • Terraform • Docker • Kubernetes • Slurm / LSF
Languages: Python • Bash • Go • SQL
Monitoring & Observability: Prometheus • Grafana • ELK Stack • Evidently AI • Arize

Key Competencies
• Strong ownership mindset: you drive ML initiatives from prototype to production without being asked
• Bias toward automation: if you do it twice, you automate it
• Ability to bridge research and engineering, translating papers into production-grade systems
• Ability to thrive in fast-paced, ambiguous environments typical of deep-tech and semiconductor companies
• Clear communicator who can explain complex ML concepts to non-technical stakeholders

Salary Range
The pay range below applies to the Bay Area, California only. Actual salary may vary based on a number of factors, including job location, job-related knowledge, skills, experience, and training. We also offer incentive opportunities that reward employees based on individual and company performance.
$149,100 - $215,925 USD

We use artificial intelligence to screen, assess, or select applicants for the position. Applicants must be eligible for any required U.S. export authorizations.
Qualifications

Required Qualifications
• Bachelor's or Master's degree in Computer Science, Machine Learning, Statistics, or a related field and 10+ years of industry experience
• 10+ years of experience across ML engineering, data science, and MLOps, including frameworks (PyTorch, TensorFlow, JAX, Hugging Face) and production model deployment at scale
• 8+ years of experience with parallelism strategies (FSDP, DeepSpeed, data/model parallelism)
• 10+ years of experience with, and proficiency in, Python programming
• 8+ years of experience with cloud ML platforms (AWS, GCP, Azure), Docker/Kubernetes, and CI/CD pipelines
• 5+ years of hands-on experience with MLflow, W&B, or Neptune for tracking and reproducibility

Preferred Qualifications
• PhD in Computer Science, Machine Learning, Statistics, or a related field
• Experience applying ML/AI to semiconductor, EDA, or chip design domains (e.g., timing prediction, place & route optimization, DRC closure)
• Familiarity with HPC schedulers such as LSF or Slurm and GPU cluster management for training workloads
• Knowledge of LLM fine-tuning, Retrieval-Augmented Gen