Senior AI Platform Engineer
Expel · Remote
📍 Remote · 💰 $142,900 · via Greenhouse · Posted 2026-06-24
You believe great ML systems don't just work — they scale, they recover gracefully, and they give data scientists the confidence to iterate quickly. At Expel, you'll be a key contributor in building and maturing the infrastructure that powers our machine learning and generative AI capabilities. From end-to-end training pipelines to the specialized infrastructure behind production agentic applications, your work will directly shape how fast we can innovate and how reliably our AI systems run.
You'll work closely with senior and principal engineers, data scientists, and cross-functional teams to operationalize ML at scale. You bring strong hands-on expertise and a genuine drive to continuously improve the systems and practices around you.
What Expel can do for you
Give you hard, meaningful problems — building the infrastructure that lets defenders win using AI
Connect you with a collaborative team of engineers, data scientists, and researchers who care about doing it right
Offer unlimited PTO (that leadership models and encourages), up to 24 weeks of parental leave, and really excellent health benefits
Pay you monthly fitness and cell phone stipends, no receipts required
Support your professional growth with a conference benefit and continuous learning opportunities
Offer full remote flexibility — work from wherever you do your best work
What you can do for Expel
Build and scale ML infrastructure
Architect and maintain end-to-end machine learning training pipelines on AWS (SageMaker, EKS, Step Functions) to ensure reliable and reproducible model development and deployment
Build and maintain infrastructure for production agentic applications using Amazon Bedrock and Bedrock AgentCore — including agent runtimes, memory, secure gateways, and observability at scale
Contribute to the architectural evolution of our ML platform, including evaluating MLOps tooling and participating in buy vs. build decisions
Operationalize with rigor
Implement AI/ML governance best practices for model versioning, testing, validation, maintenance, and security
Integrate MLOps best practices with Expel's SDLC, security, and infrastructure standards, working alongside SRE, Platform Engineering, and Security teams
Drive quality, reliability, and scalability improvements through thoughtful engineering and monitoring
Collaborate and enable
Partner with data scientists, software engineers, and stakeholders to operationalize ML models reliably and at scale
Mentor and support junior engineers; foster a culture of engineering excellence
Create and maintain documentation, internal tooling, and enablement resources so practitioners across Expel can work effectively with ML systems
Stay current with the MLOps landscape and bring relevant innovations back to the team
What you should bring with you
Collaboration & communication
Clear communicator — able to write documentation and explain technical concepts to both engineering and non-technical audiences
Strong collaborator with engineers, product managers, and business stakeholders
Demonstrated ability to mentor others and invest in the growth of the people around you
Balances near-term delivery with longer-term technical quality
Technical depth
Strong Python proficiency; familiarity with other languages (Go, JS) is a plus
Solid experience with CI/CD pipelines, infrastructure-as-code, and containerization for ML workloads
Hands-on experience with cloud-based ML platforms — AWS (SageMaker, Bedrock, Bedrock AgentCore) strongly preferred; GCP (Vertex AI) experience also valued
Proven experience operationalizing LLMs and building infrastructure for complex agentic applications — agent orchestration, memory, tool calling, RAG architectures
Familiarity with ML frameworks including Scikit-Learn, PyTorch, Spark, and TensorFlow
Working knowledge of continuous retraining, concept drift monitoring, and data drift detection in production
Education & experience
5+ years of relevant software engineering experience with meaningful focus on ML operations and infrastructure
Degree in Computer Science, Mathematics, Statistics, Engineering, or a related technical field preferred (or a compelling story)
Demonstrated track record of delivering impactful ML infrastructure or MLOps projects
Experience contributing to team practices, standards, or tooling in a collaborative environment
Additional information
The base salary range for this role is $142,900 USD to $207,200 USD, plus bonus eligibility and equity.
We believe in paying transparently and equitably. Your salary will ultimately be based on factors such as your experience, skills, team equity, and market data. You’ll also be eligible for unlimited PTO (which we model and encourage), work location flexibility, up to 24 weeks of parental leave, and really excellent health benefits.
We’re only hiring those authorized to work in the United States.
We’re an Equal Opportunity Employer: You’ll receive consideration for employment without regard to race, sex, color, religion, sexual orientation, gender identity, national origin, protected veteran status, or on the basis of disability.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.