Staff Backend Engineer - Application Core Services, Stacks | Canada | Remote
Grafana Labs · Remote
📍 Canada (Remote) · via Greenhouse · Posted 2026-05-07
Grafana Labs, the company behind the open observability cloud, is founded on the principles of open source, open standards, open ecosystems, and open culture. Grafana Cloud, our fully managed observability platform, is flexible and built for scale. With Grafana Cloud's actually useful AI, organizations can see, understand, and act on all their disparate data to move at the speed of their ambitions. Today, more than 35 million users and 7,000+ customers – including Anthropic, Bloomberg, NVIDIA, Microsoft, and Salesforce – trust Grafana Labs to ensure reliability of their applications and systems, resolve incidents quickly, and optimize their telemetry to reduce noise and cost. We are a 100% remote company with 1,600+ team members across 40+ countries, and we’re backed by leading investors including Lightspeed Venture Partners, Sequoia Capital, GIC, Coatue, J.P. Morgan, CapitalG, and Lead Edge Capital. Learn more at grafana.com and follow us on LinkedIn and X.
We’re scaling fast and staying true to what makes us different: an open-source legacy, a global collaborative culture, and a passion for meaningful work. Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.
You may not meet every requirement, and that’s okay. If this role excites you, we’d love you to raise your hand for what could be a truly career-defining opportunity.
This is a remote opportunity, and we are considering applicants located in Canadian time zones (EST and CST only at this time).
Staff Backend Engineer - Application Core Services, Stacks
The Opportunity:
Application Core Services (AppCore) is a group within Platform, in the Foundations department. Foundations produces the Internal Engineering Platform (IEP) and partners closely with our Cloud, Enterprise, and Grafana teams. Our team develops the essential systems driving Grafana's business operations. We utilize the grafana.com platform to engineer bespoke integrations and solutions that unify the diverse technical ecosystem of a modern software enterprise.
The team owns important domain areas that help keep both our customer workflows and internal business processes running smoothly. AppCore is made up of multiple squads, each focused on one or more of these domains. Our work includes maintaining the billing engine responsible for customer usage calculation, automating provisioning after a customer signs a contract, integrating with cloud marketplaces such as AWS, Azure, and GCP, and building and maintaining the user portal our customers rely on to manage their accounts.
This is a team working at the intersection of product, platform, and business operations. The systems we build are critical to how Grafana scales. We are looking for engineers who enjoy solving complex workflow and systems problems, improving reliability and developer experience, and building software that directly supports both customers and internal stakeholders.
As a company, we are remote-first and global. We embrace people of different experiences and backgrounds to build diverse teams where every person brings a unique perspective to the software. Engineers at Grafana also have the opportunity to contribute to Open Source communities and collaborate across teams beyond their immediate scope.
What You’ll Be Doing:
The AppCore Stacks squad owns the systems that create, configure, reconcile, migrate, and operate Grafana Cloud stacks at scale. A stack is the customer-facing Grafana Cloud environment that connects an organization to Grafana and the backend services it uses, including Mimir, Loki, Tempo, plugins, dashboards, data sources, and stack-level configuration.
Our work sits at the intersection of product, platform, and operations. We build the control-plane services and workflows that keep stack state aligned across grafana.com, Stack State Service (SSS), Hosted Grafana, cloud regions, and the underlying Grafana Cloud infrastructure. When this domain works well, customers get reliable stack creation, safe configuration rollout, predictable migrations, and fewer manual operational interventions.
Design, build, and operate reconciliation systems, including the SSS backend, that track desired stack state and detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration
Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient
Improve operational efficiency by reducing deployment complexity (e.g., aiming for single PR regional SSS deployment) and contributing to the Stack Config Reconciliation project
Manage rollout mechanisms for provisioned plugins, dashboards, data sources, Grafana versions, release channels, and stack-level configuration
Support new region and cluster rollouts, including the operational paths required to bring stacks online safely in new Grafana Cloud regions
Improve incident response and recovery paths for stack misalignment, reconciliation failures, plugin rollout issues, and Hosted Grafana integration failures
Partner with Product, Hosted Grafana, Infrastructure, Support, and adjacent AppCore squads on customer-impacting stack lifecycle work
Contribute to roadmap planning, technical design, OnCall improvements, and long-term simplification of stack operations
You will help own the production behavior of the systems you build. That includes improving runbooks, dashboards, alerts, reconciliation safety, rollout controls, and recovery procedures. You should be comfortable debugging across service boundaries and making careful changes in systems that affect customer stacks.
There is an on-call component to this role, and it is one that we take seriously. As a company, we hire globally (remote-first) to ensure our on-call remains healthy and aligned to approximately 12 daylight hours per day. You will work closely with