CareerRiver

Member of Technical Staff - Compilers

Architect · San Francisco Bay Area

📍 Palo Alto · via Ashby · Posted 2026-04-16
ABOUT US

Architect is a frontier AI lab for chip design. We build AI models and tools for on-demand custom ASICs at scale. Our goal is to co-design custom ASICs alongside evolving ML workloads, and enable a new era of domain-specific chips that unlock capabilities impossible with current hardware paradigms. Born out of Stanford Research, our team blends AI with Silicon with a founding team from Anthropic, Google DeepMind, Meta SuperIntelligence, xAI, Apple and Intel.

We're looking for staff/principal-level compiler engineers with deep experience building code generation toolchains for custom AI accelerators. Ideal candidates have shipped production compilers at places like Apple, Google (XLA/TPU), Groq, Cerebras, Qualcomm, AMD, or similar.

WHAT YOU'LL DO

As a Member of the Technical Staff on the Compilers team at Architect, you'll own the compiler stack targeting our SIMD/VLIW NPU, from graph ingestion through code generation on production silicon. You'll work directly with the NPU architect to co-design the ISA, closing the loop between compiler needs and hardware decisions.

- Own the compiler end-to-end: graph ingestion (ONNX, PyTorch) through IR optimization, AI-driven code generation, instruction scheduling, and register allocation for a SIMD/VLIW NPU.
- Implement and own the memory management layer; for instance, SW-managed on-chip scratchpad memory with the compiler handling data tiling, bank allocation, DMA scheduling, and double-buffering across SRAM banks.
- Design and iterate on mid-end and backend optimization passes: operator fusion, loop transformations, vectorization, and software pipelining to close the gap between peak and achieved throughput.
- Co-design the ISA and instruction encoding with the architect and silicon team. Feed real workload performance data back into architectural decisions.
- Support quantization and mixed-precision lowering (32-bit single-precision FP or INT, along with lower INT8/4, BF16, FP16/8/4 precisions) with correct numerics end-to-end.
- Benchmark compiler output against cycle-accurate models, RTL simulation, and FPGA prototypes. Own QoR tracking.
- Grow into a compiler team lead as the team scales.

WHAT WE'D LIKE TO SEE

Qualifications & Skills:

- Degree: Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or a closely related field.
- Experience: 5+ years building compilers or code generation toolchains for custom accelerators. Must have targeted ML/AI hardware compiler experience, as general-purpose (GCC/LLVM for CPUs) is not sufficient.
- Domain Background: Hands-on experience on at least one of: Apple Neural Engine compiler, Google XLA / Edge TPU / TPU codegen, Groq TSP compiler (spatial scheduling, IR dialect design), Cerebras compiler stack, Qualcomm Hexagon NN / AI Engine, AMD AIE / Vitis AI, or similar/equivalent custom accelerator compiler(s).
- Backend Mechanics: Strong grasp of instruction scheduling, register allocation, and software pipelining, especially for SIMD/VLIW or spatial architectures.
- ML Optimizations: Experience with tiling strategies, loop nest optimization, and operator fusion for ML workloads (such as convolution, attention, element-wise ops, reduction, transpositions, etc.).
- SW-Managed Memory: Experience with scratchpad type memory allocation, data layout, DMA orchestration, and multi-buffering.
- Coding: Strong C++. Python proficiency. Familiarity with MLIR or LLVM infrastructure.
- Leadership: Ability to lead and grow the compiler team over time.

Bonus:

- HW/SW co-design experience: defining ISA features, instruction encodings, or hardware interfaces driven by compiler needs.
- IR design for ML accelerators (custom dialects, MLIR-based flows, or graph-level IRs like XLA HLO).
- ML framework experience (PyTorch, TensorFlow) and portable graph formats (ONNX).
- Experience benchmarking and profiling compiler output on real hardware, FPGA, or cycle-accurate simulators.
- Understanding of ML inference systems and workload-level optimizations: FlashAttention, RadixAttention, PagedAttention, continuous batching, speculative decoding, KV cache management, and prefill/decode scheduling.
- Contributions to open-source ML compiler projects (TVM, MLIR, Triton, XLA).
- Domain-specific expertise: Track record on energy-efficient, high-performance HW accelerator bring-up.

WHAT WE OFFER

- Competitive salary and meaningful equity stake
- Fast-paced startup with autonomy and visible impact
- Cutting-edge challenges at the intersection of AI and silicon design
- Direct ownership of the compiler stack as we scale
