On-Device Research Engineer

Hark · San Francisco Bay Area

📍 San Jose · 💰 $120,000 - $300,000 · via Greenhouse · Posted 2026-06-25

About Hark

Hark is an artificial intelligence company building advanced, personalized intelligence: intelligence that is proactive, multimodal, and capable of interacting with the world through speech, text, vision, and persistent memory. We're pairing that intelligence with next-generation hardware to create a universal interface between humans and machines. While today's AI largely operates through chat boxes and decade-old devices, Hark is focused on what comes next: agentic systems that interact naturally with people and the real world. To get there, we're developing multimodal models and next-generation AI hardware together, designed from the ground up as a single, unified interface for a new era of intelligent systems.

About the Role

We are looking for an On-Device Research Engineer to compress large audio and multimodal models into student models that meet the size, latency, and power budgets of our shipping hardware. This role sits between training and production. You will take teacher models from our research pipeline and produce student models that run on DSP, NPU, and microcontroller targets across our product line. You will own distillation, quantization, and architecture-aware compression as a first-class workstream.
Responsibilities

- Design and execute distillation strategies (response, feature, and self-distillation) to compress teacher models into deployable students
- Apply quantization (PTQ and QAT), pruning, and architecture search to hit per-product size, latency, and power budgets
- Build a reusable distillation and compression toolchain that the broader audio ML team can adopt across model families
- Partner with the broader audio ML team on training pipelines and with the runtime team on deployment targets
- Define accuracy-retention and resource KPIs per product and track them through the release cycle
- Profile compressed models on target hardware and iterate with DSP and runtime engineers on bottlenecks

Requirements

- 3+ years of professional experience in model compression, distillation, quantization, or efficient deep learning
- Strong fluency in PyTorch or TensorFlow and modern compression libraries
- Hands-on experience taking models from full precision to fixed-point or int8 with controlled accuracy loss
- Comfort working close to hardware and reasoning about compute, memory bandwidth, and power as design constraints
- Track record of producing models that have shipped to constrained devices
- Solid foundation in audio or sequence model architectures (CNNs, transformers, RNN-T, conformers)

Bonus Qualifications

- Experience with Hexagon DSPs, NPUs, Ambiq-class MCUs, or similar
- Experience with knowledge distillation at scale, including teacher-ensemble or multi-stage distillation
- Familiarity with neural architecture search and hardware-aware NAS
- Background shipping voice-first or far-field audio products
- Contributions to open-source compression toolchains (TFLite, ONNX Runtime, AIMET, and similar)

Compensation

The US base salary range for this full-time position is $120,000 to $300,000 annually. The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components or benefits depending on the specific role. This information will be shared if an employment offer is extended.

More San Francisco Bay Area jobs

Facility Manager - Residential Valet Operations
Reimagined Parking
Valet Parking Attendant - Hyatt Regency (Seasonal / Part Time)
Reimagined Parking
Valet Supervisor - Millennium Tower
Reimagined Parking
Hotel Parking Operations Manager - San Jose
Reimagined Parking
Administrative Clerk
TeleSolv Consulting
File Clerk
TeleSolv Consulting