Audio Software Engineer
Hark · San Francisco Bay Area
📍 San Jose · 💰 $170,000–$400,000 · via Greenhouse · Posted 2026-06-17
About Hark
Hark is an artificial intelligence company building advanced, personalized intelligence that is proactive, multimodal, and capable of interacting with the world through speech, text, vision, and persistent memory.
We're pairing that intelligence with next-generation hardware to create a universal interface between humans and machines. While today's AI largely operates through chat boxes and decade-old devices, Hark is focused on what comes next: agentic systems that interact naturally with people and the real world.
To get there, we're developing multimodal models and next-generation AI hardware together, designed from the ground up as a single, unified interface for a new era of intelligent systems.
About the Role
We're hiring a Member of Technical Staff (Real-Time Audio) to join our Product Engineering team. Hark’s voice agent holds real-time, full-duplex conversations with people in homes, cars, and noisy rooms. That experience is only as good as the audio underneath it.
This role owns the real-time audio processing that makes conversations feel natural (echo cancellation, noise suppression, and voice activity detection), shipped as production code in our live client. This is not a pure research or DSP-theory role: we're looking for someone who can both understand the signal processing and ship the code.
Responsibilities
Own audio quality on the client: echo, self-interruption, dropouts, and clipping
Build and tune the browser audio pipeline with the Web Audio API, AudioWorklet, and getUserMedia constraints
Work the WebRTC audio path end to end: AEC, noise suppression, and VAD
Ship DSP to the client as C++/Rust compiled to WebAssembly, and as TypeScript in the audio pipeline
Tune endpointing, interruption, and turn-taking so the agent listens like a person
Reduce conversational latency and artifacts across the streaming pipeline
Work in our React/TypeScript client where audio meets the UI
Manage features end-to-end from prototyping through production
Collaborate with designers, platform engineers, and our speech team
Requirements
5+ years of software engineering experience
Shipped real-time audio into a product used by real users
Hands-on experience with WebRTC, acoustic echo cancellation (AEC), noise suppression, and voice activity detection (VAD)
Strong DSP fundamentals: adaptive filtering, STFT, resampling, and gain control
C/C++ or Rust for production DSP, and experience shipping it to the browser via WebAssembly
Working knowledge of the browser audio stack: Web Audio API, AudioWorklet, and MediaStream constraints
Comfort with latency, buffering, and sample rates in a streaming audio pipeline
Track record of owning features end to end and working comfortably in a shared production codebase
Bonus Qualifications
Experience working at a voice, speech, or video-conferencing company
ML for audio: noise suppression, VAD, or source separation (e.g. RNNoise, DeepFilterNet, Silero VAD), and on-device inference (ONNX Runtime, Core ML)
Familiarity with WebRTC internals (the Audio Processing Module, AEC3, Opus) and voice-agent frameworks (LiveKit, Pipecat)
Experience with TypeScript and React, and comfort working across the product frontend
Experience with target-speaker isolation, diarization, or barge-in and turn-detection systems for conversational AI
Compensation
The US base salary range for this full-time position is $170,000–$400,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components or benefits, depending on the specific role. These details will be shared if an employment offer is extended.