Voice AI Systems Engineer
Known
Location
San Francisco, CA
Employment Type
Full-time
Location Type
On-site
Department
Engineering
Compensation
$225K – $330K • Offers Equity
Known - Founding Voice AI Systems Engineer
San Francisco, CA (In-Person)
$225K–$330K Cash + Equity
Known is a matchmaker that talks to users and supports them like a friend. Our mission is to empower humanity by applying general intelligence to human connection.
Users join Known by telling us their life story. On average, our new users talk to our AI voice agent for 27 minutes, giving us a uniquely intimate multi-modal data set.
We are a team of engineers who’ve created some of the most widely used AI-driven consumer products, including Uber Eats, Uber, Faire, and Afterpay.
We love to work hard, with a high degree of autonomy and ownership. We work together in Cow Hollow, San Francisco.
About the Role
We’re looking for founding voice AI systems engineers to build and scale Known’s core voice systems architecture, powering our voice-led onboarding and user experiences.
This is a unique opportunity to work with a hyper-personalized dataset, combining voice transcripts, images, and structured user data to power real-time, personalized AI voice-led conversations at scale. You’ll work directly with Chen Peng, former head of ML at Uber Eats and Faire.
What You’ll Do
You will be responsible for the first impression of a user’s journey on Known. You'll have the autonomy to own:
Low-Latency Orchestration: Architecting the real-time pipeline between STT (Speech-to-Text), LLM reasoning, and TTS (Text-to-Speech) to ensure conversational fluidity (<500ms response times).
Voice Personalization & Memory: Building systems that allow our AI to remember not just what a user said, but how they said it, incorporating tone and sentiment into long-term user profiles.
Audio Intelligence: Implementing and fine-tuning Voice Activity Detection (VAD) and interrupt-handling logic so the AI feels responsive, empathetic, and polite during the onboarding interview.
Streaming Infrastructure: Maintaining robust WebRTC or WebSocket-based systems to handle high-concurrency voice streams while maintaining audio fidelity.
Evals for Voice: Developing custom evaluation frameworks to measure "conversational success," going beyond word error rate (WER) to assess personality, warmth, and engagement.
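To make the orchestration responsibility above concrete, here is a minimal sketch of an STT → LLM → TTS turn loop measured against a 500ms latency budget. The `transcribe`, `reason`, and `synthesize` stubs (and their simulated delays) are illustrative stand-ins, not Known's actual services or any specific vendor API:

```python
import asyncio
import time

LATENCY_BUDGET_MS = 500  # the response-time target mentioned in the role

# Hypothetical stubs standing in for real STT/LLM/TTS backends.
async def transcribe(audio_chunk: bytes) -> str:
    await asyncio.sleep(0.05)  # simulated STT latency
    return "hello"

async def reason(transcript: str) -> str:
    await asyncio.sleep(0.15)  # simulated LLM latency
    return f"reply to: {transcript}"

async def synthesize(text: str) -> bytes:
    await asyncio.sleep(0.10)  # simulated TTS latency
    return text.encode()

async def respond(audio_chunk: bytes) -> tuple[bytes, float]:
    """Run one STT -> LLM -> TTS turn and report end-to-end latency in ms."""
    start = time.perf_counter()
    transcript = await transcribe(audio_chunk)
    reply = await reason(transcript)
    audio = await synthesize(reply)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return audio, elapsed_ms

if __name__ == "__main__":
    audio, elapsed_ms = asyncio.run(respond(b"\x00" * 320))
    print(f"turn latency: {elapsed_ms:.0f} ms (budget {LATENCY_BUDGET_MS} ms)")
```

In production the stages would stream and overlap (the LLM begins on partial transcripts, TTS begins on the first tokens) rather than run strictly in sequence as shown here; the serial version just makes the latency accounting easy to read.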
Requirements
We’re looking for someone who obsesses over the "uncanny valley":
3-5 Years in ML/Systems: Proven experience deploying high-scale models in production, specifically focusing on audio processing or real-time streaming.
The Voice Stack: Deep familiarity with modern STT/TTS frameworks (e.g., ElevenLabs, LiveKit, VITS, and Sesame) and audio libraries like Librosa or FFmpeg.
Agentic Conversational AI: Experience building "brain" logic for LLMs using tools like LangGraph or Haystack to manage complex, non-linear dialogue.
Production Hardened: You’ve optimized model inference for speed using TensorRT, ONNX, or Triton, and you’re comfortable in a Docker/Kubernetes/Cloud environment.
Our Investors
We’re backed by Eurie Kim and Kirsten Green at Forerunner Ventures (the investors behind Decagon, Faire, and Oura), NFX, and PearVC.
Compensation Range: $225K – $330K