Voice AI Systems Engineer
Known
Location
San Francisco, CA
Employment Type
Full-time
Location Type
On-site
Department
Engineering
Compensation
$225K – $330K • Offers Equity
Known - Founding Voice AI Systems Engineer
San Francisco, CA (In-Person)
$225K–$330K Cash + Equity
Known is a matchmaker that talks to users and supports them like a friend. Our mission is to empower humanity by applying general intelligence to human connection.
Users join Known by telling us their life story. On average, our new users talk to our AI voice agent for 27 minutes, giving us a uniquely intimate multi-modal data set.
We are a team of engineers who’ve created some of the most widely used AI-driven consumer products, including Uber Eats, Uber, Faire, and Afterpay.
We love to work hard, with a high degree of autonomy and ownership. We work together in Cow Hollow, San Francisco.
About the Role
We’re looking for founding voice AI systems engineers to build and scale Known’s core voice systems architecture, powering our voice-led onboarding and user experiences.
This is a unique opportunity to work with a hyper-personalized dataset, combining voice transcripts, images, and structured user data to power real-time, personalized AI voice-led conversations at scale. You’ll work directly with Chen Peng, former head of ML at Uber Eats and Faire.
What You’ll Do
You will be responsible for the first impression of a user’s journey on Known. You'll have the autonomy to own:
Low-Latency Orchestration: Architecting the real-time pipeline between STT (Speech-to-Text), LLM reasoning, and TTS (Text-to-Speech) to ensure conversational fluidity (<500ms response times).
Voice Personalization & Memory: Building systems that allow our AI to remember not just what a user said, but how they said it, incorporating tone and sentiment into long-term user profiles.
Audio Intelligence: Implementing and fine-tuning Voice Activity Detection (VAD) and interrupt-handling logic so the AI feels responsive, empathetic, and polite during the onboarding interview.
Streaming Infrastructure: Maintaining robust WebRTC or WebSocket-based systems to handle high-concurrency voice streams while maintaining audio fidelity.
Evals for Voice: Developing custom evaluation frameworks to measure "conversational success," going beyond word error rate (WER) to assess personality, warmth, and engagement.
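To make the orchestration responsibility above concrete, here is a minimal sketch of an STT → LLM → TTS turn loop measured against a 500ms latency budget. The `transcribe`, `reason`, and `synthesize` stubs (and their simulated delays) are illustrative stand-ins, not Known's actual services or any specific vendor API:

```python
import asyncio
import time

LATENCY_BUDGET_MS = 500  # the response-time target mentioned in the role

# Hypothetical stubs standing in for real STT/LLM/TTS backends.
async def transcribe(audio_chunk: bytes) -> str:
    await asyncio.sleep(0.05)  # simulated STT latency
    return "hello"

async def reason(transcript: str) -> str:
    await asyncio.sleep(0.15)  # simulated LLM latency
    return f"reply to: {transcript}"

async def synthesize(text: str) -> bytes:
    await asyncio.sleep(0.10)  # simulated TTS latency
    return text.encode()

async def respond(audio_chunk: bytes) -> tuple[bytes, float]:
    """Run one STT -> LLM -> TTS turn and report end-to-end latency in ms."""
    start = time.perf_counter()
    transcript = await transcribe(audio_chunk)
    reply = await reason(transcript)
    audio = await synthesize(reply)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return audio, elapsed_ms

if __name__ == "__main__":
    audio, elapsed_ms = asyncio.run(respond(b"\x00" * 320))
    print(f"turn latency: {elapsed_ms:.0f} ms (budget {LATENCY_BUDGET_MS} ms)")
```

In production the stages would stream and overlap (the LLM begins on partial transcripts, TTS begins on the first tokens) rather than run strictly in sequence as shown here; the serial version just makes the latency accounting easy to read.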
Requirements
We’re looking for someone who obsesses over the "uncanny valley":
3-5 Years in ML/Systems: Proven experience deploying high-scale models in production, specifically focusing on audio processing or real-time streaming.
The Voice Stack: Deep familiarity with modern STT/TTS frameworks (e.g., ElevenLabs, LiveKit, VITS, and Sesame) and audio libraries like Librosa or FFmpeg.
Agentic Conversational AI: Experience building "brain" logic for LLMs using tools like LangGraph or Haystack to manage complex, non-linear dialogue.
Production Hardened: You’ve optimized model inference for speed using TensorRT, ONNX, or Triton, and you’re comfortable in a Docker/Kubernetes/Cloud environment.
Our Investors
We’re backed by Eurie Kim and Kirsten Green at Forerunner Ventures (the investors behind Decagon, Faire, and Oura), NFX, and PearVC.
Compensation Range: $225K – $330K