Job Description: Applied AI Engineer, Voice Native OS

Experience: 0-3 Years

Location: (Location/Remote)

Technical Stack: Python, TypeScript, WebSockets, Playwright/CDP, LLM APIs

System Overview

We are developing a Voice Native OS, a computing environment where natural language is the primary input method for executing complex workflows. The system architecture consists of three distinct technical components that you will build and maintain:

The Audio Pipeline: A real-time, bidirectional streaming layer that handles speech recognition, synthesis, and turn-taking logic.
The Execution Engine: A stateful control loop that translates natural language intent into structured plans and manages the lifecycle of multi-step tasks.
The Runtime Layer: A sandboxed environment where the system interacts with external applications (browsers, file systems, code interpreters) via deterministic APIs.

Key Responsibilities

1. Real-Time Audio Engineering

Streaming Architecture: Design and implement full-duplex audio pipelines using WebSockets or WebRTC. You will manage the buffering and transmission of audio chunks between the client, Speech-to-Text (STT) providers, and Text-to-Speech (TTS) providers.
Latency Optimization: Instrument the audio stack to measure and reduce "time-to-first-byte" and "end-to-end latency." This involves optimizing network requests, parallelizing model inference where possible, and managing stream concurrency.
Interaction Logic: Implement Voice Activity Detection (VAD) and "barge-in" logic. The system must detect when a user is speaking while the system is outputting audio, instantly halt playback, and clear the context buffer to accept new input.
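As an illustration of the barge-in behavior described above, here is a minimal asyncio sketch. It assumes a hypothetical VAD that calls `on_user_speech()` when it detects speech; the `BargeInController` class, the in-memory queue, and the simulated playback delay are placeholders for illustration, not part of our actual stack.

```python
import asyncio
from typing import List, Optional, Tuple


class BargeInController:
    """Cancels playback and drops queued audio when the user starts speaking."""

    def __init__(self) -> None:
        self.pending_chunks: asyncio.Queue = asyncio.Queue()
        self.played: List[bytes] = []
        self._playback_task: Optional[asyncio.Task] = None

    async def _play(self) -> None:
        # Stand-in for writing chunks to an audio device or client socket.
        while True:
            chunk = await self.pending_chunks.get()
            await asyncio.sleep(0.01)  # simulated playback time per chunk
            self.played.append(chunk)

    def start_playback(self) -> None:
        self._playback_task = asyncio.create_task(self._play())

    async def on_user_speech(self) -> None:
        """Called by the (hypothetical) VAD when user speech is detected."""
        if self._playback_task is not None:
            # Halt playback; the chunk currently "playing" is discarded.
            self._playback_task.cancel()
            try:
                await self._playback_task
            except asyncio.CancelledError:
                pass
            self._playback_task = None
        # Drop audio that was queued but never played.
        while not self.pending_chunks.empty():
            self.pending_chunks.get_nowait()


async def demo() -> Tuple[int, int]:
    controller = BargeInController()
    for i in range(10):
        controller.pending_chunks.put_nowait(bytes([i]))
    controller.start_playback()
    await asyncio.sleep(0.035)  # let a few chunks play before the interruption
    await controller.on_user_speech()
    return len(controller.played), controller.pending_chunks.qsize()
```

Cancelling the playback task and then draining the queue keeps the two steps explicit: output stops first, and stale audio cannot resume once the new turn begins.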

2. Tooling and Runtime Integration

Browser Automation: Build robust control interfaces for web browsers using Playwright or the Chrome DevTools Protocol (CDP). This involves creating functions that allow the model to navigate URLs, interact with DOM elements, extract structured data, and handle dynamic page states.
Deterministic API Design: Define strict JSON schemas for all tools and actions. You will implement input validation (using libraries like Pydantic or Zod) to ensure that tool calls generated by the LLM match the expected types and formats before execution.
Sandboxed Execution: Implement secure environments for code execution and file manipulation. This includes setting up permissions, resource limits, and timeout mechanisms to prevent runaway processes or unsafe operations.
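To make the validation step concrete, the sketch below checks an LLM-generated tool call before anything is executed. It uses only the Python standard library as a stand-in for Pydantic or Zod; the `navigate` tool, its `url` and `timeout_ms` fields, and the `ToolCallError` type are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass


class ToolCallError(ValueError):
    """Raised when an LLM-generated tool call does not match the schema."""


@dataclass(frozen=True)
class NavigateArgs:
    """Arguments for a hypothetical `navigate` tool."""
    url: str
    timeout_ms: int = 10_000


def parse_navigate_call(raw: str) -> NavigateArgs:
    """Validate a JSON tool call; raise ToolCallError on any mismatch."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ToolCallError(f"not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ToolCallError("tool call must be a JSON object")
    unknown = set(payload) - {"url", "timeout_ms"}
    if unknown:
        raise ToolCallError(f"unexpected fields: {sorted(unknown)}")
    url = payload.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ToolCallError("url must be an http(s) string")
    timeout_ms = payload.get("timeout_ms", 10_000)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout_ms, bool) or not isinstance(timeout_ms, int) or timeout_ms <= 0:
        raise ToolCallError("timeout_ms must be a positive integer")
    return NavigateArgs(url=url, timeout_ms=timeout_ms)
```

Rejecting unknown fields and wrong types up front means a malformed call fails with a message the model can be shown, rather than failing inside the browser or sandbox.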

3. State Management and Reliability

Control Loop Implementation: Write the logic for the agent's decision cycle: Observation (parsing user input/system state), Reasoning (determining the next step), Action (executing a tool), and Feedback (processing the tool output).
Error Handling: Develop recovery mechanisms for common failure modes, such as network timeouts, malformed tool outputs, or ambiguous user requests. The system must be able to retry actions or request clarification rather than crashing.
Telemetry and Tracing: Integrate distributed tracing to log every step of a session. You will maintain logs that correlate audio input with downstream tool usage and system responses for debugging purposes.
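The decision cycle and retry behavior described above can be sketched roughly as follows. This is a simplified, synchronous illustration: `run_agent_loop`, the `decide` callback (standing in for the LLM), the tool registry, and the step and retry limits are all assumptions made for the example, and the `trace` list stands in for real distributed tracing.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (tool name, argument), or None when the task is finished.
Decision = Optional[Tuple[str, str]]


def run_agent_loop(
    user_input: str,
    decide: Callable[[str], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
    max_retries: int = 2,
) -> List[str]:
    """Run observe/reason/act/feedback steps and return a trace of events."""
    trace: List[str] = []
    observation = user_input                        # Observation
    for _ in range(max_steps):
        decision = decide(observation)              # Reasoning
        if decision is None:
            trace.append("done")
            return trace
        name, arg = decision
        tool = tools.get(name)
        if tool is None:
            # Feed the error back instead of crashing.
            observation = f"error: unknown tool {name!r}"
            trace.append(observation)
            continue
        for attempt in range(1, max_retries + 2):
            try:
                observation = tool(arg)             # Action
                trace.append(f"{name} ok (attempt {attempt})")
                break
            except Exception as exc:
                # Feedback: the failure becomes the next observation.
                observation = f"error: {name} failed: {exc}"
                trace.append(f"{name} failed (attempt {attempt})")
    else:
        trace.append("step limit reached")
    return trace
```

Bounding both steps and retries keeps a confused plan from looping indefinitely, and routing failures back through `observation` gives the reasoning step a chance to retry differently or ask the user for clarification.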

Technical Requirements

Core Competencies

Software Engineering: Proficiency in Python and/or TypeScript, with a focus on writing clean, type-safe, and maintainable code.
Asynchronous Programming: Strong understanding of event loops, async/await patterns, and handling concurrent network requests. This is critical for managing simultaneous audio streams and API calls.
API Integration: Experience consuming and implementing RESTful APIs and WebSocket interfaces. Familiarity with handling authentication, rate limiting, and connection stability.

Domain Experience (Must have one or more)

Real-Time Systems: Experience building applications that process streaming data or handle event-driven architectures.
Browser Automation: Experience with programmatic browser control (Selenium, Playwright, Puppeteer) and strategies for handling DOM interaction.
LLM Orchestration: Experience building applications that utilize Large Language Models for logic or data extraction, specifically involving function calling or tool use.

Scope of Work (First 90 Days)

Build: Implement a functional voice-to-action workflow that takes a spoken command, executes a browser-based task (e.g., "Find a flight to X"), and returns a spoken confirmation.
Test: Establish a regression testing suite consisting of 20-50 predefined tasks. This suite will run automatically to verify that code changes do not degrade the success rate of common workflows.
Optimize: Analyze system logs to identify the primary sources of latency and error rates, and implement code fixes to address them.

Application Candidate Signals

We review applications based on technical evidence. Please highlight projects or experience where you have:

Defined and implemented structured schemas for API interactions.
Handled race conditions or synchronization issues in asynchronous systems.
Built automated scripts that interact with third-party software or websites.
Implemented logging and observability stacks for debugging complex logic flows.

Applied AI Engineer

Gödel Machines
