Cloudflare has released an experimental voice pipeline for the Agents SDK, making real-time voice another channel within the same Durable Object-based architecture. The new package, @cloudflare/voice, provides components such as withVoice, withVoiceInput, and VoiceClient, and ships built-in Workers AI providers, including continuous speech-to-text (STT) and text-to-speech (TTS) options such as Deepgram Flux, Deepgram Nova 3, and Aura.
According to the post, this approach lets developers build voice-enabled agents that talk over a single WebSocket connection while keeping the existing agent class, SQLite-backed conversation history, and WebSocket model. The pipeline carries both voice and text on the same connection, and includes client hooks for React apps (useVoiceAgent and useVoiceInput) plus VoiceClient for non-React usage.
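To make the idea of voice and text sharing one connection concrete, here is a minimal sketch of how a server might route both kinds of traffic into one conversation history. It does not use @cloudflare/voice or reproduce its API: the class SingleConnectionRouter, the message types "text" and "end_of_speech", and the transcribe callback are all hypothetical names invented for illustration. The assumption is that text arrives as JSON string frames and audio arrives as binary frames.

```typescript
// Hypothetical illustration only: none of these names come from @cloudflare/voice.

// A WebSocket frame is either a string (assumed JSON) or binary (assumed audio).
type Incoming = string | ArrayBuffer;

interface Turn {
  role: "user" | "assistant";
  channel: "text" | "voice";
  content: string;
}

class SingleConnectionRouter {
  // One shared history for both channels.
  readonly history: Turn[] = [];
  private audioChunks: Uint8Array[] = [];
  private transcribe: (audio: Uint8Array) => string;

  // `transcribe` is a stand-in for a speech-to-text provider.
  constructor(transcribe: (audio: Uint8Array) => string) {
    this.transcribe = transcribe;
  }

  handle(message: Incoming): void {
    if (typeof message === "string") {
      const parsed = JSON.parse(message) as { type: string; text?: string };
      if (parsed.type === "text" && parsed.text !== undefined) {
        // Typed message: store it directly.
        this.history.push({ role: "user", channel: "text", content: parsed.text });
      } else if (parsed.type === "end_of_speech") {
        // Assumed control message marking the end of a spoken utterance.
        this.flushAudio();
      }
      return;
    }
    // Binary frame: buffer the audio until the utterance ends.
    this.audioChunks.push(new Uint8Array(message));
  }

  private flushAudio(): void {
    const total = this.audioChunks.reduce((n, chunk) => n + chunk.length, 0);
    if (total === 0) return;
    const merged = new Uint8Array(total);
    let offset = 0;
    for (const chunk of this.audioChunks) {
      merged.set(chunk, offset);
      offset += chunk.length;
    }
    this.audioChunks = [];
    this.history.push({ role: "user", channel: "voice", content: this.transcribe(merged) });
  }
}
```

Because both channels append to the same history array, a spoken turn and a typed turn end up in one ordered conversation, which is the property the post attributes to the shared connection.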
The pipeline is designed to be provider-agnostic, with plans to integrate additional STT and TTS providers, telephony adapters, and transport options as the ecosystem evolves. The post, dated 15 April 2026, also explains how the voice flow fits with the Agents SDK’s state management and persistence, emphasising latency reductions through server-side execution and built-in streaming.
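As a rough illustration of what provider-agnostic means in practice, the sketch below defines small STT and TTS interfaces and a turn function that depends only on those interfaces. This is not the @cloudflare/voice API: SpeechToText, TextToSpeech, runVoiceTurn, and the two fake providers are hypothetical names, and real providers would stream audio rather than return whole buffers.

```typescript
// Hypothetical illustration only: these interfaces are not taken from @cloudflare/voice.

interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}

interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

interface VoiceTurnResult {
  transcript: string;
  reply: string;
  audio: Uint8Array;
}

// One voice turn: audio in, transcript, agent reply, audio out.
// The function only knows the interfaces, so providers can be swapped freely.
async function runVoiceTurn(
  audio: Uint8Array,
  stt: SpeechToText,
  tts: TextToSpeech,
  respond: (transcript: string) => Promise<string> | string,
): Promise<VoiceTurnResult> {
  const transcript = await stt.transcribe(audio);
  const reply = await respond(transcript);
  const spoken = await tts.synthesize(reply);
  return { transcript, reply, audio: spoken };
}

// Fake providers standing in for real STT and TTS services.
const fakeStt: SpeechToText = {
  async transcribe(audio) {
    return `${audio.length} bytes of audio`;
  },
};

const fakeTts: TextToSpeech = {
  async synthesize(text) {
    // Encode each character code as one byte (fine for ASCII demo text).
    const out = new Uint8Array(text.length);
    for (let i = 0; i < text.length; i++) {
      out[i] = text.charCodeAt(i) & 0xff;
    }
    return out;
  },
};
```

Swapping a provider then means passing a different object that satisfies the same interface, for example runVoiceTurn(audio, otherStt, fakeTts, respond), without touching the turn logic.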