Powering the agents: Workers AI now runs large models, starting with Kimi K2.5

Cloudflare has announced that, as of 19 March 2026, Workers AI runs large models, starting with Moonshot AI’s Kimi K2.5, bringing frontier open-source models to its AI inference platform. Kimi K2.5 has a 256k-token context window and supports multi-turn tool calling, vision inputs and structured outputs, so a wide range of agentic tasks can run on a single platform.

In internal tests, an agent that performs security reviews of Cloudflare’s codebases processes over 7B tokens per day and has identified more than 15 confirmed issues in a single codebase. On a mid-tier proprietary model, that workload would cost an estimated $2.4M per year; running it on Kimi K2.5 costs a fraction of that, a 77% cost reduction.
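The cost figures above can be turned into a rough worked calculation. The sketch below is illustrative only: it treats "over 7B tokens per day" as exactly 7B, assumes a 365-day year, and derives a single blended price per million tokens, ignoring any difference between input, output and cached token pricing. The function names are hypothetical and none of the derived per-token prices come from the announcement.

```python
# Figures quoted in the summary above.
TOKENS_PER_DAY = 7_000_000_000           # "over 7B tokens per day", taken as exactly 7B
PROPRIETARY_ANNUAL_COST_USD = 2_400_000  # estimated annual cost on a mid-tier proprietary model
COST_REDUCTION = 0.77                    # reported reduction when using Kimi K2.5


def annual_cost_after_reduction(baseline_usd: float, reduction: float) -> float:
    """Annual cost remaining after applying a fractional cost reduction."""
    return baseline_usd * (1.0 - reduction)


def blended_price_per_million_tokens(
    annual_cost_usd: float, tokens_per_day: int, days_per_year: int = 365
) -> float:
    """Implied average price per 1M tokens (assumption: one flat blended rate)."""
    millions_of_tokens_per_year = tokens_per_day * days_per_year / 1_000_000
    return annual_cost_usd / millions_of_tokens_per_year


kimi_annual_cost_usd = annual_cost_after_reduction(
    PROPRIETARY_ANNUAL_COST_USD, COST_REDUCTION
)
proprietary_price = blended_price_per_million_tokens(
    PROPRIETARY_ANNUAL_COST_USD, TOKENS_PER_DAY
)
kimi_price = blended_price_per_million_tokens(kimi_annual_cost_usd, TOKENS_PER_DAY)

if __name__ == "__main__":
    print(f"Implied Kimi K2.5 annual cost: ${kimi_annual_cost_usd:,.0f}")
    print(f"Implied blended price, proprietary: ${proprietary_price:.3f} per 1M tokens")
    print(f"Implied blended price, Kimi K2.5:   ${kimi_price:.3f} per 1M tokens")
```

Under these assumptions, the 77% reduction implies an annual cost of roughly $552,000 on Kimi K2.5, and blended rates of about $0.94 versus $0.22 per million tokens. These are back-of-envelope values implied by the quoted figures, not published prices.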

The article also describes platform improvements: prefix caching, with cached token counts now surfaced to users; a new session-affinity header to improve cache hit rates; and redesigned asynchronous APIs that handle higher volumes and avoid capacity errors. It notes that these advances aim to make serverless inference more reliable and affordable for organisations adopting open-source models for agentic workloads.
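To illustrate how a session-affinity header and surfaced cached tokens might be used together, here is a minimal sketch. It makes no network calls. The header name `x-session-affinity`, the OpenAI-style `usage.prompt_tokens_details.cached_tokens` field, and all function names are assumptions made for illustration; the summary above does not specify them, so check the Workers AI documentation before relying on any of them.

```python
import hashlib

# Assumed header name, not confirmed by the summary above.
SESSION_HEADER = "x-session-affinity"


def session_key(conversation_id: str) -> str:
    """Derive a stable, opaque key so every turn of one conversation sends the same value."""
    return hashlib.sha256(conversation_id.encode("utf-8")).hexdigest()[:32]


def build_headers(api_token: str, conversation_id: str) -> dict:
    """Request headers that pin a conversation to one session key.

    The idea is that requests sharing a key can be routed to the same
    backend, where the conversation's prompt prefix is already cached.
    """
    return {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
        SESSION_HEADER: session_key(conversation_id),
    }


def cached_fraction(usage: dict) -> float:
    """Share of prompt tokens served from the prefix cache.

    Assumes an OpenAI-style usage object; returns 0.0 when fields are missing.
    """
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens", 0)
    return cached_tokens / prompt_tokens if prompt_tokens else 0.0


if __name__ == "__main__":
    headers = build_headers("example-token", "conversation-42")
    print(headers[SESSION_HEADER])
    example_usage = {
        "prompt_tokens": 1000,
        "prompt_tokens_details": {"cached_tokens": 750},
    }
    print(f"cached fraction: {cached_fraction(example_usage):.2f}")
```

Hashing the conversation ID keeps the key stable across turns without exposing the raw identifier. Tracking the cached fraction per request is one way to check whether session affinity is actually improving cache hits for a long-running agent.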