blog.cloudflare.com 9/25/2025, 2:00:45 PM · via preferred

R2 SQL: a deep dive into our new distributed query engine

R2 SQL is a serverless query engine that lets you run SQL queries over petabytes of data held in Iceberg tables in Cloudflare’s R2, delivering results in seconds. The system hinges on a two-phase approach: a Query Planner that uses metadata from the R2 Data Catalog to prune data before reading, and a Query Execution layer that distributes work across Cloudflare’s global network for massively parallel processing.

To speed up planning, the pipeline streams work units as soon as they are discovered, rather than waiting for a full plan, and uses deliberate ordering to prioritise data most likely to appear in the final results. Data is read in a highly selective way via Parquet column pruning and row-group pruning, with DataFusion powering per-partition SQL execution on the workers and Arrow serving as the in‑memory and inter-process data format.
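
The pruning and streaming ideas above can be illustrated with a minimal Python sketch. It is not R2 SQL's actual code: the `RowGroupStats` record, the `plan_work_units` function, and the `ts` column are hypothetical names chosen for illustration. It assumes each Parquet row group exposes min/max statistics for the filtered column, and it uses a generator so that a consumer can start on the first work unit before the rest of the metadata has been examined.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical per-row-group metadata: min/max of a 'ts' column."""
    file: str
    row_group: int
    min_ts: int
    max_ts: int


def plan_work_units(
    stats: Iterable[RowGroupStats], lo: int, hi: int
) -> Iterator[RowGroupStats]:
    """Yield row groups that may hold rows with lo <= ts <= hi.

    Row groups whose [min_ts, max_ts] range cannot overlap [lo, hi]
    are skipped without being read. Because this is a generator, each
    surviving unit is handed out as soon as it is discovered.
    """
    for s in stats:
        if s.max_ts < lo or s.min_ts > hi:
            continue  # statistics prove no row can match: prune
        yield s


if __name__ == "__main__":
    metadata = [
        RowGroupStats("a.parquet", 0, 0, 9),
        RowGroupStats("a.parquet", 1, 10, 19),
        RowGroupStats("b.parquet", 0, 20, 29),
    ]
    for unit in plan_work_units(metadata, lo=12, hi=22):
        print(unit.file, unit.row_group)
```

In this sketch only the second and third row groups survive for the range 12 to 22; the first is never opened. Column pruning is complementary: a reader would additionally fetch only the columns the query names.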

The architecture emphasises minimal I/O and distributed compute, aiming to finish early on queries with ORDER BY and LIMIT clauses while maintaining correctness.
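
One way to picture finishing early on an ORDER BY ... DESC LIMIT k query is the sketch below. It is an assumption-laden illustration, not the engine's implementation: `top_k_desc` and the `(max_ts, rows)` tuples are hypothetical, and all data sits in memory. Row groups are visited in descending order of their maximum value, and reading stops once the k-th best value found so far is at least as large as the maximum of every remaining group, so no unread row could change the result.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k_desc(
    groups: Sequence[Tuple[int, List[int]]], k: int
) -> Tuple[List[int], int]:
    """Return the k largest values and the number of groups read.

    Each group is (max_ts, rows), where max_ts comes from metadata
    and rows stands in for the data that would have to be fetched.
    """
    if k <= 0:
        return [], 0
    # Visit the groups most likely to contribute to the result first.
    ordered = sorted(groups, key=lambda g: g[0], reverse=True)
    heap: List[int] = []  # min-heap holding the k largest values so far
    groups_read = 0
    for max_ts, rows in ordered:
        # The remaining groups all have a maximum <= max_ts, so if the
        # current k-th value already reaches it, the result is final.
        if len(heap) == k and heap[0] >= max_ts:
            break
        groups_read += 1
        for ts in rows:
            if len(heap) < k:
                heapq.heappush(heap, ts)
            elif ts > heap[0]:
                heapq.heapreplace(heap, ts)
    return sorted(heap, reverse=True), groups_read


if __name__ == "__main__":
    data = [
        (100, [90, 100, 95]),
        (80, [70, 80]),
        (50, [40, 50]),
        (99, [99, 85]),
    ]
    print(top_k_desc(data, k=3))
```

With the sample data, the two groups with the highest maxima are enough to settle the top three values, and the other two groups are never read.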

Article by CyberSIXT