Podflare runs each sandbox in a dedicated microVM with concurrent
snapshot restore, a warm pool, and copy-on-write rootfs clones. The
whole point of that stack is that the latency a customer actually
feels is the same order of magnitude as calling a local function —
not starting a container.
## TL;DR — the fastest numbers
| Operation | Time |
|---|---|
| `fork()` server-side (warm diff snapshot) | 24 ms |
| `create()` pool hit (server-side) | 7–11 ms |
| Pool refill per VM | 12 ms |
| Hot `run_code()` from a laptop (min) | 38 ms |
| Full `Sandbox()` + `run_code()` + `close()` from a laptop (min) | 226 ms |
These are the numbers we build toward. Server-side compute runs from single-digit milliseconds to a couple dozen end-to-end; the rest is network between you and the region.
## ~100 ms full round-trip from a fiber uplink

The numbers further down were measured from a laptop on regular residential wifi, which adds ~100–150 ms of last-mile jitter on top. From a fiber-uplink desktop, a VPS, or another cloud region in the same metro, you can expect the full create + exec + destroy flow to land around 100 ms. Reproduce with `scripts/bench.py` from wherever your agent actually runs.
## Headline: direct-region hot path

This is what a long-running agent actually looks like — one `Sandbox`, many execs. It's the number that matters most.
```python
sb = Sandbox(host="https://usw1.podflare.ai")  # pin to us-west directly
sb.run_code("print('warm')")                   # warmup

for step in agent_loop:
    sb.run_code(step.code)                     # ← this is the hot path
```
### Hot `run_code()` — already-live sandbox
| Metric | Direct usw1.podflare.ai |
|---|---|
| p50 | 73 ms |
| p95 | 82 ms |
| min | 38 ms |
| mean | 70 ms |
A full HTTPS `POST /v1/sandboxes/:id/exec` with NDJSON streamed back. Python interpreter, vsock marshalling, the works. 73 ms p50 is what an LLM agent loop sees per tool call.
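You can check the per-call number from your own network position with a few lines. A sketch, assuming the `Sandbox` SDK shown above (`summarize` and `bench_hot_exec` are helpers defined here, not part of the SDK):

```python
import statistics
import time

def summarize(samples_ms):
    """p50/p95/min/mean over raw latency samples, like scripts/bench.py reports."""
    qs = statistics.quantiles(samples_ms, n=100)
    return {"p50": qs[49], "p95": qs[94],
            "min": min(samples_ms), "mean": statistics.mean(samples_ms)}

def bench_hot_exec(sb, n=30):
    """Time n back-to-back execs against an already-live sandbox."""
    sb.run_code("print('warm')")  # one discarded warmup, as in the methodology
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        sb.run_code("print(1)")
        samples.append((time.perf_counter() - t0) * 1000)  # milliseconds
    return summarize(samples)

# Usage (requires a live sandbox):
#   sb = Sandbox(host="https://usw1.podflare.ai")
#   print(bench_hot_exec(sb))
```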
### Full round-trip — `Sandbox()` → `run_code()` → `close()`
| Metric | Direct usw1.podflare.ai |
|---|---|
| p50 | 258 ms |
| p95 | 297 ms |
| min | 226 ms |
| mean | 259 ms |
That covers create + exec + destroy + 3 HTTPS round-trips from a US laptop to us-west (Latitude SJC bare metal). About 50 ms of raw RTT rides on the wire per HTTPS round-trip; the rest is TLS + HTTPS framing + the work of the VM itself.
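One way to budget that, assuming ~50 ms of wire RTT per HTTPS exchange (a rough split, not a precise accounting):

```python
# Full round-trip p50 is 258 ms over three HTTPS exchanges (create/exec/destroy).
rtt_ms = 50                       # approximate wire RTT per exchange on the bench path
round_trips = 3
wire_ms = round_trips * rtt_ms    # ~150 ms spent purely on the wire
other_ms = 258 - wire_ms          # ~108 ms of TLS, HTTP framing, and VM work
```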
## `fork()` — the 100 ms primitive
Copy-on-write mid-flight snapshot + reflink rootfs. One parent, N
children, each starting from the parent’s exact state.
| Path | Time |
|---|---|
| Server-side, diff snapshot + spawn (warm) | 24 ms |
| Server-side, first fork from fresh parent (cold diff) | 116 ms |
| End-to-end through SDK from laptop | ~80 ms p50 |
Where the 24 ms goes on the server side:
| Phase | Time |
|---|---|
| Pause the running VM | 1 ms |
| Capture dirty-page diff snapshot | 4–5 ms |
| Resume the parent | 1 ms |
| Prepare child memory (shared CoW) | ~1 ms |
| Merge diff onto shared base | 2–3 ms |
| Spawn N children in parallel | 12–17 ms |
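Taking the midpoint of each range, the phases above add up to the 24 ms headline:

```python
# Midpoint of each phase range from the table above, in milliseconds.
phases_ms = {
    "pause_vm": 1.0,
    "dirty_page_diff": 4.5,      # 4–5 ms
    "resume_parent": 1.0,
    "prepare_child_memory": 1.0,
    "merge_diff": 2.5,           # 2–3 ms
    "spawn_children": 14.5,      # 12–17 ms, N children in parallel
}
total_ms = sum(phases_ms.values())   # 24.5 ms, right on the ~24 ms headline
```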
The first fork from a fresh parent pays ~90 ms extra because the parent
has been running long enough to dirty ~20 MB of pages — those have to
hit disk once. On subsequent forks there’s nothing new to dirty and the
snapshot stays in the single-digit ms range.
## Under the hood — internal latencies
These are the hostd-local numbers (loopback, no network). They’re
what the customer-facing numbers above build up from.
| Operation | Latency |
|---|---|
| `create()` pool hit | 7–11 ms |
| Pool refill per VM | 12 ms |
| `run_code("print(1)")` REPL hot (hostd-local) | 3–5 ms |
| `upload(256 KB)` via vsock | ~5 ms |
| `merge_into(winner)` | ~50 ms |
### Pool refill: 12 ms
| Phase | Time |
|---|---|
| Metadata-only rootfs clone (CoW) | 0–1 ms |
| Spawn the VM process | ~3 ms |
| VM ready | under 2 ms |
| Load snapshot | ~3 ms |
| Agent ready | ~3 ms |
Rootfs clones are metadata-only — no block copy — because the host filesystem supports reflinks (xfs in our case). On a filesystem without reflink support the same step takes 700+ ms; that's the difference between a warm pool that keeps up and one that doesn't.
### Snapshot load: 3 ms
Seed memory is shared across every pool VM with page-level copy-on-write,
so kernel page cache is hit most of the time. Only pages the child
actually dirties get copied out.
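The same page-level copy-on-write behavior can be demonstrated in miniature with a `MAP_PRIVATE` mapping. This is a sketch of the OS mechanism, not Podflare's actual snapshot code:

```python
import mmap
import os
import tempfile

# Map a "seed" file copy-on-write, dirty a page, and confirm the backing
# file never sees the write; only our private mapping does.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * mmap.PAGESIZE)
    path = f.name

fd = os.open(path, os.O_RDONLY)
m = mmap.mmap(fd, 0, flags=mmap.MAP_PRIVATE,
              prot=mmap.PROT_READ | mmap.PROT_WRITE)
m[:4] = b"dirt"                # dirties one page, private to this mapping
dirty_view = bytes(m[:4])      # the mapping sees the write

with open(path, "rb") as g:
    on_disk = g.read(4)        # the seed file on disk is untouched

m.close()
os.close(fd)
os.unlink(path)
```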
## Which URL should I hit?
Honest answer: for the lowest possible latency, skip the smart
load-balancer and hit the region URL directly.
Podflare exposes two production endpoints and they have real
tradeoffs. We document both; we don’t pretend the smart-routed one is
a free lunch.
| Endpoint | Full round-trip p50 | Hot `run_code()` p50 | Tradeoffs |
|---|---|---|---|
| `https://usw1.podflare.ai` (direct us-west) | 258 ms | 73 ms | Fastest. You pick the region. No auto-failover. |
| `https://hel.podflare.ai` (direct eu) | similar from EU | similar from EU | Same pattern for Europe. |
| `https://api.podflare.ai` (smart edge) | 459 ms | 58 ms | Cloudflare Worker geo-routes, fails over, and enforces per-tier concurrency limits. Adds one extra hop. |
### Go direct when…
- Latency is the whole point. Benchmarks, real-time tool calls
inside an agent loop, anything where the extra 200 ms of a Worker
hop is meaningful.
- You already know which region is best for your traffic (most
production deployments end up pinning one anyway).
- You’re willing to handle failover yourself if that region goes down
(catch 5xx, retry the other region URL).
```python
# Direct. No Cloudflare hop.
sb = Sandbox(host="https://usw1.podflare.ai")
```
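Handling the failover yourself is a few lines. A minimal sketch: `create_sandbox_with_failover` is our own helper, not part of the SDK, and the exception type your SDK raises on a 5xx may differ:

```python
REGIONS = ["https://usw1.podflare.ai", "https://hel.podflare.ai"]

def create_sandbox_with_failover(make_sandbox, regions=REGIONS):
    """Try each region URL in order; return the first sandbox that comes up."""
    last_err = None
    for host in regions:
        try:
            return make_sandbox(host)
        except Exception as err:   # narrow to the SDK's 5xx error in real code
            last_err = err
    raise last_err

# Usage:
#   sb = create_sandbox_with_failover(lambda host: Sandbox(host=host))
```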
### Use the edge when…
- You want the simplest code path: one URL, automatic geo-routing,
automatic failover on 5xx, built-in concurrent-limit enforcement.
- Your users are globally distributed and you can’t pin one region.
- The 200 ms overhead doesn’t matter for your workload (batch jobs,
CI runs, anything where the sandbox lives for seconds-to-minutes).
```python
# Smart routed. Easy mode.
sb = Sandbox()  # defaults to https://api.podflare.ai
```
Both are production endpoints. The edge is not required — you can
always skip it if you want raw speed.
One nuance worth knowing: on the hot-exec path (reusing an
already-created sandbox), the edge is slightly faster (58 ms vs 73
ms p50) because Cloudflare’s HTTP/3 stack keeps the connection warmer
than our Caddy origin. The gap closes once you make a handful of
back-to-back execs from the same SDK Client either way. The
round-trip gap (258 vs 459 ms) doesn’t close — that’s the raw cost
of the Worker hop on every new sandbox.
## Scaling ceiling (single host)
On our current hardware (AMD EPYC 24-core, 64 GB DDR5):
| Knob | Limit |
|---|---|
| Concurrent running sandboxes | ~50 comfortably at 1 GB RAM each |
| Warm pool size | 50 is fine; 100 eats headroom |
| Burst create rate | ~83 creates/sec (pool-refill bottleneck; refills run in parallel) |
| `fork(n)` throughput | limited by disk writes of the diff (~KB–MB) |
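The burst-create ceiling falls straight out of the refill number:

```python
# Each warm-pool slot takes ~12 ms to refill, so sustained create throughput
# on one host tops out around 1000 / 12 ≈ 83 creates/sec.
refill_ms = 12
creates_per_sec = 1000 / refill_ms
```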
## Scaling ceiling (multi-region)
us-west + eu are live today. The Cloudflare Worker that fronts `api.podflare.ai` steers traffic by:

- Explicit `X-Podflare-Region` header (customer pin)
- Explicit `/r/REGION/...` path prefix (e.g. `/r/us-west/v1/sandboxes`)
- `CF-IPContinent` geo match + KV-cached capacity
- Automatic failover on 5xx (create-class requests only)
Add a region → the Worker picks it up the moment DNS resolves; no SDK upgrade needed. Capacity snapshots refresh every 60 s from the hostd's `/v1/capacity` endpoint.
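Both pinning mechanisms are plain HTTP, so any client can use them. A stdlib sketch that builds (but does not send) the two request shapes, using the header and path prefix documented above:

```python
from urllib.request import Request

BASE = "https://api.podflare.ai"

# 1) Pin a region with the X-Podflare-Region header.
req_header = Request(BASE + "/v1/sandboxes", method="POST",
                     headers={"X-Podflare-Region": "us-west"})

# 2) Pin a region with the /r/REGION/ path prefix.
req_path = Request(BASE + "/r/us-west/v1/sandboxes", method="POST")
```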
## The journey
Every committed number — from first microVM boot to the current
production build.
| Milestone | `create()` | `fork(n=5)` |
|---|---|---|
| Cold boot (no pool, no snapshot) | 1,414 ms | — |
| Snapshot-restore research (PoC) | 933 ms | — |
| Warm pool, cold-boot refill | 6 ms | — |
| Snapshot-backed refill, no CoW | 6 ms | — |
| + xfs reflink CoW (refill 12 ms) | 6 ms | 549 ms |
| + diff snapshot for fork | 6 ms | 101 ms |
| + multi-region edge routing | — | — |
| + persistent Spaces (freeze/resume) | — | — |
Roadmap:
- Lazy page loading for seed memory → pool refill approaches sub-5 ms.
- RAM-backed fork snapshots → fork diff writes hit RAM, not disk.
- Seed-tree templates — per-workload pre-imported snapshots. The
python-datasci template already has pandas + numpy + scipy
imported in the REPL before you get the sandbox.
## Compared to what?
V8 isolates (Cloudflare Workers) spawn in ~1 ms with ~1 MB memory. They’re an entirely different product: JS/Wasm only, no native deps, no filesystem, no Python REPL state. Real agents that run `pip install scikit-learn && model.fit(X)` don’t run in isolates.
Container-only platforms without snapshot restore quote “cold start” times that are really `docker run` + runtime init. A Python container takes ~500 ms–2 s to become usable. Podflare’s warm pool hit is 10–50× faster, and the sandbox is already ready for `run_code`.
Other microVM-based sandbox providers exist. Where we’re different:
- ~80 ms fork() — Podflare’s fork primitive snapshots a running
sandbox and spawns N children in parallel. Most providers don’t
expose fork at all.
- Persistent Spaces — freeze-to-disk on idle, resume into a fresh
sandbox later. The VM’s running Python process survives.
- Your own edge routing — built-in geo routing + automatic
failover. No load-balancer surcharge, no vendor lock.
## Reproduce the numbers
Every number on this page was produced by the bench script in our repo:
```shell
git clone https://github.com/PodFlare-ai/podflare
cd podflare
PODFLARE_API_KEY=pk_live_... python3 scripts/bench.py --n 30
```
Re-run any time you want to audit the marketing. If our numbers regress,
we’d rather you find out than pretend.
## Methodology
- Client: macOS laptop on regular residential wifi (west-coast US). This is not the best case. Last-mile wifi + consumer router buffering add roughly 100–150 ms of round-trip time that isn’t there on fiber-uplink desktops or cloud-to-cloud traffic.
- Network path (direct): laptop → TLS → Caddy on `usw1.podflare.ai` → hostd (Latitude SJC bare metal).
- Network path (edge): laptop → Cloudflare edge → `api.podflare.ai` Worker → origin → hostd.
- Sandbox: stock `default` template. 1 GB RAM, 4 GB rootfs, pool-warm.
- SDK: `podflare==0.0.10` (Python) with a single shared httpx `Client` — TLS handshake amortized across iterations, matching what real agent loops do (they don’t open a fresh connection per exec).
- N: 30 end-to-end iterations + 30 hot-exec iterations after one discarded warmup.
- Date: April 2026.
No selective sampling. No “we excluded network jitter.” The script in `scripts/bench.py` runs every iteration back-to-back and reports `statistics.mean` + percentiles over the raw samples.
## What you’ll see from a fiber uplink or in-cloud
If your agent runs on a VPS, a fiber-uplink desktop, or a nearby cloud
region, coast-to-coast RTT drops from ~50 ms on wifi to ~20 ms on fiber
and ~5 ms inside the same metro. Apply that delta to each HTTPS
round-trip in the numbers above:
- Hot `run_code()`: 38 ms on wifi → ~15 ms on fiber → sub-10 ms in-cloud
- Full round-trip: 226 ms on wifi → ~100 ms on fiber → ~30 ms in-cloud
The server-side compute numbers (fork 24 ms, pool hit 7–11 ms, pool
refill 12 ms) don’t change — those are what hostd does regardless of
where the client sits. Network is the variable.