Podflare runs each sandbox in a dedicated microVM with concurrent snapshot restore, a warm pool, and copy-on-write rootfs clones. The whole point of that stack is that the latency a customer actually feels is the same order of magnitude as calling a local function — not starting a container.

TL;DR — the fastest numbers

| Operation | Time |
| --- | --- |
| fork() server-side (warm diff snapshot) | 24 ms |
| create() pool hit (server-side) | 7–11 ms |
| Pool refill per VM | 12 ms |
| Hot run_code() from a laptop (min) | 38 ms |
| Full Sandbox() + run_code() + close() from a laptop (min) | 226 ms |
These are the numbers we build toward. Server-side compute is single to low double digits of milliseconds end-to-end; the rest is network between you and the region.
Expect ~100 ms full round-trip from a fiber uplink. The numbers further down were measured from a laptop on regular residential wifi, which adds ~100–150 ms of last-mile jitter on top. From a fiber-uplink desktop, a VPS, or another cloud region in the same metro, the full create + exec + destroy flow lands around 100 ms. Reproduce with scripts/bench.py from wherever your agent actually runs.

Headline: direct-region hot path

What a long-running agent actually looks like — one Sandbox, many execs. This is the number that matters most.
sb = Sandbox(host="https://usw1.podflare.ai")   # pin to us-west directly
sb.run_code("print('warm')")                    # warmup
for step in agent_loop:
    sb.run_code(step.code)                      # ← this is the hot path

Hot run_code() — already-live sandbox

| Metric | Direct usw1.podflare.ai |
| --- | --- |
| p50 | 73 ms |
| p95 | 82 ms |
| min | 38 ms |
| mean | 70 ms |
A full HTTPS POST /v1/sandboxes/:id/exec with NDJSON streaming back. Python interpreter, vsock marshalling, the works. 73 ms p50 is what an LLM agent loop sees per tool call.

Full round-trip — Sandbox() → run_code() → close()

| Metric | Direct usw1.podflare.ai |
| --- | --- |
| p50 | 258 ms |
| p95 | 297 ms |
| min | 226 ms |
| mean | 259 ms |
That covers create + exec + destroy + 3 HTTPS round-trips from a US laptop to us-west (Latitude SJC bare metal). About 50 ms of that is raw coast-to-coast RTT on the wire; the rest is TLS + HTTPS framing + the work of the VM itself.

fork() — the 100 ms primitive

Copy-on-write mid-flight snapshot + reflink rootfs. One parent, N children, each starting from the parent’s exact state.
| Path | Time |
| --- | --- |
| Server-side, diff snapshot + spawn (warm) | 24 ms |
| Server-side, first fork from fresh parent (cold diff) | 116 ms |
| End-to-end through SDK from laptop | ~80 ms p50 |
Where the 24 ms goes on the server side:
| Phase | Time |
| --- | --- |
| Pause the running VM | 1 ms |
| Capture dirty-page diff snapshot | 4–5 ms |
| Resume the parent | 1 ms |
| Prepare child memory (shared CoW) | ~1 ms |
| Merge diff onto shared base | 2–3 ms |
| Spawn N children in parallel | 12–17 ms |
The first fork from a fresh parent pays ~90 ms extra because the parent has been running long enough to dirty ~20 MB of pages — those have to hit disk once. On subsequent forks there’s nothing new to dirty and the snapshot stays in the single-digit ms range.
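As a sanity check, the per-phase budget above sums back to the headline warm-fork number. A quick worked version, using midpoints for the ranged phases:

```python
# Warm-fork budget, summed from the phase table above (midpoints for ranges).
phases_ms = {
    "pause running VM": 1.0,
    "capture dirty-page diff snapshot": 4.5,   # 4–5 ms
    "resume parent": 1.0,
    "prepare child memory (shared CoW)": 1.0,  # ~1 ms
    "merge diff onto shared base": 2.5,        # 2–3 ms
    "spawn N children in parallel": 14.5,      # 12–17 ms
}
total_ms = sum(phases_ms.values())
print(f"~{total_ms:.1f} ms")  # ~24.5 ms, consistent with the 24 ms figure
```

Note that the parent is only paused for the first three phases (~7 ms); the rest happens while it is already running again.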

Under the hood — internal latencies

These are the hostd-local numbers (loopback, no network). They’re what the customer-facing numbers above build up from.
| Operation | Latency |
| --- | --- |
| create() pool hit | 7–11 ms |
| Pool refill per VM | 12 ms |
| run_code("print(1)") REPL hot (hostd-local) | 3–5 ms |
| upload(256 KB) via vsock | ~5 ms |
| merge_into(winner) | ~50 ms |

Pool refill: 12 ms

| Phase | Time |
| --- | --- |
| Metadata-only rootfs clone (CoW) | 0–1 ms |
| Spawn the VM process | ~3 ms |
| VM ready | under 2 ms |
| Load snapshot | ~3 ms |
| Agent ready | ~3 ms |
Rootfs clones are metadata-only — no block copy — because the host filesystem supports reflink (xfs in our case). On a default filesystem without reflink the same step takes 700+ ms; that’s the difference between a warm pool that keeps up and one that doesn’t.
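The reflink mechanism behind those metadata-only clones can be poked at directly from Python. This is an illustrative sketch, not Podflare code: FICLONE is the Linux ioctl that reflink-capable filesystems like xfs and btrfs implement, and on anything else the helper degrades to a full byte copy — the same 1 ms vs 700+ ms gap described above.

```python
import fcntl
import shutil

FICLONE = 0x40049409  # Linux ioctl: share the source file's extents with dst

def clone_or_copy(src: str, dst: str) -> str:
    """Metadata-only reflink clone where the filesystem supports it,
    full byte-for-byte copy otherwise."""
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return "reflink"   # xfs/btrfs: no data blocks copied
        except OSError:
            pass               # ext4, tmpfs, ...: no reflink support
    shutil.copyfile(src, dst)  # fall back to copying every byte
    return "copy"
```

Try it on an xfs mount and a tmpfs mount back-to-back: the first returns "reflink" in well under a millisecond regardless of file size; the second scales with the bytes copied.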

Snapshot-load: 3 ms

Seed memory is shared across every pool VM with page-level copy-on-write, so kernel page cache is hit most of the time. Only pages the child actually dirties get copied out.

Which URL should I hit?

Honest answer: for the lowest possible latency, skip the smart load-balancer and hit the region URL directly. Podflare exposes two production endpoints and they have real tradeoffs. We document both; we don’t pretend the smart-routed one is a free lunch.
| Endpoint | Full round-trip p50 | Hot run_code() p50 | Tradeoffs |
| --- | --- | --- | --- |
| https://usw1.podflare.ai (direct us-west) | 258 ms | 73 ms | Fastest. You pick the region. No auto-failover. |
| https://hel.podflare.ai (direct eu) | similar from EU | similar from EU | Same pattern for Europe. |
| https://api.podflare.ai (smart edge) | 459 ms | 58 ms | Cloudflare Worker geo-routes, fails over, and enforces per-tier concurrent limits. Adds one extra hop. |

Go direct when…

  • Latency is the whole point. Benchmarks, real-time tool calls inside an agent loop, anything where the extra 200 ms of a Worker hop is meaningful.
  • You already know which region is best for your traffic (most production deployments end up pinning one anyway).
  • You’re willing to handle failover yourself if that region goes down (catch 5xx, retry the other region URL).
# Direct. No Cloudflare hop.
sb = Sandbox(host="https://usw1.podflare.ai")
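The do-it-yourself failover from the last bullet can be a small loop over region URLs. A sketch, assuming the SDK's create error exposes an HTTP `status_code` attribute — the real podflare exception type may differ:

```python
REGIONS = ["https://usw1.podflare.ai", "https://hel.podflare.ai"]

def create_with_failover(make_sandbox, regions=REGIONS):
    """Try each region in order; move to the next only on 5xx-style failures."""
    last_err = None
    for host in regions:
        try:
            return make_sandbox(host)
        except Exception as err:
            status = getattr(err, "status_code", None)
            if status is not None and 500 <= status < 600:
                last_err = err      # region down: try the next one
                continue
            raise                   # anything non-5xx is a real error
    raise last_err

# Usage (hypothetical): sb = create_with_failover(lambda host: Sandbox(host=host))
```

This is roughly what the edge Worker does for you on create-class requests; doing it client-side just removes the extra hop.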

Use the edge when…

  • You want the simplest code path: one URL, automatic geo-routing, automatic failover on 5xx, built-in concurrent-limit enforcement.
  • Your users are globally distributed and you can’t pin one region.
  • The 200 ms overhead doesn’t matter for your workload (batch jobs, CI runs, anything where the sandbox lives for seconds-to-minutes).
# Smart routed. Easy mode.
sb = Sandbox()   # defaults to https://api.podflare.ai
Both are production endpoints. The edge is not required — you can always skip it if you want raw speed. One nuance worth knowing: on the hot-exec path (reusing an already-created sandbox), the edge is slightly faster (58 ms vs 73 ms p50) because Cloudflare’s HTTP/3 stack keeps the connection warmer than our Caddy origin. Either way, that gap closes once you make a handful of back-to-back execs from the same SDK Client. The round-trip gap (258 ms vs 459 ms) doesn’t close — that’s the raw cost of the Worker hop on every new sandbox.

Scaling ceiling (single host)

On our current hardware (AMD EPYC 24-core, 64 GB DDR5):
| Knob | Limit |
| --- | --- |
| Concurrent running sandboxes | ~50 comfortably at 1 GB RAM each |
| Warm pool size | 50 fine, 100 eats headroom |
| Burst create rate | ~83 creates/sec (pool-refill bottleneck, parallel) |
| fork(n) throughput | limited by disk write of the diff (~KB–MB) |

Scaling ceiling (multi-region)

us-west + eu are live today. The Cloudflare Worker that fronts api.podflare.ai steers traffic by:
  1. Explicit X-Podflare-Region header (customer pin)
  2. Explicit /r/REGION/... path prefix (e.g. /r/us-west/v1/sandboxes)
  3. CF-IPContinent geo match + KV-cached capacity
  4. Automatic failover on 5xx (for create-class requests only)
Add a region → the Worker picks it up the moment DNS resolves; no SDK upgrade needed. Capacity snapshots refresh every 60 s from the hostd’s /v1/capacity endpoint.
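The first two routing rules can be exercised from any HTTP client. A minimal sketch of both pinning styles, using the header and path prefix documented above (the create path shown is illustrative):

```python
BASE = "https://api.podflare.ai"

def pin_via_header(region: str, path: str = "/v1/sandboxes"):
    """Rule 1: pin a region with the X-Podflare-Region request header."""
    return f"{BASE}{path}", {"X-Podflare-Region": region}

def pin_via_path(region: str, path: str = "/v1/sandboxes"):
    """Rule 2: pin a region with the /r/REGION/... path prefix."""
    return f"{BASE}/r/{region}{path}", {}

url, headers = pin_via_header("us-west")
print(url, headers)
print(pin_via_path("us-west")[0])  # https://api.podflare.ai/r/us-west/v1/sandboxes
```

Either form short-circuits the geo lookup; without them the Worker falls through to rules 3 and 4.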

The journey

Every committed number — from first microVM boot to the current production build.
| Milestone | create() | fork(n=5) |
| --- | --- | --- |
| Cold boot (no pool, no snapshot) | 1,414 ms | |
| Snapshot-restore research (PoC) | 933 ms | |
| Warm pool, cold-boot refill | 6 ms | |
| Snapshot-backed refill, no CoW | 6 ms | |
| + xfs reflink CoW (refill 12 ms) | 6 ms | 549 ms |
| + diff snapshot for fork | 6 ms | 101 ms |
| + multi-region edge routing | | |
| + persistent Spaces (freeze/resume) | | |
Roadmap:
  • Lazy page loading for seed memory → pool refill approaches sub-5 ms.
  • RAM-backed fork snapshots → fork diff writes hit RAM, not disk.
  • Seed-tree templates — per-workload pre-imported snapshots. The python-datasci template already has pandas + numpy + scipy imported in the REPL before you get the sandbox.

Compared to what?

V8 isolates (Cloudflare Workers) spawn in ~1 ms with ~1 MB of memory. They’re an entirely different product: JS/Wasm only, no native deps, no filesystem, no Python REPL state. Real agents that run pip install scikit-learn && model.fit(X) don’t run in isolates.

Container-only platforms without snapshot-restore quote “cold start” times that are really docker run + runtime init. A Python container takes ~500 ms–2 s to become usable; Podflare’s warm-pool hit is 10–50× faster, and the sandbox is already ready for run_code.

Other microVM-based sandbox providers exist. Where we’re different:
  • ~80 ms fork() — Podflare’s fork primitive snapshots a running sandbox and spawns N children in parallel. Most providers don’t expose fork at all.
  • Persistent Spaces — freeze-to-disk on idle, resume into a fresh sandbox later. The VM’s running Python process survives.
  • Your own edge routing — built-in geo routing + automatic failover. No load-balancer surcharge, no vendor lock.

Reproduce the numbers

Every number on this page was produced by the bench script in our repo:
git clone https://github.com/PodFlare-ai/podflare
cd podflare
PODFLARE_API_KEY=pk_live_... python3 scripts/bench.py --n 30
Re-run any time you want to audit the marketing. If our numbers regress, we’d rather you find out than pretend.

Methodology

  • Client: macOS laptop on regular residential wifi (west-coast US). This is not the best case. Last-mile wifi + consumer router buffering add roughly 100–150 ms of round-trip time that isn’t there on fiber-uplink desktops or cloud-to-cloud traffic.
  • Network path (direct): Laptop → TLS → Caddy on usw1.podflare.ai → hostd (Latitude SJC bare metal).
  • Network path (edge): Laptop → Cloudflare edge → api.podflare.ai Worker → origin → hostd.
  • Sandbox: Stock default template. 1 GB RAM, 4 GB rootfs, pool-warm.
  • SDK: podflare==0.0.10 (Python) with a single shared httpx Client — TLS handshake amortized across iterations, matching what real agent loops do (they don’t open a fresh connection per exec).
  • N: 30 end-to-end iterations + 30 hot-exec iterations after one discarded warmup.
  • Date: April 2026.
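The reported statistics follow the shape below — a minimal stand-in for scripts/bench.py (the real script may differ in details): one discarded warmup, back-to-back iterations, percentiles over the raw samples.

```python
import statistics
import time

def bench(fn, n=30):
    """Time fn n times after one discarded warmup; return stats in ms."""
    fn()  # warmup, discarded
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    q = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    return {
        "min": min(samples),
        "p50": q[49],
        "p95": q[94],
        "mean": statistics.mean(samples),
    }

# Usage against a live sandbox: bench(lambda: sb.run_code("print(1)"))
stats = bench(lambda: None)  # here: timing a no-op, just to show the shape
```

Note the single shared `fn` closure plays the role of the shared httpx Client: connection setup is paid once, outside the timed loop.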
No selective sampling. No “we excluded network jitter.” The script in scripts/bench.py runs every iteration back-to-back and reports statistics.mean + percentiles over the raw samples. If your agent runs on a VPS, a fiber-uplink desktop, or a nearby cloud region, coast-to-coast RTT drops from ~50 ms on wifi to ~20 ms on fiber and ~5 ms inside the same metro. Apply that delta to each HTTPS round-trip in the numbers above:
  • Hot run_code(): 38 ms on wifi → ~15 ms on fiber → sub-10 ms in-cloud
  • Full round-trip: 226 ms on wifi → ~100 ms on fiber → ~30 ms in-cloud
The server-side compute numbers (fork 24 ms, pool hit 7–11 ms, pool refill 12 ms) don’t change — those are what hostd does regardless of where the client sits. Network is the variable.