Podflare runs each sandbox in a dedicated microVM with concurrent
snapshot restore, a warm pool, and copy-on-write rootfs clones. The
whole point of that stack is that the latency a customer actually
feels is the same order of magnitude as calling a local function —
not starting a container.
## TL;DR — the fastest numbers
| Operation | Time |
|---|---|
| `fork()` server-side (warm diff snapshot) | 24 ms |
| `create()` pool hit (server-side) | 7–11 ms |
| Pool refill per VM | 12 ms |
| Hot `run_code()` from a laptop (min) | 38 ms |
| Full `Sandbox()` + `run_code()` + `close()` from a laptop (min) | 226 ms |
These are the numbers we build toward. Server-side compute runs from single-digit milliseconds to a couple dozen end-to-end; the rest is network between you and the region.
## ~100 ms full round-trip from a fiber uplink

The numbers further down were measured from a laptop on regular residential wifi, which adds ~100–150 ms of last-mile jitter on top. From a fiber-uplink desktop, a VPS, or another cloud region in the same metro, you can expect the full create + exec + destroy flow to land around 100 ms. Reproduce with `scripts/bench.py` from wherever your agent actually runs.
## Headline: direct-region hot path

This is what a long-running agent actually looks like — one `Sandbox`, many execs. It's the number that matters most.
```python
sb = Sandbox(host="https://usw1.podflare.ai")  # pin to us-west directly
sb.run_code("print('warm')")                   # warmup

for step in agent_loop:
    sb.run_code(step.code)                     # ← this is the hot path
```
### Hot `run_code()` — already-live sandbox
| Metric | Direct usw1.podflare.ai |
|---|---|
| p50 | 73 ms |
| p95 | 82 ms |
| min | 38 ms |
| mean | 70 ms |
A full HTTPS `POST /v1/sandboxes/:id/exec` with NDJSON streamed back. Python interpreter, vsock marshalling, the works. 73 ms p50 is what an LLM agent loop sees per tool call.
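You can check the per-call number from your own network position with a few lines. A sketch, assuming the `Sandbox` SDK shown above (`summarize` and `bench_hot_exec` are helpers defined here, not part of the SDK):

```python
import statistics
import time

def summarize(samples_ms):
    """p50/p95/min/mean over raw latency samples, like scripts/bench.py reports."""
    qs = statistics.quantiles(samples_ms, n=100)
    return {"p50": qs[49], "p95": qs[94],
            "min": min(samples_ms), "mean": statistics.mean(samples_ms)}

def bench_hot_exec(sb, n=30):
    """Time n back-to-back execs against an already-live sandbox."""
    sb.run_code("print('warm')")  # one discarded warmup, as in the methodology
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        sb.run_code("print(1)")
        samples.append((time.perf_counter() - t0) * 1000)  # milliseconds
    return summarize(samples)

# Usage (requires a live sandbox):
#   sb = Sandbox(host="https://usw1.podflare.ai")
#   print(bench_hot_exec(sb))
```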
### Full round-trip — `Sandbox()` → `run_code()` → `close()`
| Metric | Direct usw1.podflare.ai |
|---|---|
| p50 | 258 ms |
| p95 | 297 ms |
| min | 226 ms |
| mean | 259 ms |
That covers create + exec + destroy + 3 HTTPS round-trips from a US laptop to us-west (Latitude SJC bare metal). About 50 ms of raw RTT rides on the wire per HTTPS round-trip; the rest is TLS + HTTPS framing + the work of the VM itself.
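One way to budget that, assuming ~50 ms of wire RTT per HTTPS exchange (a rough split, not a precise accounting):

```python
# Full round-trip p50 is 258 ms over three HTTPS exchanges (create/exec/destroy).
rtt_ms = 50                       # approximate wire RTT per exchange on the bench path
round_trips = 3
wire_ms = round_trips * rtt_ms    # ~150 ms spent purely on the wire
other_ms = 258 - wire_ms          # ~108 ms of TLS, HTTP framing, and VM work
```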
## `fork()` — the 100 ms primitive
Copy-on-write mid-flight snapshot + reflink rootfs. One parent, N
children, each starting from the parent’s exact state.
| Path | Time |
|---|---|
| Server-side, diff snapshot + spawn (warm) | 24 ms |
| Server-side, first fork from fresh parent (cold diff) | 116 ms |
| End-to-end through SDK from laptop | ~80 ms p50 |
Where the 24 ms goes on the server side:
| Phase | Time |
|---|---|
| Pause the running VM | 1 ms |
| Capture dirty-page diff snapshot | 4–5 ms |
| Resume the parent | 1 ms |
| Prepare child memory (shared CoW) | ~1 ms |
| Merge diff onto shared base | 2–3 ms |
| Spawn N children in parallel | 12–17 ms |
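Taking the midpoint of each range, the phases above add up to the 24 ms headline:

```python
# Midpoint of each phase range from the table above, in milliseconds.
phases_ms = {
    "pause_vm": 1.0,
    "dirty_page_diff": 4.5,      # 4–5 ms
    "resume_parent": 1.0,
    "prepare_child_memory": 1.0,
    "merge_diff": 2.5,           # 2–3 ms
    "spawn_children": 14.5,      # 12–17 ms, N children in parallel
}
total_ms = sum(phases_ms.values())   # 24.5 ms, right on the ~24 ms headline
```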
The first fork from a fresh parent pays ~90 ms extra because the parent
has been running long enough to dirty ~20 MB of pages — those have to
hit disk once. On subsequent forks there’s nothing new to dirty and the
snapshot stays in the single-digit ms range.
## Under the hood — internal latencies
These are the hostd-local numbers (loopback, no network). They’re
what the customer-facing numbers above build up from.
| Operation | Latency |
|---|---|
| `create()` pool hit | 7–11 ms |
| Pool refill per VM | 12 ms |
| `run_code("print(1)")` REPL hot (hostd-local) | 3–5 ms |
| `upload(256 KB)` via vsock | ~5 ms |
| `merge_into(winner)` | ~50 ms |
### Pool refill: 12 ms
| Phase | Time |
|---|---|
| Metadata-only rootfs clone (CoW) | 0–1 ms |
| Spawn the VM process | ~3 ms |
| VM ready | under 2 ms |
| Load snapshot | ~3 ms |
| Agent ready | ~3 ms |
Rootfs clones are metadata-only — no block copy — because the host filesystem supports reflinks (xfs in our case). On a filesystem without reflink support the same step takes 700+ ms; that's the difference between a warm pool that keeps up and one that doesn't.
### Snapshot load: 3 ms
Seed memory is shared across every pool VM with page-level copy-on-write,
so kernel page cache is hit most of the time. Only pages the child
actually dirties get copied out.
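The same page-level copy-on-write behavior can be demonstrated in miniature with a `MAP_PRIVATE` mapping. This is a sketch of the OS mechanism, not Podflare's actual snapshot code:

```python
import mmap
import os
import tempfile

# Map a "seed" file copy-on-write, dirty a page, and confirm the backing
# file never sees the write; only our private mapping does.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * mmap.PAGESIZE)
    path = f.name

fd = os.open(path, os.O_RDONLY)
m = mmap.mmap(fd, 0, flags=mmap.MAP_PRIVATE,
              prot=mmap.PROT_READ | mmap.PROT_WRITE)
m[:4] = b"dirt"                # dirties one page, private to this mapping
dirty_view = bytes(m[:4])      # the mapping sees the write

with open(path, "rb") as g:
    on_disk = g.read(4)        # the seed file on disk is untouched

m.close()
os.close(fd)
os.unlink(path)
```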
## Which URL should I hit?
Honest answer: for the lowest possible latency, skip the smart
load-balancer and hit the region URL directly.
Podflare exposes two production endpoints and they have real
tradeoffs. We document both; we don’t pretend the smart-routed one is
a free lunch.
| Endpoint | Full round-trip p50 | Hot `run_code()` p50 | Tradeoffs |
|---|---|---|---|
| `https://usw1.podflare.ai` (direct us-west) | 258 ms | 73 ms | Fastest. You pick the region. No auto-failover. |
| `https://hel.podflare.ai` (direct eu) | similar from EU | similar from EU | Same pattern for Europe. |
| `https://api.podflare.ai` (smart edge) | 459 ms | 58 ms | Cloudflare Worker geo-routes, fails over, and enforces per-tier concurrency limits. Adds one extra hop. |
### Go direct when…
- Latency is the whole point. Benchmarks, real-time tool calls
inside an agent loop, anything where the extra 200 ms of a Worker
hop is meaningful.
- You already know which region is best for your traffic (most
production deployments end up pinning one anyway).
- You’re willing to handle failover yourself if that region goes down
(catch 5xx, retry the other region URL).
```python
# Direct. No Cloudflare hop.
sb = Sandbox(host="https://usw1.podflare.ai")
```
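Handling the failover yourself is a few lines. A minimal sketch: `create_sandbox_with_failover` is our own helper, not part of the SDK, and the exception type your SDK raises on a 5xx may differ:

```python
REGIONS = ["https://usw1.podflare.ai", "https://hel.podflare.ai"]

def create_sandbox_with_failover(make_sandbox, regions=REGIONS):
    """Try each region URL in order; return the first sandbox that comes up."""
    last_err = None
    for host in regions:
        try:
            return make_sandbox(host)
        except Exception as err:   # narrow to the SDK's 5xx error in real code
            last_err = err
    raise last_err

# Usage:
#   sb = create_sandbox_with_failover(lambda host: Sandbox(host=host))
```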
### Use the edge when…
- You want the simplest code path: one URL, automatic geo-routing,
automatic failover on 5xx, built-in concurrent-limit enforcement.
- Your users are globally distributed and you can’t pin one region.
- The 200 ms overhead doesn’t matter for your workload (batch jobs,
CI runs, anything where the sandbox lives for seconds-to-minutes).
```python
# Smart routed. Easy mode.
sb = Sandbox()  # defaults to https://api.podflare.ai
```
Both are production endpoints. The edge is not required — you can
always skip it if you want raw speed.
One nuance worth knowing: on the hot-exec path (reusing an
already-created sandbox), the edge is slightly faster (58 ms vs 73
ms p50) because Cloudflare’s HTTP/3 stack keeps the connection warmer
than our Caddy origin. The gap closes once you make a handful of
back-to-back execs from the same SDK Client either way. The
round-trip gap (258 vs 459 ms) doesn’t close — that’s the raw cost
of the Worker hop on every new sandbox.
## Scaling ceiling (single host)
On our current hardware (AMD EPYC 24-core, 64 GB DDR5):
| Knob | Limit |
|---|---|
| Concurrent running sandboxes | ~50 comfortably at 1 GB RAM each |
| Warm pool size | 50 is fine; 100 eats headroom |
| Burst create rate | ~83 creates/sec (pool-refill bottleneck; refills run in parallel) |
| `fork(n)` throughput | limited by disk writes of the diff (~KB–MB) |
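The burst-create ceiling falls straight out of the refill number:

```python
# Each warm-pool slot takes ~12 ms to refill, so sustained create throughput
# on one host tops out around 1000 / 12 ≈ 83 creates/sec.
refill_ms = 12
creates_per_sec = 1000 / refill_ms
```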
## Scaling ceiling (multi-region)
us-west + eu are live today. The Cloudflare Worker that fronts `api.podflare.ai` steers traffic by:

- Explicit `X-Podflare-Region` header (customer pin)
- Explicit `/r/REGION/...` path prefix (e.g. `/r/us-west/v1/sandboxes`)
- `CF-IPContinent` geo match + KV-cached capacity
- Automatic failover on 5xx (create-class requests only)
Add a region → the Worker picks it up the moment DNS resolves; no SDK upgrade needed. Capacity snapshots refresh every 60 s from the hostd's `/v1/capacity` endpoint.
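Both pinning mechanisms are plain HTTP, so any client can use them. A stdlib sketch that builds (but does not send) the two request shapes, using the header and path prefix documented above:

```python
from urllib.request import Request

BASE = "https://api.podflare.ai"

# 1) Pin a region with the X-Podflare-Region header.
req_header = Request(BASE + "/v1/sandboxes", method="POST",
                     headers={"X-Podflare-Region": "us-west"})

# 2) Pin a region with the /r/REGION/ path prefix.
req_path = Request(BASE + "/r/us-west/v1/sandboxes", method="POST")
```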
## The journey
Every committed number — from first microVM boot to the current
production build.
| Milestone | `create()` | `fork(n=5)` |
|---|---|---|
| Cold boot (no pool, no snapshot) | 1,414 ms | — |
| Snapshot-restore research (PoC) | 933 ms | — |
| Warm pool, cold-boot refill | 6 ms | — |
| Snapshot-backed refill, no CoW | 6 ms | — |
| + xfs reflink CoW (refill 12 ms) | 6 ms | 549 ms |
| + diff snapshot for fork | 6 ms | 101 ms |
| + multi-region edge routing | — | — |
| + persistent Spaces (freeze/resume) | — | — |
Roadmap:
- Lazy page loading for seed memory → pool refill approaches sub-5 ms.
- RAM-backed fork snapshots → fork diff writes hit RAM, not disk.
- Seed-tree templates — per-workload pre-imported snapshots. The
python-datasci template already has pandas + numpy + scipy
imported in the REPL before you get the sandbox.
## Compared to what?
V8 isolates (Cloudflare Workers) spawn in ~1 ms with ~1 MB memory. They’re an entirely different product: JS/Wasm only, no native deps, no filesystem, no Python REPL state. Real agents that run `pip install scikit-learn && model.fit(X)` don’t run in isolates.
Container-only platforms without snapshot restore quote “cold start” times that are really `docker run` + runtime init. A Python container takes ~500 ms–2 s to become usable. Podflare’s warm pool hit is 10–50× faster, and the sandbox is already ready for `run_code`.
Other microVM-based sandbox providers exist. Where we’re different:
- ~80 ms fork() — Podflare’s fork primitive snapshots a running
sandbox and spawns N children in parallel. Most providers don’t
expose fork at all.
- Persistent Spaces — freeze-to-disk on idle, resume into a fresh
sandbox later. The VM’s running Python process survives.
- Your own edge routing — built-in geo routing + automatic
failover. No load-balancer surcharge, no vendor lock.
## Reproduce the numbers
Every number on this page was produced by the bench script in our repo:
```shell
git clone https://github.com/PodFlare-ai/podflare
cd podflare
PODFLARE_API_KEY=pk_live_... python3 scripts/bench.py --n 30
```
Re-run any time you want to audit the marketing. If our numbers regress,
we’d rather you find out than pretend.
## Methodology
- Client: macOS laptop on regular residential wifi (west-coast US). This is not the best case. Last-mile wifi + consumer router buffering add roughly 100–150 ms of round-trip time that isn’t there on fiber-uplink desktops or cloud-to-cloud traffic.
- Network path (direct): laptop → TLS → Caddy on `usw1.podflare.ai` → hostd (Latitude SJC bare metal).
- Network path (edge): laptop → Cloudflare edge → `api.podflare.ai` Worker → origin → hostd.
- Sandbox: stock `default` template. 1 GB RAM, 4 GB rootfs, pool-warm.
- SDK: `podflare==0.0.10` (Python) with a single shared httpx `Client` — TLS handshake amortized across iterations, matching what real agent loops do (they don’t open a fresh connection per exec).
- N: 30 end-to-end iterations + 30 hot-exec iterations after one discarded warmup.
- Date: April 2026.
No selective sampling. No “we excluded network jitter.” The script in `scripts/bench.py` runs every iteration back-to-back and reports `statistics.mean` + percentiles over the raw samples.
## What you’ll see from a fiber uplink or in-cloud
If your agent runs on a VPS, a fiber-uplink desktop, or a nearby cloud
region, coast-to-coast RTT drops from ~50 ms on wifi to ~20 ms on fiber
and ~5 ms inside the same metro. Apply that delta to each HTTPS
round-trip in the numbers above:
- Hot `run_code()`: 38 ms on wifi → ~15 ms on fiber → sub-10 ms in-cloud
- Full round-trip: 226 ms on wifi → ~100 ms on fiber → ~30 ms in-cloud
The server-side compute numbers (fork 24 ms, pool hit 7–11 ms, pool
refill 12 ms) don’t change — those are what hostd does regardless of
where the client sits. Network is the variable.