Podflare runs each sandbox in a dedicated microVM with concurrent snapshot restore, a warm pool, and copy-on-write rootfs clones. The whole point of that stack is that the latency a customer actually feels is the same order of magnitude as calling a local function — not starting a container.

TL;DR — the headline numbers

Production state, end of Q1 2026, measured over 30 sequential cold-start iterations (matches scripts/bench-reliability.py). Two columns because where your client runs dominates the wall-clock number — your in-cloud agent never pays the residential last-mile tax:

| Operation | From a US laptop on residential wifi | From in-cloud (your agent’s view) |
| --- | --- | --- |
| fork() server-side (warm diff snapshot) | 24 ms | 24 ms |
| create() pool hit (server-side) | 6 ms | 6 ms |
| Pool refill per VM (background) | 12 ms | 12 ms |
| Hot run_code() p50 | 46 ms | 42 ms |
| Full Sandbox() → run_code() → close() p50 | 187 ms | 43 ms |
| Full Sandbox() → run_code() → close() p95 | 208 ms | 78 ms |
| Full Sandbox() → run_code() → close() p99 | 221 ms | 188 ms |
| Worst observed in 100 iter (max) | 475 ms | 222 ms |

Server-side compute is single-digit ms; everything else is the network between you and the nearest region. From an agent running in any major cloud region the full create+exec+close round-trip lands at p50 = 43 ms, p95 = 78 ms — measured from a Podflare host benching against itself, which has the same network shape as any nearby cloud customer. For head-to-head comparisons against E2B, Daytona, and Blaxel, see vs E2B, Daytona, and Blaxel — TL;DR: Podflare wins every percentile (p50 / p95 / p99 / max).
~100 ms full round-trip from a fiber uplink or in-cloud. The 187 ms p50 above was measured from a laptop on residential wifi, which adds ~100–150 ms of last-mile jitter on top of the actual server-side compute (single-digit ms). From a fiber-uplink desktop, a VPS, or another cloud region in the same metro, you can expect the full create + exec + destroy flow to land around 100 ms p50 with similarly tight p95/p99. Reproduce with scripts/bench-reliability.py from wherever your agent actually runs.

Headline: edge-routed hot path

What a long-running agent actually looks like — one Sandbox, many execs. This is the number that matters most.
sb = Sandbox()                                  # defaults to api.podflare.ai
sb.run_code("print('warm')")                    # warmup
for step in agent_loop:
    sb.run_code(step.code)                      # ← this is the hot path

Hot run_code() — already-live sandbox

| Metric | via api.podflare.ai (default) |
| --- | --- |
| p50 | 46 ms |
| min | 39 ms |
| mean | 50 ms |

A full HTTPS POST /v1/sandboxes/:id/exec with NDJSON streaming back. Python interpreter, vsock marshalling, the works. 46 ms p50 is what an LLM agent loop sees per tool call — measurably faster than E2B (180 ms) and Daytona (182 ms) because we use vsock instead of an in-VM HTTP server.
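
A minimal sketch of how a min / p50 / mean triple like the one above can be collected. It is not the repo's bench script: `time_calls` and `fake_exec` are hypothetical names, and the stand-in workload only sleeps so the example runs without credentials. In a real run the callable would wrap `sb.run_code(...)` on an already-live sandbox.

```python
import statistics
import time
from typing import Callable


def time_calls(fn: Callable[[], object], n: int = 30, warmup: int = 1) -> dict[str, float]:
    """Call fn() n times after `warmup` discarded calls; return latency stats in ms."""
    for _ in range(warmup):
        fn()  # warmup calls are executed but not recorded
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(samples),
        "p50": statistics.median(samples),
        "mean": statistics.fmean(samples),
    }


def fake_exec() -> None:
    """Stand-in for a hot exec call (assumption: replace with sb.run_code)."""
    time.sleep(0.001)


stats = time_calls(fake_exec, n=20)
print({name: round(value, 2) for name, value in stats.items()})
```

Reusing one sandbox object and discarding a warmup call mirrors the loop shown earlier, so connection setup is not counted against the hot path.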

Full round-trip — Sandbox() → run_code() → close()

100 sequential iterations via api.podflare.ai (Cloudflare Worker → nearest origin, haversine-routed), SDK ≥ 0.0.20. Two vantage points — your real number is whichever matches where your agent runs.
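
| Percentile | From US laptop / residential wifi | From in-cloud (intra-DC) |
| --- | --- | --- |
| min | 164 ms | 42 ms |
| p50 | 187 ms | 43 ms |
| p90 | 198 ms | not reported (still ≤ 78 ms) |
| p95 | 208 ms | 78 ms |
| p99 | 221 ms | 188 ms |
| max | 475 ms | 222 ms |
| spread (p95 − p50) | 21 ms | 35 ms |
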
Both columns cover create + exec + destroy, three HTTPS round-trips in total. The residential column adds ~100–150 ms of last-mile wifi/ISP jitter on top of the in-cloud number. The in-cloud column was measured by benching us-west’s hostd against itself, which has the same network shape as any customer agent running in a nearby cloud region.

SDK 0.0.20 defaults to api.podflare.ai, the Cloudflare Worker fronting every region. CF’s edge PoP is usually closer to the caller than any individual region origin (from a US west-coast laptop, the edge is ~5–15 ms away versus ~30–50 ms direct to us-west), and the Cloudflare backbone to origin is cleaner than the public-internet route. Counter-intuitive but reproducible: the “extra hop” takes less wall-clock time than going direct.
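
For readers who want to build the same kind of percentile table from their own samples, here is a small sketch. It assumes the nearest-rank percentile definition; the bench script may use a different interpolation, so values can differ slightly on small sample sets. `percentile` and `summarize` are illustrative names, and the samples are synthetic.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of the data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """Produce the rows used in the table above, including the p95 - p50 spread."""
    p50 = percentile(samples, 50)
    p95 = percentile(samples, 95)
    return {
        "min": min(samples),
        "p50": p50,
        "p90": percentile(samples, 90),
        "p95": p95,
        "p99": percentile(samples, 99),
        "max": max(samples),
        "spread": p95 - p50,
    }


# Synthetic samples (1..100 ms) purely to exercise the function.
summary = summarize([float(ms) for ms in range(1, 101)])
print(summary)
```

The spread row is only a subtraction; for the table above that is 208 − 187 = 21 ms on residential wifi and 78 − 43 = 35 ms in-cloud.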

Where the p95/p99 fix came from (SDK 0.0.19)

Pre-0.0.19 the same bench showed p99 = 1,741 ms and max = 2,290 ms. That “looked” like public-internet TCP SYN drops (0.05–0.35 % across our hosting fleet get dropped, which Linux turns into 1 s → 3 s → 7 s retransmits). It wasn’t. SDK 0.0.17 tried to fix it by capping the connect timeout at 800 ms with retries=2 on a fresh socket. That backfired on residential wifi: a slow-but-real TLS handshake (900 ms due to noisy local DNS) gets killed at 800 ms, retried (killed again), retried (killed again) — three ConnectTimeouts chained together at ~2.9 s. Self-inflicted. SDK 0.0.19 fixes it properly:
import httpx
timeouts = httpx.Timeout(connect=2.5, read=30.0, write=10.0, pool=5.0)
client = httpx.Client(timeout=timeouts, transport=httpx.HTTPTransport(retries=1))  # other Client arguments elided
Connect widened to 2.5 s so normal-slow handshakes complete on the first attempt; retries dropped to 1 so a real failure isn’t compounded. Result: p99 went 1,741 → 221 ms, max 2,290 → 475 ms. Reproduce against any blackhole IP:
0.8s connect + retries=2, against blackhole:  raised after 2,910 ms
0.8s connect + retries=0:                     raised after   803 ms
2.0s connect + retries=0:                     raised after 2,002 ms
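
The three timings above follow a simple pattern: every attempt burns the full connect timeout, plus whatever wait the transport inserts between attempts. The sketch below models that arithmetic without touching the network. The backoff schedule (no wait before the first retry, then 500 ms, doubling afterwards) is an assumption chosen because it is consistent with the 2,910 ms measurement; it is not taken from the SDK source.

```python
from itertools import count, islice
from typing import Iterator


def backoff_delays_ms(factor_ms: int = 500) -> Iterator[int]:
    """Assumed schedule: 0 ms before the first retry, then factor * 2**n."""
    yield 0
    for n in count():
        yield factor_ms * 2 ** n


def worst_case_connect_ms(connect_timeout_ms: int, retries: int) -> int:
    """Time until a connect timeout surfaces when every attempt hits a blackhole."""
    attempts = retries + 1
    waits = sum(islice(backoff_delays_ms(), retries))
    return attempts * connect_timeout_ms + waits


for timeout_ms, retries in [(800, 2), (800, 0), (2000, 0), (2500, 1)]:
    print(f"{timeout_ms} ms connect, retries={retries}: "
          f"~{worst_case_connect_ms(timeout_ms, retries)} ms to raise")
```

Under this model the 0.0.17 settings cost about 2,900 ms against a dead address, which lines up with the measured 2,910 ms. The 0.0.19 settings cost more against a truly dead address (about 5,000 ms) and trade that for not killing slow-but-real handshakes.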

fork() — the 100 ms primitive

Copy-on-write mid-flight snapshot + reflink rootfs. One parent, N children, each starting from the parent’s exact state.

| Path | Time |
| --- | --- |
| Server-side, diff snapshot + spawn (warm) | 24 ms |
| Server-side, first fork from fresh parent (cold diff) | 116 ms |
| End-to-end through SDK from laptop | ~80 ms p50 |

Where the 24 ms goes on the server side:

| Phase | Time |
| --- | --- |
| Pause the running VM | 1 ms |
| Capture dirty-page diff snapshot | 4–5 ms |
| Resume the parent | 1 ms |
| Prepare child memory (shared CoW) | ~1 ms |
| Merge diff onto shared base | 2–3 ms |
| Spawn N children in parallel | 12–17 ms |

The first fork from a fresh parent pays ~90 ms extra because the parent has been running long enough to dirty ~20 MB of pages — those have to hit disk once. On subsequent forks there’s nothing new to dirty and the snapshot stays in the single-digit ms range.
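
As a quick sanity check on the breakdown, the per-phase ranges can be summed to confirm that the 24 ms headline sits inside them. The figures are copied from the phase table; approximate entries such as "~1 ms" are treated as exact, which is a simplification.

```python
# (low, high) milliseconds per phase, as listed in the table above.
FORK_PHASES_MS = {
    "pause the running VM": (1, 1),
    "capture dirty-page diff snapshot": (4, 5),
    "resume the parent": (1, 1),
    "prepare child memory (shared CoW)": (1, 1),
    "merge diff onto shared base": (2, 3),
    "spawn N children in parallel": (12, 17),
}

WARM_FORK_MS = 24
COLD_FORK_MS = 116

low = sum(lo for lo, _ in FORK_PHASES_MS.values())
high = sum(hi for _, hi in FORK_PHASES_MS.values())
cold_penalty_ms = COLD_FORK_MS - WARM_FORK_MS

print(f"phase budget: {low}-{high} ms (headline {WARM_FORK_MS} ms)")
print(f"first-fork penalty: {cold_penalty_ms} ms")
```

The phases add up to 21–28 ms, and the cold-diff penalty works out to 92 ms, matching the "~90 ms extra" figure.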

Under the hood — internal latencies

These are the hostd-local numbers (loopback, no network). They’re what the customer-facing numbers above build up from.

| Operation | Latency |
| --- | --- |
| create() pool hit | 6 ms |
| Pool refill per VM | 12 ms |
| run_code("print(1)") REPL hot (hostd-local) | 3–5 ms |
| upload(256 KB) via vsock | ~5 ms |
| merge_into(winner) | ~50 ms |

Pool refill: 12 ms

| Phase | Time |
| --- | --- |
| Metadata-only rootfs clone (CoW) | 0–1 ms |
| Spawn the VM process | ~3 ms |
| VM ready | under 2 ms |
| Load snapshot | ~3 ms |
| Agent ready | ~3 ms |

Rootfs clones are metadata-only — no block copy — because of the filesystem we use on the host. On a default filesystem the same step takes 700+ ms; that’s the difference between a warm pool that keeps up and one that doesn’t.

Snapshot-load: 3 ms

Seed memory is shared across every pool VM with page-level copy-on-write, so kernel page cache is hit most of the time. Only pages the child actually dirties get copied out.
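
The page-level copy-on-write idea can be seen at small scale with an ordinary private file mapping. This is only an analogy running in a single process, not Podflare's implementation: the temporary file stands in for the seed memory image, and the two mappings stand in for two pool VMs.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def cow_demo() -> tuple[bytes, bytes, bytes]:
    """Map one seed file privately twice, dirty one mapping, and report all three views."""
    with tempfile.TemporaryFile() as seed:
        seed.write(b"S" * (4 * PAGE))  # stand-in for the shared seed image
        seed.flush()
        # ACCESS_COPY gives a copy-on-write mapping: writes stay private to it.
        child_a = mmap.mmap(seed.fileno(), 0, access=mmap.ACCESS_COPY)
        child_b = mmap.mmap(seed.fileno(), 0, access=mmap.ACCESS_COPY)
        try:
            child_a[0:5] = b"DIRTY"  # only the touched page gets a private copy
            seed.seek(0)
            return bytes(child_a[0:5]), bytes(child_b[0:5]), seed.read(5)
        finally:
            child_a.close()
            child_b.close()


a_view, b_view, file_view = cow_demo()
print(a_view, b_view, file_view)
```

The write is visible only through the mapping that made it; the second mapping and the backing file still see the original bytes, which is why untouched pages can stay shared.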

Which URL should I hit?

Short answer: just call Sandbox() with no arguments. As of SDK 0.0.20 that goes to api.podflare.ai — the Cloudflare Worker that haversine-routes to the nearest origin. From most caller geographies it’s faster than hand-pinning a region URL, because the CF edge PoP is closer than any individual origin and CF’s backbone to the origin is cleaner than the public-internet route. The full picture, in order of preference:

| How you call it | p50 | p99 | Notes |
| --- | --- | --- | --- |
| Sandbox() (SDK ≥ 0.0.20) | 187 ms | 221 ms | Default. Goes through api.podflare.ai → nearest origin. Automatic 5xx failover included. |
| Sandbox(region="us-east") | ~187 ms | ~221 ms | Same path: the hint gets honored by the Worker and the nearest-origin pick gets biased. |
| Sandbox(host="https://usw1.podflare.ai") | 190–260 ms | 483 ms | Direct-to-origin. Skips the Worker. Slower from most residential callers; useful for in-cloud benches where your agent is co-located with a specific region. |

Measured from a California laptop, 100 iterations each, SDK 0.0.20:
  via api.podflare.ai  →  p50 = 187 ms, p99 = 221 ms, max = 475 ms
  direct to usw1       →  p50 = 203 ms, p99 = 483 ms, max = 594 ms
The Worker also adds automatic failover on origin 5xx (retries the next-nearest region) and enforces concurrent-sandbox limits server-side with a 10 s KV-cached fan-out. You give up none of that by using the default.
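
The failover behavior described above can be pictured as a loop over regions in nearest-first order that moves on only when an origin answers with a 5xx. This is a hedged sketch, not the Worker's code: the real Worker is not shown on this page, `create_with_failover` and `send` are hypothetical names, and it may retry only once rather than walking the whole list.

```python
from typing import Callable, Sequence


def create_with_failover(
    regions_nearest_first: Sequence[str],
    send: Callable[[str], int],
) -> tuple[str, int]:
    """Try each region in order; return the first (region, status) that is not a 5xx."""
    if not regions_nearest_first:
        raise ValueError("no regions configured")
    last = (regions_nearest_first[0], 0)
    for region in regions_nearest_first:
        status = send(region)
        if not 500 <= status <= 599:
            return region, status
        last = (region, status)  # remember the failure and try the next-nearest
    return last  # every origin failed; surface the final 5xx


# Fake transport for illustration: the nearest origin is down.
fake_status = {"us-west": 503, "us-central": 201, "us-east": 201}
served_by, status = create_with_failover(
    ["us-west", "us-central", "us-east"], fake_status.__getitem__
)
print(served_by, status)
```

Injecting `send` keeps the routing decision separate from the HTTP client, which makes this kind of logic easy to test without a network.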

When to override the default

  • Your agent runs in the same DC as a specific region (e.g. a Latitude-hosted worker talking to usw1). Direct skips the ~15 ms edge overhead and the CF backbone hop — both of which are pure latency when you’re already one rack away from the origin.
  • Hard region pinning for compliance: Sandbox(region="eu") forces traffic to Helsinki regardless of where you’re calling from.
# Direct-to-origin: only faster when you're in the same DC.
sb = Sandbox(host="https://usw1.podflare.ai")

# Region-pin via the Worker: still edge-routed, just biased.
sb = Sandbox(region="eu")

Scaling ceiling (single host)

On our current hardware (AMD EPYC 24-core, 96 GB DDR5):

| Knob | Limit |
| --- | --- |
| Warm pool size | 120 routinely (each idle warm VM is ~40 MB RSS) |
| Concurrent running sandboxes | ~250 at 1 GB allocation, ~300 MB avg RSS each |
| Burst create rate | ~83 creates/sec (pool-refill bottleneck, parallel) |
| fork(n) throughput | limited by disk write of the diff (~KB-MB) |

Idle warm VMs use far less RAM than their nominal allocation: our Pod runtime’s CoW + lazy snapshot restore means the guest only commits pages it actually touches. We measured ~40 MB RSS per idle 1 GB warm VM under steady state, so a 120-warm pool uses ~5 GB of the box’s 93 GB available. The other 88 GB is headroom for on-demand customer creates and Space resumes.
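
The memory figures above are simple arithmetic, reproduced here so they can be re-derived for a different pool size or host. All inputs come from this section; decimal gigabytes (1 GB = 1000 MB) are assumed for the conversion.

```python
IDLE_RSS_MB = 40          # measured RSS per idle warm VM
WARM_POOL_SIZE = 120      # warm VMs per host
AVAILABLE_GB = 93         # usable RAM on the 96 GB box
RUNNING_SANDBOXES = 250   # concurrent running sandboxes
RUNNING_RSS_MB = 300      # average RSS per running sandbox

pool_gb = WARM_POOL_SIZE * IDLE_RSS_MB / 1000
headroom_gb = AVAILABLE_GB - pool_gb
running_gb = RUNNING_SANDBOXES * RUNNING_RSS_MB / 1000

print(f"warm pool:        {pool_gb:.1f} GB")
print(f"headroom:         {headroom_gb:.1f} GB")
print(f"running at limit: {running_gb:.1f} GB")
```

The warm pool comes to 4.8 GB (rounded to ~5 GB in the text), leaving about 88 GB, and 250 running sandboxes at ~300 MB each come to roughly 75 GB, which fits inside that headroom.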

Scaling ceiling (multi-region)

Five regions live today, 544 warm sandboxes ready for instant hand-off across the fleet:

| region | location | warm pool |
| --- | --- | --- |
| us-west | San Jose (Latitude) | 120 |
| us-central | Dallas (Latitude) | 120 |
| us-east | Ashburn (Latitude) | 120 |
| eu | Helsinki (Hetzner) | 64 |
| sg | Singapore (Latitude) | 120 |

Routing happens at the Cloudflare Worker fronting api.podflare.ai: server-side haversine using Cloudflare’s per-request cf.latitude / cf.longitude picks the closest origin. SDK 0.0.20 defaults here — pre-0.0.20 clients with client-side timezone routing still work, but the default has moved to edge-routed because CF’s edge PoP is usually closer to the caller than any individual origin, and the CF backbone to origin is cleaner than the public-internet route. The default can be overridden:
  1. Explicit region="us-east" in the SDK or X-Podflare-Region header in raw HTTP
  2. Explicit /r/REGION/... path prefix (e.g. /r/us-east/v1/sandboxes)
  3. Direct region URL via Sandbox(host="https://use1.podflare.ai")
  4. Automatic Worker failover on 5xx (for create-class requests only)
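
For raw-HTTP callers, options 1 and 2 differ only in where the region hint travels. The sketch below builds both request shapes without sending anything. The helper names are hypothetical, and authentication headers are deliberately left out because this page does not describe them.

```python
API_BASE = "https://api.podflare.ai"


def region_via_header(path: str, region: str) -> tuple[str, dict[str, str]]:
    """Option 1: keep the normal path and pass the hint in X-Podflare-Region."""
    return f"{API_BASE}{path}", {"X-Podflare-Region": region}


def region_via_prefix(path: str, region: str) -> tuple[str, dict[str, str]]:
    """Option 2: encode the region in an /r/REGION path prefix, no extra header."""
    return f"{API_BASE}/r/{region}{path}", {}


header_url, header_headers = region_via_header("/v1/sandboxes", "us-east")
prefix_url, prefix_headers = region_via_prefix("/v1/sandboxes", "us-east")
print(header_url, header_headers)
print(prefix_url, prefix_headers)
```

The path-prefix form can be handy when a client cannot set custom headers; the header form leaves URLs unchanged.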
Add a region → update api-router/src/regions.ts with its coordinates and the Worker picks it up the next deploy. SDK clients learn it via the response endpoint header even before an SDK publish.

Distance routing examples

| caller | nearest region | reason |
| --- | --- | --- |
| San Francisco | us-west | 58 km |
| Toronto | us-east | 570 km |
| Mexico City | us-central | 1450 km (was 3100 from us-east) |
| Bangalore | sg | 3300 km |
| Sydney | sg | 6300 km (vs 12 000 km us-west) |
| Helsinki | eu | 30 km |
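
The nearest-region pick in the table can be reproduced with a plain haversine calculation. The sketch below is in Python for consistency with the SDK examples (the Worker itself is TypeScript), and the coordinates are approximate city-centre values chosen for illustration, not the contents of regions.ts, so exact distances will differ somewhat from the table.

```python
import math

# Approximate city-centre coordinates (assumed, not taken from regions.ts).
REGIONS = {
    "us-west": (37.34, -121.89),    # San Jose
    "us-central": (32.78, -96.80),  # Dallas
    "us-east": (39.04, -77.49),     # Ashburn
    "eu": (60.17, 24.94),           # Helsinki
    "sg": (1.35, 103.82),           # Singapore
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def regions_nearest_first(caller: tuple[float, float]) -> list[str]:
    """All regions ordered by distance from the caller, closest first."""
    return sorted(REGIONS, key=lambda name: haversine_km(caller, REGIONS[name]))


CALLERS = {
    "San Francisco": (37.77, -122.42),
    "Toronto": (43.65, -79.38),
    "Mexico City": (19.43, -99.13),
    "Bangalore": (12.97, 77.59),
    "Sydney": (-33.87, 151.21),
    "Helsinki": (60.17, 24.94),
}

for city, coords in CALLERS.items():
    print(f"{city:14s} -> {regions_nearest_first(coords)[0]}")
```

Returning the full ordering rather than only the winner also gives a natural next-nearest candidate for failover.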

Today’s numbers

Production state, end of Q1 2026. All five regions (us-west · us-central · us-east · eu · sg) report the same shape:

| operation | latency |
| --- | --- |
| server-side create() (pool hit) | 6 ms |
| server-side fork(n=5) (warm diff) | 101 ms (24 ms snapshot + 77 ms parallel spawn) |
| end-to-end Sandbox() → run_code() → close() (p50) | 187 ms |
| end-to-end (p95) | 208 ms |
| end-to-end (p99) | 221 ms |
| end-to-end (max, 100 iter) | 475 ms |
| hot run_code() on an already-live sandbox (p50) | 46 ms |
| pool refill per VM (background) | 12 ms |

For context vs other AI-agent sandbox platforms, see vs E2B and Daytona — same harness, same machine, same minute. Podflare cold-start is 2.2× faster than E2B and 2.9× faster than Daytona in head-to-head benches.

Roadmap

  • UFFD lazy memory paging — replace eager MAP_PRIVATE mmap on snapshot resume with userfaultfd-served on-demand pages. Estimated 50–100 ms off pool-refill time, gated on real production data (waiting for on_demand_boot_ms_mean from the new pool-stats observability to justify it).
  • Memory prefetch mapping — record the pages a guest touches during cold-start at template-build time, replay on resume in a background thread. Phase 2 after UFFD lands.
  • RAM-backed fork snapshots — fork diff writes hit RAM, not disk.
  • Seed-tree templates — per-workload pre-imported snapshots. The python-datasci template already has pandas + numpy + scipy imported in the REPL before you get the sandbox.
  • us-east region promotion — flip continents: ["NA", "SA"] on us-east in the Worker (currently distance-routed only). Cuts geography variance for east-coast NA callers.
  • In-sandbox envd-compatible API — Drop-in surface for the e2b_code_interpreter SDK so customers can repoint at api.podflare.ai with zero code change.

Compared to what?

V8 isolates (Cloudflare Workers) spawn in ~1 ms with ~1 MB memory. They’re an entirely different product: JS/Wasm only, no native deps, no filesystem, no Python REPL state. Real agents that run pip install scikit-learn && model.fit(X) don’t run in isolates.

Container-only platforms without snapshot-restore quote “cold start” times that are really docker run + runtime init. A Python container takes ~500 ms–2 s to become usable. Podflare’s warm pool hit is 10–50× faster and the sandbox is already ready for run_code.

E2B and Daytona are the two other major AI-agent sandbox platforms. We benchmarked all three head-to-head in the same minute on the same machine. See vs E2B and Daytona for the full numbers. TL;DR: Podflare cold-start is 2.2× faster than E2B and 2.9× faster than Daytona, mostly because of our vsock-based exec path (46 ms first_exec vs their 180 ms).

What we have that they don’t:
  • ~80 ms fork() — copy-on-write diff snapshot + N parallel microVM spawns. Neither E2B nor Daytona expose fork.
  • Persistent Spaces — freeze-to-disk on idle, resume into a fresh sandbox later. The VM’s running Python process survives. Both competitors only support container-commit-style snapshots.
  • 5-region edge routing with haversine geo + automatic failover. E2B and Daytona are single-region per cluster.

Reproduce the numbers

Every number on this page was produced by the bench script in our repo:
git clone https://github.com/PodFlare-ai/podflare
cd podflare
PODFLARE_API_KEY=pf_live_... python3 scripts/bench.py --n 30
Re-run it any time you want to audit the marketing. If our numbers regress, we’d rather you find out than have us pretend otherwise.

Methodology

  • Client: macOS laptop on regular residential wifi (west-coast US). This is not the best case. Last-mile wifi + consumer router buffering add roughly 100–150 ms of round-trip time that isn’t there on fiber-uplink desktops or cloud-to-cloud traffic.
  • Network path (direct): Laptop → TLS → Caddy on usw1.podflare.ai → hostd (Latitude SJC bare metal).
  • Network path (edge): Laptop → Cloudflare edge → api.podflare.ai Worker → origin → hostd.
  • Sandbox: Stock default template. 1 GB RAM, 4 GB rootfs, pool-warm.
  • SDK: podflare==0.0.20 (Python) — defaults to api.podflare.ai, which is Cloudflare-edge-routed to the nearest origin. Single shared httpx Client with connect=2.5 s + retries=1. Connection reuse matches what real agent loops do (they don’t open a fresh connection per exec).
  • N: 5 cold-start iterations end-to-end + 30 hot-exec iterations after one discarded warmup. Median reported. Cold-cold first call preserved (not discarded — it’s a real customer experience).
  • Date: April 2026.
No selective sampling. No “we excluded network jitter.” The script in scripts/bench.py runs every iteration back-to-back and reports statistics.mean + percentiles over the raw samples. If your agent runs on a VPS, a fiber-uplink desktop, or a nearby cloud region, coast-to-coast RTT drops from ~50 ms on wifi to ~20 ms on fiber and ~5 ms inside the same metro. Apply that delta to each HTTPS round-trip in the numbers above:
  • Hot run_code(): 38 ms on wifi → ~15 ms on fiber → sub-10 ms in-cloud
  • Full round-trip: 226 ms on wifi → ~100 ms on fiber → ~30 ms in-cloud
The server-side compute numbers (fork 24 ms, pool hit 6 ms, pool refill 12 ms) don’t change — those are what hostd does regardless of where the client sits. Network is the variable.