Podflare runs each sandbox in a dedicated microVM with concurrent
snapshot restore, a warm pool, and copy-on-write rootfs clones. The
whole point of that stack is that the latency a customer actually
feels is the same order of magnitude as calling a local function —
not starting a container.
TL;DR — the headline numbers
Production state, end of Q1 2026, measured over 30 sequential
cold-start iterations (matches scripts/bench-reliability.py).
Two columns because where your client runs dominates the wall-clock
number — your in-cloud agent never pays the residential last-mile tax:
| Operation | From a US laptop on residential wifi | From in-cloud (your agent’s view) |
|---|---|---|
| fork() server-side (warm diff snapshot) | 24 ms | 24 ms |
| create() pool hit (server-side) | 6 ms | 6 ms |
| Pool refill per VM (background) | 12 ms | 12 ms |
| Hot run_code() p50 | 46 ms | 42 ms |
| Full Sandbox() → run_code() → close() p50 | 187 ms | 43 ms |
| Full Sandbox() → run_code() → close() p95 | 208 ms | 78 ms |
| Full Sandbox() → run_code() → close() p99 | 221 ms | 188 ms |
| Worst observed in 100 iter (max) | 475 ms | 222 ms |
Server-side compute is single-digit ms; everything else is the
network between you and the nearest region. From an agent running
in any major cloud region the full create+exec+close round-trip
lands at p50 = 43 ms, p95 = 78 ms — measured from a Podflare host
benching against itself, which has the same network shape as any
nearby cloud customer.
For head-to-head comparisons against E2B, Daytona, and Blaxel, see
the “vs E2B, Daytona, and Blaxel” page. TL;DR:
Podflare wins every percentile (p50 / p95 / p99 / max).
~100 ms full round-trip from a fiber uplink or in-cloud. The
187 ms p50 above was measured from a laptop on residential wifi,
which adds ~100–150 ms of last-mile jitter on top of the actual
server-side compute (single-digit ms). From a fiber-uplink desktop,
a VPS, or another cloud region in the same metro, you can expect
the full create + exec + destroy flow to land around 100 ms p50
with similarly tight p95/p99. Reproduce with
scripts/bench-reliability.py from wherever your agent actually runs.
Headline: edge-routed hot path
What a long-running agent actually looks like — one Sandbox, many
execs. This is the number that matters most.
```python
sb = Sandbox()                  # defaults to api.podflare.ai
sb.run_code("print('warm')")    # warmup
for step in agent_loop:
    sb.run_code(step.code)      # ← this is the hot path
```
Hot run_code() — already-live sandbox
| Metric | via api.podflare.ai (default) |
|---|---|
| p50 | 46 ms |
| min | 39 ms |
| mean | 50 ms |
A full HTTPS POST /v1/sandboxes/:id/exec with NDJSON streaming back.
Python interpreter, vsock marshalling, the works. 46 ms p50 is what
an LLM agent loop sees per tool call — measurably faster than E2B
(180 ms) and Daytona (182 ms) because we use vsock instead of an
in-VM HTTP server.
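To check the hot-path figure from your own vantage point, a minimal timing sketch follows. It times any zero-argument callable with `time.perf_counter`; the helper name `time_calls` is hypothetical (it is not part of the SDK), and wiring it to `sb.run_code` assumes an already-created Sandbox like the one in the loop above.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], n: int = 30, warmup: int = 1) -> List[float]:
    """Return per-call wall-clock latencies in milliseconds.

    `time_calls` is a hypothetical helper, not part of the Podflare SDK.
    """
    for _ in range(warmup):          # warmup calls are run but not recorded
        fn()
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


if __name__ == "__main__":
    # Stand-in workload. With the SDK you would pass something like
    # `lambda: sb.run_code("print(1)")` (assumes a live Sandbox named sb).
    samples = time_calls(lambda: sum(range(1000)), n=30)
    print(f"p50={statistics.median(samples):.3f} ms  "
          f"min={min(samples):.3f} ms  mean={statistics.mean(samples):.3f} ms")
```

Reusing one Sandbox and one HTTP connection across calls matters here: a fresh connection per call would measure TLS setup rather than the exec path.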
Full round-trip — Sandbox() → run_code() → close()
100 sequential iterations via api.podflare.ai (Cloudflare Worker →
nearest origin, haversine-routed), SDK ≥ 0.0.20. Two vantage points —
your real number is whichever matches where your agent runs.
| Percentile | from US laptop / residential wifi | from in-cloud (intra-DC) |
|---|---|---|
| min | 164 ms | 42 ms |
| p50 | 187 ms | 43 ms |
| p90 | 198 ms | — (still ≤ 78 ms) |
| p95 | 208 ms | 78 ms |
| p99 | 221 ms | 188 ms |
| max | 475 ms | 222 ms |
| spread (p95 − p50) | 21 ms | 35 ms |
Both columns cover create + exec + destroy, which is 3 HTTPS round-trips.
The residential column adds ~100–150 ms of last-mile wifi/ISP jitter
on top of the in-cloud number. The in-cloud column was measured
benching us-west’s hostd against itself — same network shape as any
customer agent running in a nearby cloud region.
SDK 0.0.20 defaults to api.podflare.ai, the Cloudflare Worker
fronting every region. CF’s edge PoP is usually closer to the caller
than any individual region origin — from a US west-coast laptop, the
edge is ~5–15 ms away versus ~30–50 ms direct to us-west — and the
Cloudflare backbone to origin is cleaner than the public-internet
route. Counter-intuitive but reproducible: the “extra hop” is shorter
wall-clock than going direct.
Where the p95/p99 fix came from (SDK 0.0.19)
Pre-0.0.19 the same bench showed p99 = 1,741 ms and max = 2,290 ms.
That “looked” like public-internet TCP SYN drops (0.05–0.35 % of SYNs
across our hosting fleet get dropped, which Linux turns into
1 s → 3 s → 7 s retransmits). It wasn’t.
SDK 0.0.17 tried to fix it by capping the connect timeout at 800 ms
with retries=2 on a fresh socket. That backfired on residential
wifi: a slow-but-real TLS handshake (900 ms due to noisy local DNS)
gets killed at 800 ms, retried (killed again), retried (killed again)
— three ConnectTimeouts chained together at ~2.9 s. Self-inflicted.
SDK 0.0.19 fixes it properly:
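```python
import httpx
timeouts = httpx.Timeout(connect=2.5, read=30.0, write=10.0, pool=5.0)
# Other client options are elided here, as in the original snippet.
client = httpx.Client(timeout=timeouts, transport=httpx.HTTPTransport(retries=1))
```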
Connect widened to 2.5 s so normal-slow handshakes complete on the
first attempt; retries dropped to 1 so a real failure isn’t
compounded. Result: p99 went 1,741 → 221 ms, max 2,290 → 475 ms.
Reproduce against any blackhole IP:
- 0.8 s connect + retries=2, against blackhole: raised after 2,910 ms
- 0.8 s connect + retries=0: raised after 803 ms
- 2.0 s connect + retries=0: raised after 2,002 ms
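The ~2.9 s figure can be reconstructed with simple arithmetic. The sketch below assumes that every attempt burns the full connect timeout and that the transport sleeps 0 s, 0.5 s, 1 s, ... between attempts, which is how httpcore's connection retries appear to back off. Treat that schedule as an assumption and check it against the version you run; the function names are illustrative.

```python
from itertools import count, islice
from typing import Iterator


def backoff_delays(factor: float = 0.5) -> Iterator[float]:
    """Assumed backoff schedule: 0, factor, 2*factor, 4*factor, ..."""
    yield 0.0
    for n in count():
        yield factor * 2 ** n


def worst_case_connect_s(connect_timeout: float, retries: int) -> float:
    """Seconds until ConnectTimeout is raised when every attempt times out."""
    attempts = retries + 1
    sleeps = sum(islice(backoff_delays(), retries))  # one sleep per retry
    return attempts * connect_timeout + sleeps


print(round(worst_case_connect_s(0.8, 2), 3))  # 0.8 + 0 + 0.8 + 0.5 + 0.8
print(round(worst_case_connect_s(0.8, 0), 3))
print(round(worst_case_connect_s(2.5, 1), 3))  # the SDK 0.0.19 settings
```

Under these assumptions the old settings fail after about 2.9 s, matching the blackhole measurement. The 0.0.19 settings take longer to give up on a true blackhole, but a 900 ms handshake now completes on the first attempt instead of being killed three times.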
fork() — the 100 ms primitive
Copy-on-write mid-flight snapshot + reflink rootfs. One parent, N
children, each starting from the parent’s exact state.
| Path | Time |
|---|---|
| Server-side, diff snapshot + spawn (warm) | 24 ms |
| Server-side, first fork from fresh parent (cold diff) | 116 ms |
| End-to-end through SDK from laptop | ~80 ms p50 |
Where the 24 ms goes on the server side:
| Phase | Time |
|---|---|
| Pause the running VM | 1 ms |
| Capture dirty-page diff snapshot | 4–5 ms |
| Resume the parent | 1 ms |
| Prepare child memory (shared CoW) | ~1 ms |
| Merge diff onto shared base | 2–3 ms |
| Spawn N children in parallel | 12–17 ms |
The first fork from a fresh parent pays ~90 ms extra because the parent
has been running long enough to dirty ~20 MB of pages — those have to
hit disk once. On subsequent forks there’s nothing new to dirty and the
snapshot stays in the single-digit ms range.
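As a sanity check on the breakdown, the phase ranges in the table above can be summed. This is plain arithmetic on the quoted figures (the "~1 ms" phase is taken as 1 ms), not output from the bench script.

```python
# (low, high) per-phase times in ms, copied from the phase table.
PHASES_MS = {
    "pause parent": (1, 1),
    "capture dirty-page diff": (4, 5),
    "resume parent": (1, 1),
    "prepare child memory": (1, 1),
    "merge diff onto base": (2, 3),
    "spawn children": (12, 17),
}

low = sum(lo for lo, _ in PHASES_MS.values())
high = sum(hi for _, hi in PHASES_MS.values())
print(f"warm fork: {low}-{high} ms")   # the 24 ms headline sits in this range

WARM_MS, COLD_MS = 24, 116
print(f"cold-diff penalty: {COLD_MS - WARM_MS} ms")   # the "~90 ms extra"
```

The spawn phase dominates the warm path, so the capture and merge steps would have to grow several-fold before they mattered as much.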
Under the hood — internal latencies
These are the hostd-local numbers (loopback, no network). They’re
what the customer-facing numbers above build up from.
| Operation | Latency |
|---|---|
| create() pool hit | 6 ms |
| Pool refill per VM | 12 ms |
| run_code("print(1)") REPL hot (hostd-local) | 3–5 ms |
| upload(256 KB) via vsock | ~5 ms |
| merge_into(winner) | ~50 ms |
Pool refill: 12 ms
| Phase | Time |
|---|---|
| Metadata-only rootfs clone (CoW) | 0–1 ms |
| Spawn the VM process | ~3 ms |
| VM ready | under 2 ms |
| Load snapshot | ~3 ms |
| Agent ready | ~3 ms |
Rootfs clones are metadata-only — no block copy — because of the
filesystem we use on the host. On a default filesystem the same step
takes 700+ ms; that’s the difference between a warm pool that
keeps up and one that doesn’t.
Snapshot-load: 3 ms
Seed memory is shared across every pool VM with page-level copy-on-write,
so kernel page cache is hit most of the time. Only pages the child
actually dirties get copied out.
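Page-level copy-on-write can be illustrated outside the hypervisor with private file mappings. The sketch below is only an analogy using Python's `mmap` module: two mappings share one seed file, and a write through one mapping stays private to it. It is not Podflare's snapshot loader, and `cow_demo` is a made-up name.

```python
import mmap
import os
import tempfile


def cow_demo(size: int = 4096):
    """Write through one private mapping; return what each view then sees."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"S" * size)             # the shared "seed" contents
        path = f.name
    try:
        with open(path, "r+b") as fh:
            # ACCESS_COPY gives a copy-on-write mapping (MAP_PRIVATE on Unix).
            child_a = mmap.mmap(fh.fileno(), size, access=mmap.ACCESS_COPY)
            child_b = mmap.mmap(fh.fileno(), size, access=mmap.ACCESS_COPY)
            child_a[0:5] = b"child"      # dirties one page in child_a only
            seen = (bytes(child_a[0:5]), bytes(child_b[0:5]))
            child_a.close()
            child_b.close()
        with open(path, "rb") as fh:
            on_disk = fh.read(5)         # the seed file itself is untouched
        return seen + (on_disk,)
    finally:
        os.unlink(path)


print(cow_demo())
```

Only the page that was written gets a private copy; every untouched page keeps being served from the shared file-backed cache, which is the property the pool relies on.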
Which URL should I hit?
Short answer: just call Sandbox() with no arguments. As of SDK
0.0.20 that goes to api.podflare.ai — the Cloudflare Worker that
haversine-routes to the nearest origin. From most caller geographies
it’s faster than hand-pinning a region URL, because the CF edge PoP
is closer than any individual origin and CF’s backbone to the origin
is cleaner than the public-internet route.
The full picture, in order of preference:
| How you call it | p50 | p99 | Notes |
|---|---|---|---|
| Sandbox() (SDK ≥ 0.0.20) | 187 ms | 221 ms | Default. Goes through api.podflare.ai → nearest origin. Automatic 5xx failover included. |
| Sandbox(region="us-east") | ~187 ms | ~221 ms | Same path: hint gets honored by the Worker and the nearest-origin pick gets biased. |
| Sandbox(host="https://usw1.podflare.ai") | 190–260 ms | 483 ms | Direct-to-origin. Skips the Worker. Slower from most residential callers; useful for in-cloud benches where your agent is co-located with a specific region. |
Measured from a California laptop, 100 iterations each, SDK 0.0.20:
- via api.podflare.ai → p50 = 187 ms, p99 = 221 ms, max = 475 ms
- direct to usw1 → p50 = 203 ms, p99 = 483 ms, max = 594 ms
The Worker also adds automatic failover on origin 5xx (retries the
next-nearest region) and enforces concurrent-sandbox limits server-side
with a 10 s KV-cached fan-out. You give up none of that by using the
default.
When to override the default
- Your agent runs in the same DC as a specific region (e.g. a
Latitude-hosted worker talking to usw1). Direct skips the ~15 ms
edge overhead and the CF backbone hop, both of which are pure
latency when you’re already one rack away from the origin.
- Hard region pinning for compliance: Sandbox(region="eu")
forces traffic to Helsinki regardless of where you’re calling from.
```python
# Direct-to-origin: only faster when you're in the same DC.
sb = Sandbox(host="https://usw1.podflare.ai")
# Region-pin via the Worker: still edge-routed, just biased.
sb = Sandbox(region="eu")
```
Scaling ceiling (single host)
On our current hardware (AMD EPYC 24-core, 96 GB DDR5):
| Knob | Limit |
|---|---|
| Warm pool size | 120 routinely (each idle warm VM is ~40 MB RSS) |
| Concurrent running sandboxes | ~250 at 1 GB allocation, ~300 MB avg RSS each |
| Burst create rate | ~83 creates/sec (pool-refill bottleneck, parallel) |
| fork(n) throughput | limited by disk write of the diff (~KB-MB) |
Idle warm VMs use far less RAM than their nominal allocation:
our Pod runtime’s COW + lazy snapshot restore means the guest only
commits pages it actually touches. We measured ~40 MB RSS per idle
1 GB warm VM under steady-state, so a 120-warm pool burns ~5 GB of
the box’s 93 GB available. The other 88 GB is headroom for
on-demand customer creates and Space resumes.
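The memory and burst-rate figures above follow from a few lines of arithmetic. The sketch below only restates numbers quoted on this page. Reading ~83 creates/sec as one refill completing every 12 ms is an assumption about how that figure was derived.

```python
WARM_POOL = 120        # warm VMs per host
IDLE_RSS_MB = 40       # measured RSS per idle warm VM
AVAILABLE_GB = 93      # RAM available on the box
REFILL_MS = 12         # pool refill per VM

pool_gb = WARM_POOL * IDLE_RSS_MB / 1000       # 4.8 GB, quoted as ~5 GB
headroom_gb = AVAILABLE_GB - round(pool_gb)    # what is left for running sandboxes
creates_per_sec = 1000 / REFILL_MS             # refills per second at 12 ms each

print(f"pool={pool_gb:.1f} GB  headroom={headroom_gb} GB  "
      f"rate={creates_per_sec:.0f}/s")
```

At ~300 MB average RSS per running sandbox, the ~250 concurrent figure would use roughly 75 GB, which fits inside that headroom.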
Scaling ceiling (multi-region)
Five regions live today, 544 warm sandboxes ready for instant
hand-off across the fleet:
| region | location | warm pool |
|---|---|---|
| us-west | San Jose (Latitude) | 120 |
| us-central | Dallas (Latitude) | 120 |
| us-east | Ashburn (Latitude) | 120 |
| eu | Helsinki (Hetzner) | 64 |
| sg | Singapore (Latitude) | 120 |
Routing happens at the Cloudflare Worker fronting api.podflare.ai:
server-side haversine using Cloudflare’s per-request cf.latitude /
cf.longitude picks the closest origin. SDK 0.0.20 defaults here —
pre-0.0.20 clients with client-side timezone routing still work, but
the default has moved to edge-routed because CF’s edge PoP is usually
closer to the caller than any individual origin, and the CF backbone
to origin is cleaner than the public-internet route.
The default can be overridden:
- Explicit region="us-east" in the SDK or X-Podflare-Region
header in raw HTTP
- Explicit /r/REGION/... path prefix (e.g. /r/us-east/v1/sandboxes)
- Direct region URL via Sandbox(host="https://use1.podflare.ai")
- Automatic Worker failover on 5xx (for create-class requests only)
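For raw HTTP callers, the header and path-prefix overrides listed above can be sketched as a small URL builder. `build_request` is a hypothetical helper, authentication headers are omitted, and nothing is sent over the network. Only the `X-Podflare-Region` header name and the `/r/REGION/...` prefix come from this page.

```python
from typing import Dict, Optional, Tuple

BASE_URL = "https://api.podflare.ai"


def build_request(path: str, region: Optional[str] = None,
                  use_prefix: bool = False) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a raw HTTP call with an optional region override.

    Hypothetical helper for illustration; auth headers are left out.
    """
    headers: Dict[str, str] = {}
    if region is None:
        return BASE_URL + path, headers                  # default: nearest origin
    if use_prefix:
        return f"{BASE_URL}/r/{region}{path}", headers   # /r/REGION/... form
    headers["X-Podflare-Region"] = region                # header form
    return BASE_URL + path, headers


print(build_request("/v1/sandboxes", "us-east", use_prefix=True))
print(build_request("/v1/sandboxes", "us-east"))
```

Both forms still go through the Worker, so they keep edge routing and failover; only the direct region URL bypasses it.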
To add a region, update api-router/src/regions.ts with its coordinates
and the Worker picks it up on the next deploy. SDK clients learn it via
the response endpoint header even before an SDK publish.
Distance routing examples
| caller | nearest region | reason |
|---|---|---|
| San Francisco | us-west | 58 km |
| Toronto | us-east | 570 km |
| Mexico City | us-central | 1450 km (was 3100 from us-east) |
| Bangalore | sg | 3300 km |
| Sydney | sg | 6300 km (vs 12 000 km us-west) |
| Helsinki | eu | 30 km |
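The nearest-region picks in the table can be approximated with a short haversine sketch. The production router is the Worker described above. The Python below is only an illustration, and the coordinates are approximate city-centre values chosen for this example (the entries in regions.ts are not shown on this page), so distances will differ somewhat from the table.

```python
import math
from typing import Dict, Tuple

# Approximate (lat, lon) in degrees; assumed values for illustration only.
REGIONS: Dict[str, Tuple[float, float]] = {
    "us-west": (37.34, -121.89),     # San Jose
    "us-central": (32.78, -96.80),   # Dallas
    "us-east": (39.04, -77.49),      # Ashburn
    "eu": (60.17, 24.94),            # Helsinki
    "sg": (1.35, 103.82),            # Singapore
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_region(caller: Tuple[float, float]) -> str:
    """Pick the region whose origin is closest to the caller."""
    return min(REGIONS, key=lambda name: haversine_km(caller, REGIONS[name]))


print(nearest_region((-33.87, 151.21)))   # Sydney
print(nearest_region((19.43, -99.13)))    # Mexico City
```

Distance is a proxy for latency rather than a measurement of it, which is why the explicit region overrides exist for callers whose network path does not follow geography.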
Today’s numbers
Production state, end of Q1 2026. All five regions
(us-west · us-central · us-east · eu · sg) report the same shape:
| operation | latency |
|---|---|
| server-side create() (pool hit) | 6 ms |
| server-side fork(n=5) (warm diff) | 101 ms (24 ms snapshot + 77 ms parallel spawn) |
| end-to-end Sandbox() → run_code() → close() (p50) | 187 ms |
| end-to-end (p95) | 208 ms |
| end-to-end (p99) | 221 ms |
| end-to-end (max, 100 iter) | 475 ms |
| hot run_code() on an already-live sandbox (p50) | 46 ms |
| pool refill per VM (background) | 12 ms |
For context vs other AI-agent sandbox platforms, see the
“vs E2B and Daytona” page: same harness, same
machine, same minute. Podflare cold-start is 2.2× faster than E2B
and 2.9× faster than Daytona in head-to-head benches.
Roadmap
- UFFD lazy memory paging: replace eager MAP_PRIVATE mmap on
snapshot resume with userfaultfd-served on-demand pages. Estimated
50–100 ms off pool-refill time, gated on real production data
(waiting for on_demand_boot_ms_mean from the new pool-stats
observability to justify it).
- Memory prefetch mapping: record the pages a guest touches during
cold-start at template-build time, replay on resume in a background
thread. Phase 2 after UFFD lands.
- RAM-backed fork snapshots: fork diff writes hit RAM, not disk.
- Seed-tree templates: per-workload pre-imported snapshots. The
python-datasci template already has pandas + numpy + scipy
imported in the REPL before you get the sandbox.
- us-east region promotion: flip continents: ["NA", "SA"] on
us-east in the Worker (currently distance-routed only). Cuts
geography variance for east-coast NA callers.
- In-sandbox envd-compatible API: drop-in surface for the
e2b_code_interpreter SDK so customers can repoint at
api.podflare.ai with zero code change.
Compared to what?
V8 isolates (Cloudflare Workers) spawn in ~1 ms with ~1 MB memory.
They’re an entirely different product: JS/Wasm only, no native deps,
no filesystem, no Python REPL state. Real agents that run
pip install scikit-learn && model.fit(X) don’t run in isolates.
Container-only platforms without snapshot-restore quote “cold start”
times that are really docker run + runtime init. A Python container
takes ~500 ms–2 s to become usable. Podflare’s warm pool hit is
10–50× faster and the sandbox is already-ready for run_code.
E2B and Daytona are the two other major AI-agent sandbox platforms.
We benchmarked all three head-to-head in the same minute on the same
machine; see the “vs E2B and Daytona” page for the
full numbers. TL;DR: Podflare cold-start is 2.2× faster than E2B and
2.9× faster than Daytona, mostly because of our vsock-based exec
path (46 ms first_exec vs their 180 ms).
What we have that they don’t:
- ~80 ms fork(): copy-on-write diff snapshot + N parallel
microVM spawns. Neither E2B nor Daytona exposes fork.
- Persistent Spaces: freeze-to-disk on idle, resume into a fresh
sandbox later. The VM’s running Python process survives. Both
competitors only support container-commit-style snapshots.
- 5-region edge routing with haversine geo + automatic failover.
E2B and Daytona are single-region per cluster.
Reproduce the numbers
Every number on this page was produced by the bench script in our repo:
```shell
git clone https://github.com/PodFlare-ai/podflare
cd podflare
PODFLARE_API_KEY=pf_live_... python3 scripts/bench.py --n 30
```
Re-run any time you want to audit the marketing. If our numbers regress,
we’d rather you find out than pretend.
Methodology
- Client: macOS laptop on regular residential wifi (west-coast US).
This is not the best case. Last-mile wifi + consumer router
buffering add roughly 100–150 ms of round-trip time that isn’t
there on fiber-uplink desktops or cloud-to-cloud traffic.
- Network path (direct): Laptop → TLS → Caddy on usw1.podflare.ai
→ hostd (Latitude SJC bare metal).
- Network path (edge): Laptop → Cloudflare edge → api.podflare.ai
Worker → origin → hostd.
- Sandbox: Stock default template. 1 GB RAM, 4 GB rootfs, pool-warm.
- SDK: podflare==0.0.20 (Python), which defaults to api.podflare.ai
and is Cloudflare-edge-routed to the nearest origin. Single shared
httpx Client with connect=2.5 s + retries=1. Connection reuse
matches what real agent loops do (they don’t open a fresh connection
per exec).
- N: 5 cold-start iterations end-to-end + 30 hot-exec iterations
after one discarded warmup. Median reported. Cold-cold first call
preserved (not discarded, since it’s a real customer experience).
- Date: April 2026.
No selective sampling. No “we excluded network jitter.” The script in
scripts/bench.py runs every iteration back-to-back and reports
statistics.mean + percentiles over the raw samples.
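For readers who want to summarize their own raw samples the same way, a minimal sketch follows. It uses a nearest-rank percentile, which is an assumption; the exact percentile method in scripts/bench.py is not described on this page, and `percentile` and `summarize` are illustrative names.

```python
import math
import statistics
from typing import Dict, List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: List[float]) -> Dict[str, float]:
    """Min, mean, p50/p95/p99 and max over the raw samples, nothing excluded."""
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


print(summarize([float(ms) for ms in range(1, 101)]))
```

With only 30 samples, p99 under nearest-rank is simply the largest sample, so tail percentiles are only meaningful on the 100-iteration runs.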
What you’ll see from a fiber uplink or in-cloud
If your agent runs on a VPS, a fiber-uplink desktop, or a nearby cloud
region, coast-to-coast RTT drops from ~50 ms on wifi to ~20 ms on fiber
and ~5 ms inside the same metro. Apply that delta to each HTTPS
round-trip in the numbers above:
- Hot run_code(): 46 ms p50 on wifi and 42 ms p50 in-cloud, both measured above
- Full round-trip: 187 ms p50 on wifi → ~100 ms on fiber → 43 ms p50 in-cloud (measured above)
The server-side compute numbers (fork 24 ms, pool hit 6 ms, pool
refill 12 ms) don’t change — those are what hostd does regardless of
where the client sits. Network is the variable.
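The "apply that delta to each round-trip" rule can be written as a one-line model. This is a first-order sketch that ignores TLS setup, jitter, and server-side variance. It assumes the full flow is 3 HTTPS round-trips and takes the 187 ms wifi p50 from the tables above; `estimate_ms` is an illustrative name.

```python
def estimate_ms(measured_ms: float, round_trips: int,
                measured_rtt_ms: float, your_rtt_ms: float) -> float:
    """Shift a measured latency by the RTT difference on every HTTPS round-trip."""
    return measured_ms - round_trips * (measured_rtt_ms - your_rtt_ms)


# Full create + exec + close: 3 round-trips, measured at 187 ms p50 on wifi
# with ~50 ms RTT; estimate the same flow from a ~20 ms fiber uplink.
print(estimate_ms(187, 3, measured_rtt_ms=50, your_rtt_ms=20))  # → 97
```

The result lines up with the ~100 ms fiber figure; for in-cloud callers, prefer the measured in-cloud column over this model.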