vs E2B, Daytona, and Blaxel

We benchmarked Podflare against E2B, Daytona, and Blaxel end-to-end — same SDK machine, same wall-clock window, same workload — so the numbers below are directly comparable. No selective sampling, no apples-to-oranges network paths. Reproduce with scripts/bench-cold-start.py and scripts/bench-http-outbound.py from our repo.

TL;DR — 30-iteration distribution

The 5-iteration version of this bench gave misleading results (notably, our early E2B numbers were thrown off by sample noise). All numbers below are from 30 sequential cold-start iterations per platform, run on the same machine in the same window.

metric	Podflare (SDK ≥ 0.0.20)	E2B	Blaxel	Daytona
p50 (typical case)	153 ms	467 ms	627 ms	713 ms
p95 (1-in-20 worst)	170 ms	750 ms	2,665 ms	1,130 ms
p99 (1-in-100)	236 ms	852 ms	3,719 ms	1,136 ms
max (worst observed)	263 ms	888 ms	4,096 ms	1,137 ms
spread (p95 − p50)	17 ms	283 ms	2,038 ms	417 ms
errors (in 30 iter)	0	0	0	0
First exec inside sandbox (vsock vs in-VM HTTP)	~46 ms	~200 ms	~225 ms	~111 ms
HTTP outbound to GitHub `/zen` (median curl)	89 ms	25 ms	85 ms	93 ms
HTTP outbound to Cloudflare trace	25 ms	38 ms	82 ms	21 ms
Sandbox isolation	Podflare Pod microVM	Firecracker microVM	Firecracker microVM	Docker + Sysbox
Open-source license	proprietary; SDK MIT	Apache-2.0	proprietary	AGPL-3.0

Reproduce: scripts/bench-reliability.py. Each run takes 20–30 s wall-clock per platform, runs 30 sequential Sandbox.create() → exec("echo ready") → close() cycles, and reports the full distribution.
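For readers who want to see the shape of the measurement without opening the repo, here is a minimal sketch of a timing harness and percentile summary. It is not the contents of scripts/bench-reliability.py. The function names, the untimed cleanup step, and the linear-interpolation percentile are all assumptions made for illustration.

```python
import math
import statistics
import time
from typing import Any, Callable


def percentile(sorted_samples: list[float], q: float) -> float:
    """Percentile by linear interpolation between neighbouring samples.

    Assumption: the real bench script may use a different percentile method.
    """
    if not sorted_samples:
        raise ValueError("need at least one sample")
    rank = (len(sorted_samples) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = min(lo + 1, len(sorted_samples) - 1)
    return sorted_samples[lo] + (sorted_samples[hi] - sorted_samples[lo]) * (rank - lo)


def time_cycles(
    create_and_exec: Callable[[], Any],
    cleanup: Callable[[Any], None],
    iterations: int = 30,
) -> list[float]:
    """Time each create-and-first-exec step in milliseconds, sequentially.

    `create_and_exec` and `cleanup` are hypothetical callables supplied by
    the caller; cleanup runs outside the timed region.
    """
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        handle = create_and_exec()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
        cleanup(handle)
    return samples_ms


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Return the columns reported in the tables on this page."""
    s = sorted(samples_ms)
    p50, p90, p95, p99 = (percentile(s, q) for q in (50, 90, 95, 99))
    return {
        "min": s[0], "p50": p50, "p90": p90, "p95": p95, "p99": p99,
        "max": s[-1], "mean": statistics.fmean(s), "spread": p95 - p50,
    }
```

Under this interpolation method, a p99 over 30 samples falls between the two largest samples, so one slow iteration moves it noticeably. Treat p99 and max at this sample size as indicators of tail behaviour rather than precise estimates.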

Cold-start distribution — head-to-head

The metric customers actually feel: time from Sandbox.create() until the first echo ready returns. Each row below is 30 sequential iterations per platform — enough samples that the tail percentiles mean something.

# Same script, four platforms, same machine.
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-reliability.py podflare
E2B_API_KEY=...               python3 scripts/bench-reliability.py e2b
DAYTONA_API_KEY=...           python3 scripts/bench-reliability.py daytona
BL_API_KEY=... BL_WORKSPACE=... \
                              python3 scripts/bench-reliability.py blaxel

Distribution (30 iterations per platform):

platform	min	p50	p90	p95	p99	max	mean
🥇 Podflare `api.podflare.ai` (SDK ≥ 0.0.20)	143 ms	153 ms	165 ms	170 ms	236 ms	263 ms	173 ms
🥈 E2B us-east	418 ms	467 ms	720 ms	750 ms	852 ms	888 ms	509 ms
🥉 Blaxel us-pdx-1	553 ms	627 ms	1,741 ms	2,665 ms	3,719 ms	4,096 ms	924 ms
Daytona	439 ms	713 ms	1,120 ms	1,130 ms	1,136 ms	1,137 ms	722 ms

Two findings worth highlighting:

Podflare wins every percentile. p50 is 3.0× faster than E2B (the next-best), p95 is 4.4×, p99 is 3.6×, max is bounded under 270 ms while E2B’s max is 888 ms and Blaxel’s is 4 s. The previous version of this bench, with SDK 0.0.17, showed a 1,741 ms p99 for Podflare — caused by the SDK’s own connect=0.8s + retries=2 compounding on slow TLS handshakes (three chained ConnectTimeouts add up to ~2.9 s). SDK 0.0.19 widened connect to 2.5 s and dropped retries to 1, killing that self-inflicted tail. SDK 0.0.20 then defaulted to api.podflare.ai (Cloudflare-edge-routed), which is faster than direct-to-origin from most residential callers because the CF edge PoP is closer than any single region.
All four platforms had 0 errors across 30 iterations. The differentiator is latency distribution, not reliability in the traditional uptime sense.
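To make the timeout-compounding effect concrete, here is a toy model, assuming each connection attempt either completes its handshake within the connect timeout or burns the full timeout before the next retry. It is not the SDK's retry code, and it ignores backoff, so three timeouts at 0.8 s total 2.4 s here rather than the ~2.9 s quoted above. The function name and the handshake durations are hypothetical.

```python
from typing import Optional, Sequence


def connect_latency(
    handshakes_s: Sequence[float],
    connect_timeout_s: float,
    retries: int,
) -> Optional[float]:
    """Seconds until a connection succeeds, or None if every attempt times out.

    handshakes_s[i] is how long attempt i's handshake would take if it were
    allowed to finish. Backoff between attempts is deliberately not modelled.
    """
    elapsed = 0.0
    for handshake in handshakes_s[: retries + 1]:
        if handshake <= connect_timeout_s:
            return elapsed + handshake
        # The attempt is abandoned at the timeout and its work is thrown away.
        elapsed += connect_timeout_s
    return None


# Two slow 0.9 s handshakes followed by a fast one (made-up durations).
slow_then_fast = [0.9, 0.9, 0.3]

tight = connect_latency(slow_then_fast, connect_timeout_s=0.8, retries=2)
wide = connect_latency(slow_then_fast, connect_timeout_s=2.5, retries=1)
print(f"connect=0.8s retries=2: {tight:.1f} s")
print(f"connect=2.5s retries=1: {wide:.1f} s")
```

In this model the tight timeout discards two handshakes that were 0.1 s from finishing, while the wider timeout lets the first one complete. The trade-off is that a genuinely dead endpoint takes longer to give up on.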

Pick the metric that matters to your workload

Different production agents care about different parts of the distribution. The “best” platform depends on which tail kills you faster.

your priority	winner	why
Fastest typical request (p50 — agent loop responsiveness)	🏆 Podflare 153 ms	3.0× faster than E2B, 4.7× faster than Daytona
Predictable latency (p95−p50 spread, “no surprises”)	🏆 Podflare 17 ms	SDK 0.0.19’s tuned connect timeout caps the tail
Bounded worst case (max — circuit-breaker thresholds)	🏆 Podflare 263 ms	3.4× tighter than E2B, 4.3× tighter than Daytona
Lowest mean (cost-driven, billing-by-time workloads)	🏆 Podflare 173 ms	Distribution shape: tight head, tight tail

Why Podflare’s first_exec is 2–5× faster

The first exec() after create() looks identical from the SDK side, but the underlying paths are different:

platform	first_exec stack
Podflare	hostd ↔ Pod vsock UDS ↔ in-VM agent (binary protocol over UNIX socket)
Daytona	proxy → runner → daemon HTTP gin server (TCP + TLS over container bridge)
Blaxel	SDK → orchestrator → in-VM agent (HTTP + TLS)
E2B	client-proxy → orchestrator → ConnectRPC HTTP/1.1+H2 to envd in-VM (TCP + TLS)

All three competitors run a normal HTTP server inside each sandbox. Convenient (the SDK can talk plain HTTP), but every exec pays for TCP + TLS + HTTP framing inside the guest. We use vsock — a direct host↔guest socket with no TCP overhead — and a binary line protocol. Round-trip from a hot connection is ~3 ms server-side; the rest is network between you and the region.
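As a rough illustration of why a framed binary channel is cheap, the sketch below sends a command and a reply as length-prefixed frames over a local socket pair. The frame layout (4-byte big-endian length, then payload) is hypothetical and is not Podflare's wire format. `socket.socketpair()` stands in for the host-to-guest vsock, which on Linux would be opened with `socket.AF_VSOCK` instead.

```python
import socket
import struct

# Hypothetical frame header: one unsigned 32-bit big-endian payload length.
HEADER = struct.Struct(">I")


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Write one frame: length header followed by the payload bytes."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Read one frame and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def demo() -> tuple[bytes, bytes]:
    """Round-trip one command between a pretend host and a pretend guest agent."""
    host, guest = socket.socketpair()  # stand-in for a vsock connection
    try:
        send_frame(host, b"echo ready")
        command = recv_frame(guest)
        send_frame(guest, b"ready\n")  # the stand-in agent does not really run it
        return command, recv_frame(host)
    finally:
        host.close()
        guest.close()
```

Each exec in this sketch costs four header bytes plus the payload, with no TLS handshake and no HTTP headers to parse inside the guest.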

HTTP outbound — what your agent actually feels

Inside-the-sandbox curl against two reliable targets, 5 runs each. (Earlier benches included httpbin.org but kept getting 10-second outliers from all four platforms — that’s httpbin’s per-source-IP rate limit, not platform speed.)

target	Podflare us-west	Daytona	Blaxel us-pdx-1	E2B us-east
Cloudflare trace	25 ms	21 ms	82 ms	38 ms
GitHub API `/zen`	89 ms	93 ms	85 ms	25 ms

Two takeaways:

GitHub-flavored workloads (the long tail of agent traffic — pip install, npm install, GitHub API, Hugging Face, etc.) all land within ~70 ms of each other. Geography is the whole story; pick a region close to GitHub’s Azure us-east peering.
Cloudflare trace shows raw network speed. Daytona’s 21 ms is fastest because their colo happens to be one hop from Cloudflare’s Ashburn PoP. Differences here are small absolute numbers.
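A "median curl" measurement of this kind can be sketched as follows. This is not scripts/bench-http-outbound.py. The helper names are hypothetical, and using curl's `time_total` write-out variable as the timing source is an assumption about what the bench records.

```python
import statistics
import subprocess
from typing import Callable


def curl_total_ms(url: str) -> float:
    """One curl request; returns curl's own total-time figure in milliseconds.

    Requires curl inside the sandbox, which is not preinstalled everywhere
    (see the out-of-the-box table on this page).
    """
    result = subprocess.run(
        ["curl", "-s", "-o", "/dev/null", "-w", "%{time_total}", url],
        check=True,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return float(result.stdout) * 1000.0  # curl prints seconds


def median_latency_ms(
    url: str,
    runs: int = 5,
    measure: Callable[[str], float] = curl_total_ms,
) -> float:
    """Median of `runs` sequential measurements against one target."""
    return statistics.median(measure(url) for _ in range(runs))
```

Calling `median_latency_ms("https://api.github.com/zen")` from inside a sandbox would give one cell of the table above, assuming that URL is the `/zen` target used. The `measure` parameter exists so the aggregation can be exercised without network access.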

Out-of-the-box experience

platform	base image	curl pre-installed	`apk add curl` time
Podflare	Ubuntu 24.04 (full)	✓	n/a
Daytona	Ubuntu (full)	✓	n/a
E2B	Ubuntu/Debian (full)	✓	n/a
Blaxel	Alpine 3.23 (157 binaries total)	✗	5 seconds

Blaxel’s bare-Alpine choice is interesting: smaller image, faster boot, smaller attack surface. But every workload needs apk add for basics like curl, python, git before doing real work — every fresh sandbox pays for that bootstrap.

Forking and persistent state

Cold-start isn’t the whole story. AI-agent workloads also fork (try N branches in parallel) and persist (resume a working session later). This is where the gap widens.

capability	Podflare	Daytona	Blaxel	E2B
`fork(n=5)` from a running sandbox	~80 ms p50	not exposed	not exposed	not exposed
Persistent state across destroy	Spaces — full VM memory survives	container archive (storage only)	pause / resume	snapshot via `docker commit`
Diff snapshots (only dirty pages)	yes, ~24 ms	no	no	no
Multi-region edge router	5 regions, haversine + failover	single region per cluster	3 regions, manual pin	single region per cluster

fork() is the genuinely differentiated primitive. Most LLM-agent patterns (tree-of-thought, multi-attempt code synthesis) want N children that all start from the parent’s exact mid-flight state. On container platforms you’d docker commit (~seconds) and docker run N (~seconds × N). On Podflare that’s parent.fork(n=5) — a copy-on-write diff snapshot + N parallel microVM spawns in 80 ms p50, total. See Performance for the breakdown of what fork() does in those 80 ms.
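The fan-out pattern described above looks roughly like this. `FakeSandbox` is an in-memory stand-in written for this example, not the Podflare SDK. Only the `fork(n=...)` call shape comes from this page, and the two-argument `exec` and the scoring helper are hypothetical.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


class FakeSandbox:
    """In-memory stand-in for a sandbox. NOT the real SDK class."""

    def __init__(self, state: Optional[dict] = None) -> None:
        self.state = dict(state or {})

    def exec(self, key: str, value: str) -> None:
        # Stand-in for running a command that mutates sandbox state.
        self.state[key] = value

    def fork(self, n: int) -> list["FakeSandbox"]:
        # Each child starts from a private copy of the parent's current state.
        return [FakeSandbox(copy.deepcopy(self.state)) for _ in range(n)]


def best_of_n(
    parent: FakeSandbox,
    candidates: list[str],
    score: Callable[[dict], float],
) -> str:
    """Try every candidate in its own fork and return the best-scoring one."""
    if not candidates:
        raise ValueError("need at least one candidate")
    children = parent.fork(n=len(candidates))

    def attempt(pair: tuple[FakeSandbox, str]) -> tuple[float, str]:
        child, candidate = pair
        child.exec("attempt", candidate)
        return score(child.state), candidate

    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        results = list(pool.map(attempt, zip(children, candidates)))
    return max(results, key=lambda r: r[0])[1]
```

The property that matters is that every child inherits the parent's setup work (installed dependencies, loaded data) while the parent itself stays untouched by the attempts.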

Architecture comparison

	Podflare	Daytona	Blaxel	E2B
Isolation	Podflare Pod microVM (KVM hypervisor)	Docker + Sysbox (kernel-shared)	Firecracker microVM	Firecracker microVM (KVM)
Bare-metal hosting	Hetzner + Latitude (5 regions)	unspecified cloud / k8s	unspecified	GCP + AWS bare-metal (single region)
Cold-start magic	Snapshot restore + warm pool + xfs reflink CoW	Pre-booted VMs in DB, atomic orgId-flip handoff	Minimal Alpine + ?	UFFD lazy mem + memory-prefetch + NBD rootfs
Warm pool primitive	`pop_front()` from a `VecDeque` of running VMs	DB UPDATE — flip orgId on a sentinel-org pre-booted sandbox	not documented	Snapshot resume per request
In-sandbox exec channel	vsock binary protocol	gin HTTP over TCP+TLS (port 2280)	HTTP over TCP+TLS	ConnectRPC over TCP+TLS (port 49983)
Edge router	Cloudflare Worker, haversine routing, 5 regions	Single regional proxy	Manual region pin	API gateway (single region)
Failover on origin 5xx	yes (next-nearest region)	not documented	not documented	not documented
`fork()` primitive	yes, ~80 ms	no	no	no
Persistent state across destroy	yes (Spaces, full memory)	container archive	pause/resume	container commit
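Haversine routing with failover, as listed in the edge-router rows, can be sketched in a few lines. This is an illustrative model rather than the Cloudflare Worker itself. The region names other than us-west, all coordinates, and the `send` callback are hypothetical, and only three of the five regions are modelled.

```python
import math
from typing import Callable

EARTH_RADIUS_KM = 6371.0

# Hypothetical region coordinates (latitude, longitude) for illustration only.
REGIONS = {
    "us-west": (45.5, -122.7),
    "us-east": (39.0, -77.5),
    "eu-central": (50.1, 8.7),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def rank_regions(lat: float, lon: float, regions: dict = REGIONS) -> list[str]:
    """Region names ordered nearest-first from the caller's location."""
    return sorted(regions, key=lambda name: haversine_km(lat, lon, *regions[name]))


def route(
    lat: float,
    lon: float,
    send: Callable[[str], int],
    regions: dict = REGIONS,
) -> tuple[str, int]:
    """Try the nearest region first; on a 5xx status fall through to the next."""
    for name in rank_regions(lat, lon, regions):
        status = send(name)  # hypothetical: forwards the request, returns HTTP status
        if status < 500:
            return name, status
    raise RuntimeError("every region returned 5xx")
```

A caller near Seattle would be sent to us-west under these coordinates, and to us-east if us-west answered with a 5xx.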

License

	license	implication
Podflare	proprietary platform; SDKs MIT	use commercially without restrictions
Blaxel	proprietary	not self-hostable
E2B	Apache-2.0 (entire stack)	self-hostable; no viral terms
Daytona	AGPL-3.0 (entire stack)	self-hostable, but if you modify it and offer it to users over a network, you must make your modified source available under AGPL-3.0

If you’re building on top of one of these and might fork it later, license matters. Daytona’s AGPL is genuinely restrictive for commercial use; E2B’s Apache-2.0 is permissive; Blaxel and Podflare are proprietary.

Free-tier limits

	Podflare free	E2B Hobby	Daytona Tier 1	Blaxel free
RAM per sandbox	1 GB	8 GB	8 GB (4 vCPU)	varies
Max concurrent	10	20	dynamic (pool-shared 20 GB)	per workspace
Max session lifetime	30 min	1 hour	not stated	not stated
Idle timeout	5 min	not stated	not stated	not stated
Starter credit	none	$100	$200	not stated

Honest comparison: Podflare’s free tier is more conservative on per-sandbox limits. Tradeoff: lower abuse risk (1 GB ceiling makes crypto mining unprofitable without separate detection), at the cost of less headroom for free-tier experimentation. Pro tier opens up to 4 GB per sandbox, 50 concurrent, 8-hour lifetime.

Production-choice ranking — by axis (30-iter)

axis	1st	2nd	3rd	4th
p50 (typical request)	Podflare 153 ms	E2B 467 ms	Blaxel 627 ms	Daytona 713 ms
p95 (1-in-20)	Podflare 170 ms	E2B 750 ms	Daytona 1,130 ms	Blaxel 2,665 ms
p99 (1-in-100)	Podflare 236 ms	E2B 852 ms	Daytona 1,136 ms	Blaxel 3,719 ms
max (worst observed)	Podflare 263 ms	E2B 888 ms	Daytona 1,137 ms	Blaxel 4,096 ms
spread (p95 − p50)	Podflare 17 ms	E2B 283 ms	Daytona 417 ms	Blaxel 2,038 ms
HTTP outbound to GitHub	E2B 25 ms	Blaxel 85 ms	Podflare 89 ms	Daytona 93 ms
Persistence primitives	Podflare (full VM memory survives)	E2B / Blaxel (filesystem)	Daytona (archive)	—
Unique features	Podflare (`fork()`, multi-region, Spaces)	E2B (Apache 2.0)	Daytona (AGPL self-host)	Blaxel (minimal Alpine)

When each one wins

Latency-sensitive interactive agents (default case) → Podflare. Wins p50 (153 ms), p95 (170 ms), p99 (236 ms), and max (263 ms), making it the only platform under 300 ms at every percentile. Native fork() for tree-of-thought patterns. Spaces preserve full VM memory across restarts. Requires SDK ≥ 0.0.20: versions 0.0.17 through 0.0.18 have a ~1.7 s p99 tail caused by the SDK’s own tight connect timeout compounding with retries=2 (fixed in 0.0.19), and 0.0.20 then defaulted to api.podflare.ai for edge-routed latency.
GitHub-heavy workloads where outbound to Azure us-east matters → E2B. Their us-east colo wins HTTP outbound to GitHub at 25 ms (vs ours/others at 85–93 ms). Apache-2.0 lets you self-host for compliance.
Self-hosted on existing Docker/k8s infrastructure → Daytona. Accept the AGPL terms only if you’ll never fork the runtime, and note that Docker/Sysbox isolation is meaningfully weaker than a microVM if your threat model includes adversarial guest code.
Minimum-image, minimum-RAM workloads with your own bootstrap → Blaxel. Alpine base + 627 ms p50 is fine if your workload pre-warms with its own deps. Smallest attack surface.

Reproduce these numbers

All bench scripts are in our repo. They take an SDK API key for each platform and run identical workloads.

git clone https://github.com/PodFlare-ai/podflare
cd podflare

# 30-iter reliability bench — the headline numbers above
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-reliability.py podflare
E2B_API_KEY=...               python3 scripts/bench-reliability.py e2b
DAYTONA_API_KEY=...           python3 scripts/bench-reliability.py daytona
BL_API_KEY=... BL_WORKSPACE=... BL_REGION=us-pdx-1 \
                              python3 scripts/bench-reliability.py blaxel

# Quick 5-iter cold-start (what most casual benches show)
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-cold-start.py podflare
# ... same arguments for e2b/daytona/blaxel

# HTTP outbound from inside the sandbox
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-http-outbound.py podflare
# ... same arguments for e2b/daytona/blaxel

Each bench-reliability.py run does 30 sequential Sandbox.create() → exec("echo ready") → close() cycles per platform and prints the full distribution (min / p50 / p90 / p95 / p99 / max / mean / spread). No special flags, no warmup-and-discard tricks. If your numbers differ meaningfully from ours, send us the bench output and the SDK version you ran — we treat regression reports as P0. Our job is for these numbers to stay honest, not for our marketing to claim things the bench can’t reproduce.

Methodology

Date: April 2026
Client: macOS laptop on residential wifi, west-coast US
Podflare endpoint: api.podflare.ai with SDK 0.0.20 (Cloudflare-edge-routed — haversine-picks the nearest origin server-side per-request; from this machine that’s us-west)
E2B endpoint: e2b_code_interpreter SDK default (single GCP/AWS region, likely us-east4)
Daytona endpoint: daytona SDK default (single region per account; ours landed near IAD)
Blaxel endpoint: blaxel SDK with BL_REGION=us-pdx-1 and image=blaxel/base-image:latest
Sandbox spec: each platform’s default — 1 GB / 1 vCPU on all four
Bench iterations: 30 cold starts per platform, sequential, no parallelism. Each iteration is a complete Sandbox.create() → exec("echo ready") → close()/kill()/delete() cycle. We report the full distribution because the 5-iteration version of this bench gave misleadingly noisy results, particularly for E2B, whose median moved from 2,504 ms (5 samples) to 467 ms (30 samples). Sample size matters.

Run-to-run variance: the p50 numbers shift by ~10–30 ms either way between sessions. p95 and p99 are noisier because they depend on whatever upstream network jitter happens that minute. The rank order on every percentile is stable across multiple bench sessions: Podflare wins p50 through max.
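The sample-size point can be checked with a small simulation. The sketch below draws synthetic latencies from a log-normal distribution, which is an assumed shape and not a fit to any platform's measurements, and compares how much the median wanders across repeated benches of 5 versus 30 iterations. All names and parameters are illustrative.

```python
import random
import statistics


def median_spread(sample_size: int, trials: int = 2000, seed: int = 0) -> float:
    """Standard deviation of the median across many simulated bench runs.

    Latencies are synthetic: log-normal with a median near 400 ms and a
    heavy right tail. They are not measurements of any platform.
    """
    rng = random.Random(seed)
    medians = []
    for _ in range(trials):
        samples = [rng.lognormvariate(6.0, 0.8) for _ in range(sample_size)]
        medians.append(statistics.median(samples))
    return statistics.pstdev(medians)


if __name__ == "__main__":
    for n in (5, 30):
        print(f"{n:>2} iterations: median varies by about {median_spread(n):.0f} ms")
```

With a heavy-tailed distribution, the 5-iteration median swings far more between runs than the 30-iteration one. A single unlucky session can shift a small-sample headline number considerably.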
