We benchmarked Podflare against E2B, Daytona, and Blaxel end-to-end — same client machine, same wall-clock window, same workload — so the numbers below are directly comparable. No selective sampling, no apples-to-oranges network paths. Reproduce with scripts/bench-cold-start.py and scripts/bench-http-outbound.py from our repo.

TL;DR — 30-iteration distribution

The 5-iteration version of this bench gave misleading results (notably, my early E2B numbers were thrown off by sample noise). All numbers below are from 30 sequential cold-start iterations per platform, run on the same machine in the same window.
| metric | Podflare (SDK ≥ 0.0.17) | E2B | Blaxel | Daytona |
| --- | --- | --- | --- | --- |
| p50 (typical case) | 260 ms | 442 ms | 627 ms | 663 ms |
| p95 (1-in-20 worst) | 539 ms | 811 ms | 2,665 ms | 1,666 ms |
| p99 (1-in-100) | 814 ms | 4,135 ms | 3,719 ms | 7,770 ms |
| max (worst observed) | 862 ms | 5,460 ms | 4,096 ms | 10,063 ms |
| spread (p95 − p50) | 279 ms | 369 ms | 2,038 ms | 1,003 ms |
| errors (in 30 iter) | 0 | 0 | 0 | 0 |
| First exec inside sandbox (vsock vs in-VM HTTP) | ~46 ms | ~200 ms | ~225 ms | ~111 ms |
| HTTP outbound to GitHub /zen (median curl) | 89 ms | 25 ms | 85 ms | 93 ms |
| HTTP outbound to Cloudflare trace | 25 ms | 38 ms | 82 ms | 21 ms |
| Sandbox isolation | Firecracker | Firecracker | Firecracker | Docker + Sysbox |
| Open-source license | proprietary; SDK MIT | Apache-2.0 | proprietary | AGPL-3.0 |
Reproduce: scripts/bench-reliability.py. Each run takes 20–30 s wall-clock per platform, runs 30 sequential Sandbox.create() → exec("echo ready") → close() cycles, and reports the full distribution.

Cold-start distribution — head-to-head

The metric customers actually feel: time from Sandbox.create() until the first echo ready returns. Each row below is 30 sequential iterations per platform — enough samples that the tail percentiles mean something.
# Same script, four platforms, same machine.
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-reliability.py podflare
E2B_API_KEY=...               python3 scripts/bench-reliability.py e2b
DAYTONA_API_KEY=...           python3 scripts/bench-reliability.py daytona
BL_API_KEY=... BL_WORKSPACE=... \
                              python3 scripts/bench-reliability.py blaxel
Distribution (30 iterations per platform):
| platform | min | p50 | p90 | p95 | p99 | max | mean |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 🥇 Podflare us-west (SDK ≥ 0.0.17) | 235 ms | 260 ms | 343 ms | 539 ms | 814 ms | 862 ms | 298 ms |
| 🥈 E2B us-east | 380 ms | 442 ms | 682 ms | 811 ms | 4,135 ms | 5,460 ms | 652 ms |
| 🥉 Blaxel us-pdx-1 | 553 ms | 627 ms | 1,741 ms | 2,665 ms | 3,719 ms | 4,096 ms | 924 ms |
| Daytona | 440 ms | 663 ms | 1,023 ms | 1,666 ms | 7,770 ms | 10,063 ms | 1,051 ms |
Two findings worth highlighting:
  1. Podflare wins every percentile. p50 is 1.7× faster than E2B (the next-best), and the tail is bounded — p99 is 814 ms versus E2B’s 4,135 ms (5×) and Daytona’s 7,770 ms (10×). The previous version of this bench, run with SDK 0.0.16, showed a 3,050 ms p95 for Podflare, driven by occasional dropped TCP SYNs on the public internet. SDK 0.0.17 caps the connect timeout at 800 ms and retries on a fresh socket, which eliminates that pathological tail.
  2. All four platforms had 0 errors across 30 iterations. The differentiator is latency distribution, not reliability in the traditional uptime sense.
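The SYN-drop mitigation described in finding 1 can be sketched in a few lines. This is an illustration of the capped-timeout-plus-fresh-socket pattern, not the actual SDK 0.0.17 source; the retry budget of 3 is an assumption:

```python
import socket

CONNECT_TIMEOUT_S = 0.8   # per-attempt cap, mirroring the 800 ms described above
MAX_ATTEMPTS = 3          # illustrative; the real retry budget isn't published

def connect_with_retry(host, port, *, socket_factory=socket.socket):
    """Open a TCP connection, abandoning any attempt that stalls (e.g. a
    dropped SYN) and retrying on a brand-new socket. One lost packet now
    costs at most CONNECT_TIMEOUT_S instead of the kernel's multi-second
    SYN-retransmit backoff."""
    last_err = None
    for _ in range(MAX_ATTEMPTS):
        sock = socket_factory(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(CONNECT_TIMEOUT_S)
        try:
            sock.connect((host, port))
            sock.settimeout(None)      # hand back a blocking socket
            return sock
        except OSError as err:         # timeout, refusal, unreachable
            sock.close()
            last_err = err
    raise last_err
```

The key detail is the fresh socket per attempt: reusing the stalled one would inherit the kernel's in-flight SYN-retransmit state instead of starting a clean handshake.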

Pick the metric that matters to your workload

Different production agents care about different parts of the distribution. The “best” platform depends on which tail kills you faster.
| your priority | winner | why |
| --- | --- | --- |
| Fastest typical request (p50 — agent loop responsiveness) | 🏆 Podflare 260 ms | 1.7× faster than E2B, 2.5× faster than Daytona |
| Predictable latency (p95 − p50 spread, “no surprises”) | 🏆 Podflare 279 ms | SDK 0.0.17’s SYN-drop retry caps the tail |
| Bounded worst case (max — circuit-breaker thresholds) | 🏆 Podflare 862 ms | 5× tighter than E2B, 12× tighter than Daytona |
| Lowest mean (cost-driven, billing-by-time workloads) | 🏆 Podflare 298 ms | Distribution shape: tight head, tight tail |

Why Podflare’s first_exec is 2–5× faster

The first exec() after create() looks identical from the SDK side, but the underlying paths are different:
| platform | first_exec stack |
| --- | --- |
| Podflare | hostd ↔ Firecracker vsock UDS ↔ in-VM agent (binary protocol over UNIX socket) |
| Daytona | proxy → runner → daemon HTTP gin server (TCP + TLS over container bridge) |
| Blaxel | SDK → orchestrator → in-VM agent (HTTP + TLS) |
| E2B | client-proxy → orchestrator → ConnectRPC HTTP/1.1+H2 to envd in-VM (TCP + TLS) |
All three competitors run a normal HTTP server inside each sandbox. Convenient (the SDK can talk plain HTTP), but every exec pays for TCP + TLS + HTTP framing inside the guest. We use vsock — a direct host↔guest socket with no TCP overhead — and a binary line protocol. Round-trip from a hot connection is ~3 ms server-side; the rest is network between you and the region.
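A toy round trip over a UNIX socket pair shows the shape of a framing-free channel. This sketch stands in for the vsock path (portable Python has no vsock transport to demo against; a UNIX socket has the same no-TCP, no-TLS property), and the `ok:` reply framing is invented for illustration:

```python
import socket
import threading
import time

# Newline-delimited binary frames over a UNIX-domain socket pair:
# no TCP handshake, no TLS record layer, no HTTP headers per exec.

def agent(conn):
    """Stand-in for the in-VM agent: read one command frame, reply."""
    buf = b""
    while not buf.endswith(b"\n"):
        buf += conn.recv(4096)
    conn.sendall(b"ok:" + buf)

host, guest = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
threading.Thread(target=agent, args=(guest,), daemon=True).start()

t0 = time.perf_counter()
host.sendall(b"exec echo ready\n")
reply = b""
while not reply.endswith(b"\n"):
    reply += host.recv(4096)
rtt_ms = (time.perf_counter() - t0) * 1000
print(reply, f"{rtt_ms:.3f} ms")
```

Every exec over an in-guest HTTP server pays request-line, header, and (with TLS) record-layer costs on top of this; the binary-protocol path pays only the socket round trip.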

HTTP outbound — what your agent actually feels

Inside-the-sandbox curl against two reliable targets, 5 runs each. (Earlier benches included httpbin.org but kept getting 10-second outliers from all four platforms — that’s httpbin’s per-source-IP rate limit, not platform speed.)
| target | Podflare us-west | Daytona | Blaxel us-pdx-1 | E2B us-east |
| --- | --- | --- | --- | --- |
| Cloudflare trace | 25 ms | 21 ms | 82 ms | 38 ms |
| GitHub API /zen | 89 ms | 93 ms | 85 ms | 25 ms |
Two takeaways:
  • GitHub-flavored workloads (the long tail of agent traffic — pip install, npm install, GitHub API, Hugging Face, etc.) all land within ~70 ms of each other. Geography is the whole story; pick a region close to GitHub’s Azure us-east peering.
  • Cloudflare trace shows raw network speed. Daytona’s 21 ms is fastest because their colo happens to be one hop from Cloudflare’s Ashburn PoP. Differences here are small absolute numbers.
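The median-of-5 curl measurement is simple to reproduce. The `curl_ms` helper below is a hypothetical stand-in for what scripts/bench-http-outbound.py does, not its actual implementation:

```python
import statistics
import subprocess

def median_ms(times_s):
    """Median of per-request times given in seconds, reported in ms."""
    return statistics.median(times_s) * 1000.0

def curl_ms(url, runs=5):
    """Median total request time over `runs` sequential curl calls.
    curl's %{time_total} write-out variable reports seconds per request."""
    times = []
    for _ in range(runs):
        out = subprocess.run(
            ["curl", "-s", "-o", "/dev/null", "-w", "%{time_total}", url],
            capture_output=True, text=True, check=True,
        ).stdout
        times.append(float(out))
    return median_ms(times)
```

Running this from inside a sandbox (not from your laptop) is what makes the numbers above measure the platform's egress path rather than your own connection.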

Out-of-the-box experience

| platform | base image | curl pre-installed | apk add curl time |
| --- | --- | --- | --- |
| Podflare | Ubuntu 24.04 (full) | yes | n/a |
| Daytona | Ubuntu (full) | yes | n/a |
| E2B | Ubuntu/Debian (full) | yes | n/a |
| Blaxel | Alpine 3.23 (157 binaries total) | no | 5 seconds |
Blaxel’s bare-Alpine choice is interesting: smaller image, faster boot, smaller attack surface. But every workload needs apk add for basics like curl, python, git before doing real work — every fresh sandbox pays for that bootstrap.

Forking and persistent state

Cold-start isn’t the whole story. AI-agent workloads also fork (try N branches in parallel) and persist (resume a working session later). This is where the gap widens.
| capability | Podflare | Daytona | Blaxel | E2B |
| --- | --- | --- | --- | --- |
| fork(n=5) from a running sandbox | ~80 ms p50 | not exposed | not exposed | not exposed |
| Persistent state across destroy | Spaces — full VM memory survives | container archive (storage only) | pause / resume | snapshot via docker commit |
| Diff snapshots (only dirty pages) | yes, ~24 ms | no | no | no |
| Multi-region edge router | 5 regions, haversine + failover | single region per cluster | 3 regions, manual pin | single region per cluster |
fork() is the genuinely differentiated primitive. Most LLM-agent patterns (tree-of-thought, multi-attempt code synthesis) want N children that all start from the parent’s exact mid-flight state. On container platforms you’d docker commit (~seconds) then docker run N times (~seconds × N). On Podflare that’s parent.fork(n=5) — a copy-on-write diff snapshot + N parallel microVM spawns in 80 ms p50, total. See Performance for the breakdown of what fork() does in those 80 ms.
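The semantics map onto the POSIX primitive of the same name. The sketch below uses plain os.fork() (Linux) as an analogy for the state-inheritance property only; it is not the Podflare SDK, whose fork happens at the microVM layer, and the `state` dict is purely illustrative:

```python
import os

# POSIX fork() as an analogy: each child inherits the parent's exact
# in-memory state at the instant of the fork, via copy-on-write pages.
state = {"model": "draft-7", "attempts": []}   # illustrative parent state

def fork_n(n):
    """Spawn n children from the parent's current state; collect one
    integer result from each over a pipe."""
    readers = []
    for i in range(n):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                    # child: private CoW copy of `state`
            os.close(r)
            state["attempts"].append(i)
            os.write(w, str(len(state["attempts"])).encode())
            os._exit(0)
        os.close(w)                     # parent keeps only the read end
        readers.append(r)
    results = []
    for r in readers:
        results.append(int(os.read(r, 64)))
        os.close(r)
        os.wait()                       # reap one child
    return results

if __name__ == "__main__":
    # Every child saw zero prior attempts, so each reports exactly 1:
    print(fork_n(5))                    # → [1, 1, 1, 1, 1]
```

Each child mutates its own copy-on-write copy without disturbing the parent or its siblings, which is exactly the property tree-of-thought agents want from a sandbox fork.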

Architecture comparison

| | Podflare | Daytona | Blaxel | E2B |
| --- | --- | --- | --- | --- |
| Isolation | Firecracker microVM (KVM hypervisor) | Docker + Sysbox (kernel-shared) | Firecracker microVM | Firecracker microVM (KVM) |
| Bare-metal hosting | Hetzner + Latitude (5 regions) | unspecified cloud / k8s | unspecified | GCP + AWS bare-metal (single region) |
| Cold-start magic | Snapshot restore + warm pool + xfs reflink CoW | Pre-booted VMs in DB, atomic orgId-flip handoff | Minimal Alpine + ? | UFFD lazy mem + memory-prefetch + NBD rootfs |
| Warm pool primitive | pop_front() from a VecDeque of running VMs | DB UPDATE — flip orgId on a sentinel-org pre-booted sandbox | not documented | Snapshot resume per request |
| In-sandbox exec channel | vsock binary protocol | gin HTTP over TCP+TLS (port 2280) | HTTP over TCP+TLS | ConnectRPC over TCP+TLS (port 49983) |
| Edge router | Cloudflare Worker, haversine routing, 5 regions | Single regional proxy | Manual region pin | API gateway (single region) |
| Failover on origin 5xx | yes (next-nearest region) | not documented | not documented | not documented |
| fork() primitive | yes, ~80 ms | no | no | no |
| Persistent state across destroy | yes (Spaces, full memory) | container archive | pause/resume | container commit |
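The pop_front()-from-a-VecDeque warm pool translates directly to a deque. This is a toy model of the pattern; Podflare's actual refill policy and VM handles are not public, so everything here (synchronous refill, string VM ids, the pool size) is an illustrative simplification:

```python
from collections import deque

class WarmPool:
    """Toy warm pool: pre-booted VMs wait in a FIFO, checkout is O(1).
    Real systems refill asynchronously; this sketch refills inline."""

    def __init__(self, boot_vm, target_size=4):
        self._boot_vm = boot_vm                  # slow path: boot a fresh VM
        self._target = target_size
        self._pool = deque(boot_vm() for _ in range(target_size))

    def checkout(self):
        # Fast path: hand out an already-running VM without booting anything.
        vm = self._pool.popleft() if self._pool else self._boot_vm()
        self.refill()
        return vm

    def refill(self):
        while len(self._pool) < self._target:
            self._pool.append(self._boot_vm())

counter = iter(range(1_000_000))
pool = WarmPool(lambda: f"vm-{next(counter)}")
print(pool.checkout())   # → vm-0, booted before the request arrived
```

The cold-start win is the same in both the toy and the real system: the boot cost is paid ahead of the request, so checkout latency is a queue operation, not a VM boot.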

License

| platform | license | implication |
| --- | --- | --- |
| Podflare | proprietary platform; SDKs MIT | use commercially without restrictions |
| Blaxel | proprietary | not self-hostable |
| E2B | Apache-2.0 (entire stack) | self-hostable; no viral terms |
| Daytona | AGPL-3.0 (entire stack) | self-hostable, but modifications must be open-sourced if you run it as a commercial service |
If you’re building on top of one of these and might fork it later, license matters. Daytona’s AGPL is genuinely restrictive for commercial use; E2B’s Apache-2.0 is permissive; Blaxel and Podflare are proprietary.

Free-tier limits

| | Podflare free | E2B Hobby | Daytona Tier 1 | Blaxel free |
| --- | --- | --- | --- | --- |
| RAM per sandbox | 1 GB | 8 GB | 8 GB (4 vCPU) | varies |
| Max concurrent | 10 | 20 | dynamic (pool-shared 20 GB) | per workspace |
| Max session lifetime | 30 min | 1 hour | not stated | not stated |
| Idle timeout | 5 min | not stated | not stated | not stated |
| Starter credit | none | $100 | $200 | not stated |
Honest comparison: Podflare’s free tier is more conservative on per-sandbox limits. Tradeoff: lower abuse risk (1 GB ceiling makes crypto mining unprofitable without separate detection), at the cost of less headroom for free-tier experimentation. Pro tier opens up to 4 GB per sandbox, 50 concurrent, 8-hour lifetime.

Production-choice ranking — by axis (30-iter)

| axis | 1st | 2nd | 3rd | 4th |
| --- | --- | --- | --- | --- |
| p50 (typical request) | Podflare 260 ms | E2B 442 ms | Blaxel 627 ms | Daytona 663 ms |
| p95 (1-in-20) | Podflare 539 ms | E2B 811 ms | Daytona 1,666 ms | Blaxel 2,665 ms |
| p99 (1-in-100) | Podflare 814 ms | Blaxel 3,719 ms | E2B 4,135 ms | Daytona 7,770 ms |
| max (worst observed) | Podflare 862 ms | Blaxel 4,096 ms | E2B 5,460 ms | Daytona 10,063 ms |
| spread (p95 − p50) | Podflare 279 ms | E2B 369 ms | Daytona 1,003 ms | Blaxel 2,038 ms |
| HTTP outbound to GitHub | E2B 25 ms | Blaxel 85 ms | Podflare 89 ms | Daytona 93 ms |
| Persistency primitives | Podflare (full VM memory survives) | E2B / Blaxel (filesystem, tied) | — | Daytona (archive) |
| Unique features | Podflare (fork(), multi-region, Spaces) | E2B (Apache-2.0) | Daytona (AGPL self-host) | Blaxel (minimal Alpine) |

When each one wins

  • Latency-sensitive interactive agents (the default case): Podflare. Wins p50 (260 ms), p95 (539 ms), p99 (814 ms), and max (862 ms) — the only platform under 1 s at every percentile. Native fork() for tree-of-thought patterns. Persistent Spaces carry full VM memory across restarts. Requires SDK ≥ 0.0.17 — older versions have a 3 s p95 tail caused by TCP SYN retries on the public internet.
  • GitHub-heavy workloads where outbound to Azure us-east matters: E2B. Their us-east colo wins HTTP outbound to GitHub at 25 ms (vs 85–93 ms for everyone else). Apache-2.0 lets you self-host for compliance.
  • Self-hosted on existing Docker/k8s infrastructure: Daytona. Pay the AGPL toll only if you’ll never fork the runtime; Docker + Sysbox isolation is meaningfully weaker than Firecracker if your threat model includes adversarial guest code.
  • Minimum-image, minimum-RAM workloads with your own bootstrap: Blaxel. An Alpine base plus a 627 ms p50 is fine if your workload pre-warms its own deps. Smallest attack surface.

Reproduce these numbers

All bench scripts are in our repo. They take an SDK API key for each platform and run identical workloads.
git clone https://github.com/PodFlare-ai/podflare
cd podflare

# 30-iter reliability bench — the headline numbers above
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-reliability.py podflare
E2B_API_KEY=...               python3 scripts/bench-reliability.py e2b
DAYTONA_API_KEY=...           python3 scripts/bench-reliability.py daytona
BL_API_KEY=... BL_WORKSPACE=... BL_REGION=us-pdx-1 \
                              python3 scripts/bench-reliability.py blaxel

# Quick 5-iter cold-start (what most casual benches show)
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-cold-start.py podflare
# ... same arguments for e2b/daytona/blaxel

# HTTP outbound from inside the sandbox
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-http-outbound.py podflare
# ... same arguments for e2b/daytona/blaxel
Each bench-reliability.py run does 30 sequential Sandbox.create() → exec("echo ready") → close() cycles per platform and prints the full distribution (min / p50 / p90 / p95 / p99 / max / mean / spread). No special flags, no warmup-and-discard tricks. If your numbers differ meaningfully from ours, send us the bench output and the SDK version you ran — we treat regression reports as P0. Our job is for these numbers to stay honest, not for our marketing to claim things the bench can’t reproduce.
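The distribution summary the script prints can be reproduced from any list of samples with the standard library. A sketch, assuming conventional exclusive-percentile definitions (the script's exact method may differ slightly at the extreme tail):

```python
import statistics

def distribution(samples_ms):
    """Summarize latency samples the way the bench output describes:
    min / p50 / p90 / p95 / p99 / max / mean / spread."""
    xs = sorted(samples_ms)
    q = statistics.quantiles(xs, n=100)   # 1st..99th percentile cut points
    return {
        "min": xs[0], "p50": q[49], "p90": q[89], "p95": q[94],
        "p99": q[98], "max": xs[-1],
        "mean": statistics.fmean(xs), "spread": q[94] - q[49],
    }
```

With 30 samples the p99 is an interpolation between the two worst observations, which is why the tables above also report max separately.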

Methodology

  • Date: April 2026
  • Client: macOS laptop on residential wifi, west-coast US
  • Podflare endpoint: api.podflare.ai with SDK 0.0.17 (client-side haversine — auto-routes to us-west from this machine)
  • E2B endpoint: e2b_code_interpreter SDK default (single GCP/AWS region, likely us-east4)
  • Daytona endpoint: daytona SDK default (single region per account; ours landed near IAD)
  • Blaxel endpoint: blaxel SDK with BL_REGION=us-pdx-1 and image=blaxel/base-image:latest
  • Sandbox spec: each platform’s default — 1 GB / 1 vCPU on all four
  • Bench iterations: 30 cold starts per platform, sequential, no parallelism. Each iteration is a complete Sandbox.create() → exec("echo ready") → close()/kill()/delete() cycle. We report the full distribution because the 5-iteration version of this bench gave misleadingly noisy results — particularly for E2B, whose median moved from 2,504 ms (5 samples) to 442 ms (30 samples). Sample size matters.
Run-to-run variance: the p50 numbers shift by ~30–50 ms either way between sessions. p95 / p99 are noisier — they depend on whatever upstream network jitter happens that minute. The rank order on p50 and spread is stable across multiple bench sessions: Podflare first on both, with E2B second.
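To see why the 5-iteration bench misled, simulate a heavy-tailed latency distribution and compare the stability of 5-sample versus 30-sample medians. The distribution below is entirely synthetic (a lognormal body plus occasional multi-second stalls), not any platform's measured profile:

```python
import random
import statistics

random.seed(7)   # deterministic run

def sample_latency_ms():
    """Synthetic cold-start latency: ~450 ms lognormal body, plus an
    occasional multi-second stall. All numbers here are invented."""
    base = random.lognormvariate(6.1, 0.25)
    return base + (4000.0 if random.random() < 0.15 else 0.0)

def median_of(n):
    return statistics.median(sample_latency_ms() for _ in range(n))

five = [median_of(5) for _ in range(200)]     # 200 independent 5-iter benches
thirty = [median_of(30) for _ in range(200)]  # 200 independent 30-iter benches

print(f"5-sample medians range over  {max(five) - min(five):7.0f} ms")
print(f"30-sample medians range over {max(thirty) - min(thirty):7.0f} ms")
```

A 5-sample median lands on a stall whenever 3 of 5 draws stall, so repeated 5-iteration benches scatter across thousands of milliseconds; the 30-sample median essentially never does. That is the same mechanism behind E2B's median moving from 2,504 ms (5 samples) to 442 ms (30 samples).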