We benchmarked Podflare against E2B, Daytona, and Blaxel end-to-end on the same client machine, in the same wall-clock window, with the same workload, so the numbers below are directly comparable. No selective sampling, no apples-to-oranges network paths. Reproduce with scripts/bench-cold-start.py and scripts/bench-http-outbound.py from our repo.

TL;DR — 30-iteration distribution

The 5-iteration version of this bench gave misleading results (notably, our early E2B numbers were thrown off by sample noise). All numbers below are from 30 sequential cold-start iterations per platform, run on the same machine in the same window.
| metric | Podflare (SDK ≥ 0.0.20) | E2B | Blaxel | Daytona |
|---|---|---|---|---|
| p50 (typical case) | 153 ms | 442 ms | 627 ms | 663 ms |
| p95 (1-in-20 worst) | 170 ms | 811 ms | 2,665 ms | 1,666 ms |
| p99 (1-in-100) | 236 ms | 4,135 ms | 3,719 ms | 7,770 ms |
| max (worst observed) | 263 ms | 5,460 ms | 4,096 ms | 10,063 ms |
| spread (p95 − p50) | 17 ms | 369 ms | 2,038 ms | 1,003 ms |
| errors (in 30 iter) | 0 | 0 | 0 | 0 |
| First exec inside sandbox (vsock vs in-VM HTTP) | ~46 ms | ~200 ms | ~225 ms | ~111 ms |
| HTTP outbound to GitHub /zen (median curl) | 89 ms | 25 ms | 85 ms | 93 ms |
| HTTP outbound to Cloudflare trace | 25 ms | 38 ms | 82 ms | 21 ms |
| Sandbox isolation | Podflare Pod microVM | Firecracker microVM | Firecracker microVM | Docker + Sysbox |
| Open-source license | proprietary; SDK MIT | Apache-2.0 | proprietary | AGPL-3.0 |
Reproduce: scripts/bench-reliability.py. Each run takes 20–30 s wall-clock per platform, runs 30 sequential Sandbox.create() → exec("echo ready") → close() cycles, and reports the full distribution.
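To make the reported columns concrete, here is a minimal sketch of how such a timing loop and its summary could be computed. It is not the contents of scripts/bench-reliability.py: the `time_cycles` and `summarize` helpers, the injected `run_cycle` callable, and the nearest-rank percentile method are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List


def percentile(sorted_ms: List[float], p: int) -> float:
    """Nearest-rank percentile (assumed method; the real script may interpolate)."""
    n = len(sorted_ms)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
    return sorted_ms[rank - 1]


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Reduce raw per-iteration timings to the columns used in the tables."""
    s = sorted(samples_ms)
    stats = {
        "min": s[0],
        "p50": percentile(s, 50),
        "p90": percentile(s, 90),
        "p95": percentile(s, 95),
        "p99": percentile(s, 99),
        "max": s[-1],
        "mean": sum(s) / len(s),
    }
    stats["spread"] = stats["p95"] - stats["p50"]
    return stats


def time_cycles(run_cycle: Callable[[], None], iterations: int = 30) -> List[float]:
    """Time sequential calls to run_cycle, a hypothetical stand-in for one
    Sandbox.create() -> exec("echo ready") -> close() cycle."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_cycle()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples
```

With only 30 samples, a nearest-rank p99 is the same value as the max. The tables on this page report distinct p99 and max values, so the real script presumably interpolates between samples; treat the percentile helper above as one possible definition, not the one the bench uses.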

Cold-start distribution — head-to-head

The metric customers actually feel: time from Sandbox.create() until the first echo ready returns. Each row below is 30 sequential iterations per platform — enough samples that the tail percentiles mean something.
# Same script, four platforms, same machine.
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-reliability.py podflare
E2B_API_KEY=...               python3 scripts/bench-reliability.py e2b
DAYTONA_API_KEY=...           python3 scripts/bench-reliability.py daytona
BL_API_KEY=... BL_WORKSPACE=... \
                              python3 scripts/bench-reliability.py blaxel
Distribution (30 iterations per platform):
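| platform | min | p50 | p90 | p95 | p99 | max | mean |
|---|---|---|---|---|---|---|---|
| 🥇 Podflare api.podflare.ai (SDK ≥ 0.0.20) | 143 ms | 153 ms | 165 ms | 170 ms | 236 ms | 263 ms | 173 ms |
| 🥈 E2B us-east | 418 ms | 467 ms | 720 ms | 750 ms | 852 ms | 888 ms | 509 ms |
| 🥉 Blaxel us-pdx-1 | 553 ms | 627 ms | 1,741 ms | 2,665 ms | 3,719 ms | 4,096 ms | 924 ms |
| Daytona | 439 ms | 713 ms | 1,120 ms | 1,130 ms | 1,136 ms | 1,137 ms | 722 ms |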
Two findings worth highlighting:
  1. Podflare wins every percentile. p50 is 3.0× faster than E2B (the next-best), p95 is 4.4×, and p99 is 3.6×. Podflare’s max stays under 270 ms, while E2B’s max is 888 ms and Blaxel’s is 4,096 ms. The previous version of this bench, run with SDK 0.0.17, showed a 1,741 ms p99 for Podflare. That tail was caused by the SDK’s own connect=0.8s + retries=2 settings compounding on slow TLS handshakes (three chained ConnectTimeouts add up to ~2.9 s). SDK 0.0.19 widened connect to 2.5 s and dropped retries to 1, which removed that self-inflicted tail. SDK 0.0.20 then defaulted to api.podflare.ai (Cloudflare-edge-routed), which is faster than direct-to-origin for most residential callers because the CF edge PoP is closer than any single region.
  2. All four platforms had 0 errors across 30 iterations. The differentiator is latency distribution, not reliability in the traditional uptime sense.
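The connect-timeout compounding described in the first finding can be sketched with a small model. This is an illustration under stated assumptions, not the SDK’s retry code: `connect_elapsed_ms` is a hypothetical helper, the handshake durations are made-up inputs, and backoff between attempts is ignored.

```python
from typing import List, Tuple


def connect_elapsed_ms(handshakes_ms: List[int], connect_timeout_ms: int,
                       retries: int) -> Tuple[bool, int]:
    """Return (connected, total_ms_spent_connecting).

    handshakes_ms[i] is how long attempt i's TCP + TLS handshake would take
    if it were allowed to finish. An attempt that exceeds the connect timeout
    is abandoned after exactly connect_timeout_ms and then retried.
    """
    elapsed = 0
    for handshake in handshakes_ms[: retries + 1]:
        if handshake <= connect_timeout_ms:
            return True, elapsed + handshake
        elapsed += connect_timeout_ms  # time burned on the timed-out attempt
    return False, elapsed


# Hypothetical slow-network case: two handshakes just over 800 ms, then a fast one.
slow = [900, 850, 300]
tight = connect_elapsed_ms(slow, connect_timeout_ms=800, retries=2)
wide = connect_elapsed_ms(slow, connect_timeout_ms=2500, retries=1)
```

In this model the tight settings spend 1,900 ms before connecting, while the wider timeout lets the first 900 ms handshake complete. Three straight timeouts at 800 ms cost 2,400 ms here; the ~2.9 s figure quoted above presumably also includes backoff, which the sketch leaves out.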

Pick the metric that matters to your workload

Different production agents care about different parts of the distribution. The “best” platform depends on which tail kills you faster.
| your priority | winner | why |
|---|---|---|
| Fastest typical request (p50: agent loop responsiveness) | 🏆 Podflare 153 ms | 3.0× faster than E2B, 4.7× faster than Daytona |
| Predictable latency (p95−p50 spread, “no surprises”) | 🏆 Podflare 17 ms | SDK 0.0.19’s tuned connect timeout caps the tail |
| Bounded worst case (max: circuit-breaker thresholds) | 🏆 Podflare 263 ms | 3.4× tighter than E2B, 4.3× tighter than Daytona |
| Lowest mean (cost-driven, billing-by-time workloads) | 🏆 Podflare 173 ms | Distribution shape: tight head, tight tail |

Why Podflare’s first_exec is 2–5× faster

The first exec() after create() looks identical from the SDK side, but the underlying paths are different:
| platform | first_exec stack |
|---|---|
| Podflare | hostd ↔ Pod vsock UDS ↔ in-VM agent (binary protocol over UNIX socket) |
| Daytona | proxy → runner → daemon HTTP gin server (TCP + TLS over container bridge) |
| Blaxel | SDK → orchestrator → in-VM agent (HTTP + TLS) |
| E2B | client-proxy → orchestrator → ConnectRPC HTTP/1.1+H2 to envd in-VM (TCP + TLS) |
All three competitors run a normal HTTP server inside each sandbox. Convenient (the SDK can talk plain HTTP), but every exec pays for TCP + TLS + HTTP framing inside the guest. We use vsock — a direct host↔guest socket with no TCP overhead — and a binary line protocol. Round-trip from a hot connection is ~3 ms server-side; the rest is network between you and the region.
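For a feel of what a framed request/response over a local socket looks like without HTTP or TLS, here is a minimal sketch. It is not Podflare’s wire format, which this page does not specify: the 4-byte length prefix, the `fake_agent` responder, and the use of an AF_UNIX `socket.socketpair()` in place of a real vsock are all assumptions.

```python
import socket
import struct
import threading


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Write one frame: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Read one length-prefixed frame and return its payload."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def fake_agent(sock: socket.socket) -> None:
    """Hypothetical in-guest agent: answers a single 'echo <text>' request."""
    verb, _, arg = recv_frame(sock).decode().partition(" ")
    reply = arg if verb == "echo" else f"unsupported: {verb}"
    send_frame(sock, reply.encode())
    sock.close()


def exec_once(command: str) -> str:
    """Send one command to the fake agent and return its reply."""
    host, guest = socket.socketpair()  # local socket pair standing in for vsock
    worker = threading.Thread(target=fake_agent, args=(guest,))
    worker.start()
    try:
        send_frame(host, command.encode())
        return recv_frame(host).decode()
    finally:
        host.close()
        worker.join()
```

Calling `exec_once("echo ready")` performs one write and one read on an already-open socket. The point of the sketch is what is absent: no TCP handshake, no TLS negotiation, and no HTTP header parsing on the guest side.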

HTTP outbound — what your agent actually feels

Inside-the-sandbox curl against two reliable targets, 5 runs each. (Earlier benches included httpbin.org but kept getting 10-second outliers from all four platforms — that’s httpbin’s per-source-IP rate limit, not platform speed.)
| target | Podflare us-west | Daytona | Blaxel us-pdx-1 | E2B us-east |
|---|---|---|---|---|
| Cloudflare trace | 25 ms | 21 ms | 82 ms | 38 ms |
| GitHub API /zen | 89 ms | 93 ms | 85 ms | 25 ms |
Two takeaways:
  • GitHub-flavored workloads (the long tail of agent traffic — pip install, npm install, GitHub API, Hugging Face, etc.) all land within ~70 ms of each other. Geography is the whole story; pick a region close to GitHub’s Azure us-east peering.
  • Cloudflare trace shows raw network speed. Daytona’s 21 ms is fastest because their colo happens to be one hop from Cloudflare’s Ashburn PoP. Differences here are small absolute numbers.

Out-of-the-box experience

| platform | base image | curl pre-installed | apk add curl time |
|---|---|---|---|
| Podflare | Ubuntu 24.04 (full) | yes | n/a |
| Daytona | Ubuntu (full) | yes | n/a |
| E2B | Ubuntu/Debian (full) | yes | n/a |
| Blaxel | Alpine 3.23 (157 binaries total) | no | 5 seconds |
Blaxel’s bare-Alpine choice is interesting: smaller image, faster boot, smaller attack surface. But every workload needs apk add for basics like curl, python, git before doing real work — every fresh sandbox pays for that bootstrap.

Forking and persistent state

Cold-start isn’t the whole story. AI-agent workloads also fork (try N branches in parallel) and persist (resume a working session later). This is where the gap widens.
| capability | Podflare | Daytona | Blaxel | E2B |
|---|---|---|---|---|
| fork(n=5) from a running sandbox | ~80 ms p50 | not exposed | not exposed | not exposed |
| Persistent state across destroy | Spaces (full VM memory survives) | container archive (storage only) | pause / resume | snapshot via docker commit |
| Diff snapshots (only dirty pages) | yes, ~24 ms | no | no | no |
| Multi-region edge router | 5 regions, haversine + failover | single region per cluster | 3 regions, manual pin | single region per cluster |
fork() is the genuinely differentiated primitive. Most LLM-agent patterns (tree-of-thought, multi-attempt code synthesis) want N children that all start from the parent’s exact mid-flight state. On container platforms you’d docker commit (~seconds) and docker run N (~seconds × N). On Podflare that’s parent.fork(n=5) — a copy-on-write diff snapshot + N parallel microVM spawns in 80 ms p50, total. See Performance for the breakdown of what fork() does in those 80 ms.
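The best-of-N pattern that fork() enables can be sketched without any platform at all. The code below uses an in-memory `FakeSandbox` stand-in, not the Podflare SDK: only the `fork(n=...)` call shape comes from this page, and the `state` dict, `best_of_n`, and the scoring callback are hypothetical.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional


class FakeSandbox:
    """In-memory stand-in for a sandbox; NOT the real SDK class."""

    def __init__(self, state: Optional[Dict[str, Any]] = None):
        self.state = state if state is not None else {}

    def fork(self, n: int) -> List["FakeSandbox"]:
        # Each child starts from a private copy of the parent's current state.
        return [FakeSandbox(copy.deepcopy(self.state)) for _ in range(n)]


def best_of_n(parent: FakeSandbox,
              candidates: List[Callable[[FakeSandbox], None]],
              score: Callable[[FakeSandbox], float]) -> FakeSandbox:
    """Fork one child per candidate, run them in parallel, keep the best child."""
    children = parent.fork(n=len(candidates))

    def attempt(pair):
        child, candidate = pair
        candidate(child)  # mutates only this child's state
        return score(child), child

    with ThreadPoolExecutor(max_workers=len(children)) as pool:
        results = list(pool.map(attempt, zip(children, candidates)))
    return max(results, key=lambda r: r[0])[1]
```

The property that matters is isolation after the fork: every candidate sees the parent’s mid-flight state, and none of their changes leak back to the parent or to siblings. A real fork gets this from copy-on-write snapshots; the fake gets it from `deepcopy`.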

Architecture comparison

| | Podflare | Daytona | Blaxel | E2B |
|---|---|---|---|---|
| Isolation | Podflare Pod microVM (KVM hypervisor) | Docker + Sysbox (kernel-shared) | Firecracker microVM | Firecracker microVM (KVM) |
| Bare-metal hosting | Hetzner + Latitude (5 regions) | unspecified cloud / k8s | unspecified | GCP + AWS bare-metal (single region) |
| Cold-start magic | Snapshot restore + warm pool + xfs reflink CoW | Pre-booted VMs in DB, atomic orgId-flip handoff | Minimal Alpine + ? | UFFD lazy mem + memory-prefetch + NBD rootfs |
| Warm pool primitive | pop_front() from a VecDeque of running VMs | DB UPDATE: flip orgId on a sentinel-org pre-booted sandbox | not documented | Snapshot resume per request |
| In-sandbox exec channel | vsock binary protocol | gin HTTP over TCP+TLS (port 2280) | HTTP over TCP+TLS | ConnectRPC over TCP+TLS (port 49983) |
| Edge router | Cloudflare Worker, haversine routing, 5 regions | Single regional proxy | Manual region pin | API gateway (single region) |
| Failover on origin 5xx | yes (next-nearest region) | not documented | not documented | not documented |
| fork() primitive | yes, ~80 ms | no | no | no |
| Persistent state across destroy | yes (Spaces, full memory) | container archive | pause/resume | container commit |
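The “haversine routing” and “next-nearest region” entries in the table can be illustrated with a short sketch. It is not the edge router’s code: the region names and coordinates are hypothetical placeholders, and `pick_region` with its `failed` set is an assumed shape for the failover step.

```python
import math
from typing import Dict, Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0
LatLon = Tuple[float, float]

# Hypothetical regions and (lat, lon) coordinates, for illustration only.
REGIONS: Dict[str, LatLon] = {
    "us-west": (37.77, -122.42),
    "us-east": (39.04, -77.49),
    "eu-central": (50.11, 8.68),
}


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def ranked_regions(caller: LatLon, regions: Dict[str, LatLon] = REGIONS) -> List[str]:
    """Region names ordered nearest-first from the caller's location."""
    return sorted(regions, key=lambda name: haversine_km(caller, regions[name]))


def pick_region(caller: LatLon, failed: Iterable[str] = (),
                regions: Dict[str, LatLon] = REGIONS) -> str:
    """Nearest region that is not marked failed (e.g. after an origin 5xx)."""
    unhealthy = set(failed)
    for name in ranked_regions(caller, regions):
        if name not in unhealthy:
            return name
    raise RuntimeError("no healthy region available")
```

For a caller near Seattle, `pick_region((47.61, -122.33))` returns the west-coast entry, and passing `failed={"us-west"}` falls through to the next-nearest one. Great-circle distance is only a proxy for network latency, which is presumably why failover is paired with it.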

License

| | license | implication |
|---|---|---|
| Podflare | proprietary platform; SDKs MIT | use commercially without restrictions |
| Blaxel | proprietary | not self-hostable |
| E2B | Apache-2.0 (entire stack) | self-hostable; no viral terms |
| Daytona | AGPL-3.0 (entire stack) | self-hostable, but modifications must be open-sourced if you run as a commercial service |
If you’re building on top of one of these and might fork it later, license matters. Daytona’s AGPL is genuinely restrictive for commercial use; E2B’s Apache-2.0 is permissive; Blaxel and Podflare are proprietary.

Free-tier limits

| | Podflare free | E2B Hobby | Daytona Tier 1 | Blaxel free |
|---|---|---|---|---|
| RAM per sandbox | 1 GB | 8 GB | 8 GB (4 vCPU) | varies |
| Max concurrent | 10 | 20 | dynamic (pool-shared 20 GB) | per workspace |
| Max session lifetime | 30 min | 1 hour | not stated | not stated |
| Idle timeout | 5 min | not stated | not stated | not stated |
| Starter credit | none | $100 | $200 | not stated |
Honest comparison: Podflare’s free tier is more conservative on per-sandbox limits. Tradeoff: lower abuse risk (1 GB ceiling makes crypto mining unprofitable without separate detection), at the cost of less headroom for free-tier experimentation. Pro tier opens up to 4 GB per sandbox, 50 concurrent, 8-hour lifetime.

Production-choice ranking — by axis (30-iter)

| axis | 1st | 2nd | 3rd | 4th |
|---|---|---|---|---|
| p50 (typical request) | Podflare 153 ms | E2B 467 ms | Blaxel 627 ms | Daytona 713 ms |
| p95 (1-in-20) | Podflare 170 ms | E2B 750 ms | Daytona 1,130 ms | Blaxel 2,665 ms |
| p99 (1-in-100) | Podflare 236 ms | E2B 852 ms | Daytona 1,136 ms | Blaxel 3,719 ms |
| max (worst observed) | Podflare 263 ms | E2B 888 ms | Daytona 1,137 ms | Blaxel 4,096 ms |
| spread (p95 − p50) | Podflare 17 ms | E2B 283 ms | Daytona 417 ms | Blaxel 2,038 ms |
| HTTP outbound to GitHub | E2B 25 ms | Blaxel 85 ms | Podflare 89 ms | Daytona 93 ms |
| Persistency primitives | Podflare (full VM memory survives) | E2B / Blaxel (filesystem) | Daytona (archive) | |
| Unique features | Podflare (fork(), multi-region, Spaces) | E2B (Apache 2.0) | Daytona (AGPL self-host) | Blaxel (minimal Alpine) |

When each one wins

  • Latency-sensitive interactive agents (default case): Podflare. Wins p50 (153 ms), p95 (170 ms), p99 (236 ms), and max (263 ms), and is the only platform under 300 ms at every percentile. Native fork() for tree-of-thought patterns. Persistent Spaces preserve full VM memory across restarts. Requires SDK ≥ 0.0.20: versions 0.0.17 and 0.0.18 have a ~1.7 s p99 tail caused by the SDK’s own tight-connect + retries=2 compounding (fixed in 0.0.19), and 0.0.20 then defaulted to api.podflare.ai for edge-routed latency.
  • GitHub-heavy workloads where outbound to Azure us-east matters: E2B. Their us-east colo wins HTTP outbound to GitHub at 25 ms (vs 85–93 ms for Podflare and the others). Apache-2.0 lets you self-host for compliance.
  • Self-hosted on existing Docker/k8s infrastructure: Daytona. The AGPL terms are workable only if you’ll never fork the runtime, and the Docker/Sysbox isolation is meaningfully weaker than a microVM if your threat model includes adversarial guest code.
  • Minimum-image, minimum-RAM workloads with your own bootstrap: Blaxel. Alpine base + 627 ms p50 is fine if your workload pre-warms with its own deps. Smallest attack surface.

Reproduce these numbers

All bench scripts are in our repo. They take an SDK API key for each platform and run identical workloads.
git clone https://github.com/PodFlare-ai/podflare
cd podflare

# 30-iter reliability bench — the headline numbers above
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-reliability.py podflare
E2B_API_KEY=...               python3 scripts/bench-reliability.py e2b
DAYTONA_API_KEY=...           python3 scripts/bench-reliability.py daytona
BL_API_KEY=... BL_WORKSPACE=... BL_REGION=us-pdx-1 \
                              python3 scripts/bench-reliability.py blaxel

# Quick 5-iter cold-start (what most casual benches show)
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-cold-start.py podflare
# ... same arguments for e2b/daytona/blaxel

# HTTP outbound from inside the sandbox
PODFLARE_API_KEY=pf_live_... python3 scripts/bench-http-outbound.py podflare
# ... same arguments for e2b/daytona/blaxel
Each bench-reliability.py run does 30 sequential Sandbox.create() → exec("echo ready") → close() cycles per platform and prints the full distribution (min / p50 / p90 / p95 / p99 / max / mean / spread). No special flags, no warmup-and-discard tricks. If your numbers differ meaningfully from ours, send us the bench output and the SDK version you ran — we treat regression reports as P0. Our job is for these numbers to stay honest, not for our marketing to claim things the bench can’t reproduce.

Methodology

  • Date: April 2026
  • Client: macOS laptop on residential wifi, west-coast US
  • Podflare endpoint: api.podflare.ai with SDK 0.0.20 (Cloudflare-edge-routed — haversine-picks the nearest origin server-side per-request; from this machine that’s us-west)
  • E2B endpoint: e2b_code_interpreter SDK default (single GCP/AWS region, likely us-east4)
  • Daytona endpoint: daytona SDK default (single region per account; ours landed near IAD)
  • Blaxel endpoint: blaxel SDK with BL_REGION=us-pdx-1 and image=blaxel/base-image:latest
  • Sandbox spec: each platform’s default — 1 GB / 1 vCPU on all four
  • Bench iterations: 30 cold starts per platform, sequential, no parallelism. Each iteration is a complete Sandbox.create() → exec("echo ready") → close()/kill()/delete() cycle. We report the full distribution because the 5-iteration version of this bench gave misleadingly noisy results — particularly for E2B, whose median moved from 2,504 ms (5 samples) to 442 ms (30 samples). Sample size matters.
Run-to-run variance: the p50 numbers shift by ~10–30 ms either way between sessions. p95 / p99 are noisier because they depend on whatever upstream network jitter happens that minute. The rank order on every percentile is stable across multiple bench sessions: Podflare wins p50 through max.