Dev environments - Podflare

PR-review bots and “run my tests” agents all do the same thing under the hood: check out a branch, run a command, summarize what happened. The sandbox gives you the isolation that makes this safe to run on arbitrary customer repos.

Branch preview / PR review bot

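from podflare import Sandbox

# One sandbox per review; the idle timeout (seconds) reaps it if this process crashes.
sbx = Sandbox(idle_timeout_seconds=1800)

try:
    r = sbx.run_code("""
set -euo pipefail   # stop on the first failure; pipefail keeps npm test's exit code through "| tail"

git clone --depth 1 \
  --branch feat/new-billing \
  https://github.com/acme/app /tmp/app

cd /tmp/app
npm ci --silent
npm test 2>&1 | tail -40
""", language="bash")

    print("exit:", r.exit_code)
    print(r.stdout)
finally:
    sbx.close()  # release the microVM even if run_code raises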
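In a real bot the repo URL and branch come from the PR, so they should not be pasted into the command string verbatim. The sketch below builds the same script with `shlex.quote` applied to both values; `build_review_script` is a hypothetical helper, not part of the Podflare SDK, and the `/tmp/app` path and npm commands simply mirror the example above.

```python
import shlex


def build_review_script(repo_url: str, branch: str, tail_lines: int = 40) -> str:
    """Hypothetical helper: bash script that clones `branch` and runs npm tests.

    Branch names and URLs are untrusted PR input, so both are shell-quoted.
    """
    if tail_lines <= 0:
        raise ValueError("tail_lines must be positive")
    return "\n".join([
        # Fail fast, and keep npm test's exit code through the pipe to tail.
        "set -euo pipefail",
        # "--" stops git from treating a URL that starts with "-" as an option.
        f"git clone --depth 1 --branch {shlex.quote(branch)} -- "
        f"{shlex.quote(repo_url)} /tmp/app",
        "cd /tmp/app",
        "npm ci --silent",
        f"npm test 2>&1 | tail -{tail_lines}",
    ])
```

The result would be passed as the code argument, e.g. `sbx.run_code(build_review_script(repo, branch), language="bash")`. A branch named `main; rm -rf /` then reaches git as a single quoted argument instead of a second command.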

Shell variant — auto-detect the stack

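#!/usr/bin/env bash
set -euo pipefail

REPO=${1:?"usage: $0 <repo-url> <branch>"}
BRANCH=${2:?"usage: $0 <repo-url> <branch>"}

git clone --depth 1 --branch "$BRANCH" -- "$REPO" /tmp/repo
cd /tmp/repo

# Detect the stack from its manifest and run its default tests
if [ -f package.json ]; then
  npm ci
  npm test
elif [ -f pyproject.toml ]; then
  pip install -e '.[dev]'   # assumes test dependencies live in a "dev" extra
  pytest -q
elif [ -f Cargo.toml ]; then
  cargo test
else
  # Without this branch an unrecognized repo would exit 0 and look like a pass
  echo "no package.json, pyproject.toml, or Cargo.toml found" >&2
  exit 1
fi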
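If the agent would rather decide the commands itself (for example, to report which stack it detected), the same detection order can be sketched in Python. `detect_stack` and the `STACKS` table are hypothetical names for illustration; the marker files and commands are the ones the shell script uses.

```python
from pathlib import Path

# Marker file -> test commands, checked in the same order as the shell script.
STACKS = [
    ("package.json", ["npm ci", "npm test"]),
    ("pyproject.toml", ["pip install -e '.[dev]'", "pytest -q"]),
    ("Cargo.toml", ["cargo test"]),
]


def detect_stack(repo_dir: str) -> list[str]:
    """Return the test commands for the first marker file found in repo_dir.

    Raises LookupError when no marker exists, so an unrecognized repo is
    reported as an error rather than as a passing run.
    """
    root = Path(repo_dir)
    for marker, commands in STACKS:
        if (root / marker).is_file():
            return list(commands)  # copy so callers can't mutate STACKS
    raise LookupError(f"no recognized stack in {repo_dir}")
```

The commands could then be joined with `" && ".join(...)` and sent to the sandbox as one bash call, so the first failing step determines the exit code.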

Why a sandbox (and not `child_process.exec` on your server)

Repos the agent clones contain arbitrary code. Their postinstall scripts, pytest conftest.py files, and Makefile targets can all execute whatever they want on the machine that runs the install or test step. Running that inside a Podflare microVM means:

No access to your server’s filesystem.
No access to your server’s network beyond what you configure.
No long-running processes left behind — the microVM dies.
No accidental resource exhaustion — CPU/RAM are capped per sandbox.
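To make the risk concrete: `npm ci` and `npm install` run the root package's install lifecycle scripts (and dependencies can declare their own) unless invoked with `--ignore-scripts`. A review bot might list those hooks in its summary so reviewers can see what ran. This is a hypothetical reporting helper, not a security control; the sandbox is the boundary.

```python
import json

# npm lifecycle scripts that run for the root package during `npm ci` / `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall", "prepare")


def install_hooks(package_json: str) -> dict[str, str]:
    """Hypothetical helper: install-time scripts declared in a package.json string."""
    scripts = json.loads(package_json).get("scripts", {})
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}
```

For a package.json whose `scripts` contain `"postinstall": "node setup.js"`, the helper returns that entry, flagging code that would run before any test does.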

Composing with the agent loop

The typical PR-review flow:

Agent is asked “does this PR pass tests?”
Agent calls Sandbox.create() (one sandbox per review).
Agent issues shell commands — git clone, install deps, run tests.
Agent parses exit code + tail of output.
Agent summarizes + posts comment.
Sandbox closes — cleanup is automatic.
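The loop above can be sketched as one function. It assumes only the shape used in the first example on this page (a `run_code(code, language=...)` method returning an object with `exit_code` and `stdout`, plus `close()`); the `SandboxLike`/`Result` protocols, `review_pr`, and the summary format are hypothetical, and posting the comment is left to your code-host client.

```python
from typing import Callable, Protocol


class Result(Protocol):
    exit_code: int
    stdout: str


class SandboxLike(Protocol):
    def run_code(self, code: str, language: str) -> Result: ...
    def close(self) -> None: ...


def review_pr(make_sandbox: Callable[[], SandboxLike], script: str, tail: int = 20) -> str:
    """Run `script` in a fresh sandbox and return a short comment body."""
    sbx = make_sandbox()  # one sandbox per review
    try:
        r = sbx.run_code(script, language="bash")
    finally:
        sbx.close()  # cleanup even if the run raises
    last_lines = r.stdout.splitlines()[-tail:]
    verdict = "PASS" if r.exit_code == 0 else f"FAIL (exit {r.exit_code})"
    return f"Tests: {verdict}\n" + "\n".join(last_lines)
```

Taking a factory instead of a sandbox keeps the one-sandbox-per-review rule inside the function and lets tests substitute a fake.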

Each sandbox lives about 3 minutes on average. If the agent crashes mid-review, the idle timeout (5 min on Free, 30 min on Pro) cleans the sandbox up.

Pitfalls

Don’t hard-code credentials into shell commands. If the repo needs a private npm token to install dependencies, inject it via environment variables in the exec call, never in the command string, which may be logged.
Disk can fill. node_modules is big; a single PR review might use 200–500 MB. Free tier (4 GB rootfs) handles that fine; heavier workloads want Pro (16 GB).
Network traffic counts against egress, including dependency downloads as well as network-intensive tests. If you’re processing many PRs, cached module registries help, as does installing with npm ci rather than npm install.
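A minimal sketch of the credentials pitfall, assuming the token reaches the sandbox as an environment variable named `NPM_TOKEN` (how you set it depends on the SDK's exec call, which this page does not show). npm expands `${VAR}` references in `.npmrc`, so the script text never contains the secret; as a second line of defense, output is redacted before it is posted. `NPMRC_SETUP` and `redact` are hypothetical names, and the public registry host stands in for your private one.

```python
# Bash line that writes .npmrc with a literal ${NPM_TOKEN} reference
# (single-quoted, so bash does not expand it); npm expands it at read time.
NPMRC_SETUP = (
    "printf '%s\\n' '//registry.npmjs.org/:_authToken=${NPM_TOKEN}' > ~/.npmrc"
)


def redact(text: str, secrets: list[str], mask: str = "***") -> str:
    """Replace every occurrence of each non-empty secret in text with mask."""
    # Longest first, so a secret that contains another one is masked whole.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        text = text.replace(secret, mask)
    return text
```

Prepending `NPMRC_SETUP` to the review script and passing `r.stdout` through `redact(..., [token])` keeps the token out of both the logged command and the PR comment.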
