PR-review bots and “run my tests” agents all do the same thing under the hood: check out a branch, run a command, summarize what happened. The sandbox gives you the isolation that makes this safe to run on arbitrary customer repos.

Branch preview / PR review bot

from podflare import Sandbox

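sbx = Sandbox(idle_timeout_seconds=1800)

try:
    # pipefail makes the pipeline report npm test's exit status, not tail's;
    # -e stops the script as soon as clone or install fails.
    r = sbx.run_code(r"""
set -eo pipefail

git clone --depth 1 \
  --branch feat/new-billing \
  https://github.com/acme/app /tmp/app

cd /tmp/app
npm ci --silent
npm test 2>&1 | tail -40
""", language="bash")

    print("exit:", r.exit_code)
    print(r.stdout)
finally:
    # Release the sandbox even if run_code raises.
    sbx.close()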

Shell variant — auto-detect the stack

#!/usr/bin/env bash
set -euo pipefail

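# Fail with a usage message when an argument is missing.
REPO="${1:?usage: $0 REPO BRANCH}"
BRANCH="${2:?usage: $0 REPO BRANCH}"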

git clone --depth 1 --branch "$BRANCH" "$REPO" /tmp/repo
cd /tmp/repo

# Detect stack and run default tests
if [ -f package.json ]; then
  npm ci
  npm test
elif [ -f pyproject.toml ]; then
  pip install -e '.[dev]'
  pytest -q
elif [ -f Cargo.toml ]; then
  cargo test
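else
  # An unrecognized stack must not look like a passing run.
  echo "no supported stack detected" >&2
  exit 1
fi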
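
If the agent drives the sandbox from Python rather than from a shell script, the same detection can be kept as data. The following is a minimal sketch, not part of any SDK: `STACKS` and `detect_test_commands` are hypothetical names, and the commands simply mirror the ones in the shell variant. It returns `None` for an unrecognized repo so the caller can report that explicitly instead of treating it as a pass.

```python
import tempfile
from pathlib import Path

# Marker file -> shell commands, checked in the same order as the shell variant.
STACKS = [
    ("package.json", ["npm ci", "npm test"]),
    ("pyproject.toml", ["pip install -e '.[dev]'", "pytest -q"]),
    ("Cargo.toml", ["cargo test"]),
]


def detect_test_commands(repo_dir):
    """Return the commands for the first marker file found, or None."""
    root = Path(repo_dir)
    for marker, commands in STACKS:
        if (root / marker).is_file():
            return commands
    return None


if __name__ == "__main__":
    # Demonstrate on a throwaway directory that looks like a Python project.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "pyproject.toml").write_text("[project]\nname = 'demo'\n")
        print(detect_test_commands(tmp))
```

Keeping the table as a list of pairs preserves the priority order, so a repo that contains both package.json and pyproject.toml is treated as a Node project, exactly as in the shell script.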

Why a sandbox (and not child_process.exec on your server)

Repos the agent clones contain arbitrary code. Their postinstall scripts, pytest conftest.py files, and Makefile targets can all run whatever they want on the host. Running that inside a Podflare microVM means:
  • No access to your server’s filesystem.
  • No access to your server’s network beyond what you configure.
  • No long-running processes left behind — the microVM dies.
  • No accidental resource exhaustion — CPU/RAM are capped per sandbox.

Composing with the agent loop

The typical PR-review flow:
  1. Agent is asked “does this PR pass tests?”
  2. Agent creates a sandbox, one per review (Sandbox(...) in the Python example above).
  3. Agent issues shell commands — git clone, install deps, run tests.
  4. Agent parses exit code + tail of output.
  5. Agent summarizes + posts comment.
  6. Sandbox closes — cleanup is automatic.
Each sandbox lives ~3 min on average. The idle timeout (5 min on the free tier, 30 min on Pro) cleans up if the agent crashes mid-review.
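
Steps 4 and 5 of the flow above (parse the exit code and output tail, then summarize) need no sandbox access and can be sketched as a pure function. This is an illustrative helper, not part of the Podflare SDK: `summarize_run` and its parameters are hypothetical names, and the comment wording is only one possible format.

```python
def summarize_run(exit_code, output, branch, max_lines=40):
    """Turn an exit code and captured output into a short PR comment."""
    status = "passed" if exit_code == 0 else f"failed (exit code {exit_code})"
    # Keep only the end of the log; that is where test runners print failures.
    tail = output.strip().splitlines()[-max_lines:]
    lines = [f"Tests {status} on {branch}."]
    if tail:
        lines.append("")
        lines.append(f"Last {len(tail)} line(s) of output:")
        lines.extend("    " + line for line in tail)
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize_run(1, "3 passed\n1 failed: test_invoice_total\n", "feat/new-billing"))
```

The agent would call it with the exit code and stdout returned by the sandbox run, then post the returned string as the review comment.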

Pitfalls

  • Don’t hard-code credentials into the shell. If the repo needs a private npm token to install deps, inject it via environment variables in the exec call — never in the command string that gets logged.
  • Disk can fill. node_modules is big; a single PR review might use 200–500 MB. Free tier (4 GB rootfs) handles that fine; heavier workloads want Pro (16 GB).
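  • Running network-intensive tests counts against egress. Be aware of this if you’re processing many PRs. A caching registry proxy or a pre-warmed package cache reduces repeat downloads, and npm ci (rather than npm install) keeps installs tied to the lockfile.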
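
The credentials pitfall above comes down to keeping the secret out of any string that might be logged. A minimal sketch of that separation follows, under stated assumptions: `build_install_step` and `redact` are hypothetical helpers, `NPM_TOKEN` is an illustrative variable name, and the repo's .npmrc is assumed to reference `${NPM_TOKEN}`. How the `env` mapping is handed to the sandbox depends on the SDK's exec call and is deliberately not shown.

```python
def build_install_step(npm_token):
    """Return (command, env): the secret lives only in env, never in command."""
    command = "npm ci --silent"       # safe to log as-is
    env = {"NPM_TOKEN": npm_token}    # pass separately; never log this mapping
    return command, env


def redact(text, secrets):
    """Mask known secret values before text is logged or posted in a comment."""
    for secret in secrets:
        if secret:
            text = text.replace(secret, "***")
    return text


if __name__ == "__main__":
    command, env = build_install_step("npm_example_token")
    print("running:", command)
    # Captured output can still echo a secret, so redact it before logging.
    print(redact("auth ok for npm_example_token", env.values()))
```

Redacting captured output is a second line of defense: a misbehaving install script can still print the variable, even when the command string itself is clean.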