If your agent is going to run code written by an LLM, you need a
sandbox. Podflare ships small adapters for the three frameworks most
teams use. Each adapter is ~30 lines of glue, and you don't write that
glue yourself.
Anthropic Messages API
Anthropic’s Messages API exposes a hosted code_execution tool. You
can replace the hosted execution with your own sandbox — same tool
spec, your compute. The handle_code_execution_tool_use helper
consumes a tool_use block and returns the matching tool_result
block.
from anthropic import Anthropic
from podflare import Sandbox
from podflare.integrations.anthropic import handle_code_execution_tool_use

client = Anthropic()

with Sandbox() as sbx:
    messages = [
        {"role": "user",
         "content": "Download this CSV and summarize: https://ex.com/sales.csv"}
    ]
    while True:
        resp = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=2048,
            tools=[{"type": "code_execution_20250825",
                    "name": "code_execution"}],
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            break
        results = [
            handle_code_execution_tool_use(b, sbx)
            for b in resp.content
            if b.type == "tool_use" and b.name == "code_execution"
        ]
        messages.append({"role": "user", "content": results})

    print(resp.content[0].text)
When to reach for this
- You already use Anthropic and want the native tool contract.
- You need data-plane isolation — Anthropic’s hosted code_execution
runs on their infra; with Podflare it runs on yours.
- Per-customer quota / billing attribution lives on your API key,
not Anthropic’s.
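For intuition, here is roughly what the "~30 lines of glue" has to do: take a tool_use block, run its code, and hand back a tool_result block. The block and result shapes follow the Messages API; the FakeSandbox and the helper body below are illustrative stand-ins, not Podflare's actual implementation.

```python
import contextlib
import io
from dataclasses import dataclass


@dataclass
class ToolUseBlock:
    """Minimal stand-in for an Anthropic tool_use content block."""
    id: str
    name: str
    input: dict
    type: str = "tool_use"


class FakeSandbox:
    """Stand-in for podflare.Sandbox: exec()s code locally, captures stdout."""
    def run_code(self, code):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return type("Result", (), {"stdout": buf.getvalue(), "stderr": ""})()


def handle_code_execution_tool_use(block, sbx):
    # Run the model-authored code, then wrap its output in the
    # tool_result block the Messages API expects on the next turn.
    result = sbx.run_code(block.input["code"])
    return {
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": result.stdout or result.stderr,
    }


block = ToolUseBlock(id="toolu_01", name="code_execution",
                     input={"code": "print(2 + 2)"})
print(handle_code_execution_tool_use(block, FakeSandbox()))
```

The essential contract is the `tool_use_id` round-trip: the result must echo the id of the block it answers, or the API rejects the turn.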
OpenAI Agents SDK
The Agents SDK accepts any function tool. podflare_code_interpreter()
(Python) and podflareCodeInterpreter() (TypeScript) return one.
from agents import Agent, Runner
from podflare.integrations.openai_agents import podflare_code_interpreter

agent = Agent(
    name="assistant",
    instructions="Use the run_code tool to answer data questions.",
    tools=[podflare_code_interpreter()],
)

# Inside an async function; use Runner.run_sync in synchronous code.
result = await Runner.run(agent, "what is 111 * 111?")
print(result.final_output)
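A function tool is, at bottom, just a callable whose signature and docstring the SDK turns into a schema. This sketch shows the shape such a factory might have; StubSandbox and make_run_code_tool are illustrative names, not the real podflare API.

```python
import contextlib
import io


class StubSandbox:
    """Illustrative stand-in for a Podflare sandbox."""
    def run_code(self, code):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return type("Result", (), {"stdout": buf.getvalue(), "stderr": ""})()


def make_run_code_tool(sbx):
    """Build a plain function an agent framework could wrap as a tool."""
    def run_code(code: str) -> str:
        """Execute Python code in a sandbox and return its output."""
        r = sbx.run_code(code)
        # Surface stderr when present so the model can self-correct.
        return r.stderr or r.stdout
    return run_code


tool_fn = make_run_code_tool(StubSandbox())
print(tool_fn("print(111 * 111)"))
```

Closing over one sandbox instance is what gives the tool persistent state across calls within a single agent run.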
Vercel AI SDK
podflareRunCode() returns a shape compatible with tool() from the
ai package.
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
import { podflareRunCode } from "podflare/ai-sdk";

const pf = podflareRunCode({ template: "python-datasci" });

const result = await generateText({
  model: openai("gpt-4o"),
  tools: {
    runCode: tool({
      description: pf.description,
      parameters: z.object({
        code: z.string(),
        language: z.enum(["python", "bash"]).optional(),
      }),
      execute: pf.execute,
    }),
  },
  prompt: "Fetch the top 5 HN stories and chart their scores.",
  maxSteps: 10,
});

await pf.close();
console.log(result.text);
Persistent REPL pattern
State survives between run_code calls, both the filesystem and Python
variables. This is how you build a “chat with a notebook” UX:
from podflare import Sandbox
sbx = Sandbox(template="python-datasci", idle_timeout_seconds=1800)
# Turn 1 — load data
sbx.run_code("""
import pandas as pd
df = pd.read_parquet('https://ex.com/events.parquet')
print(df.shape)
""")
# Turn 2 — df is still there
r = sbx.run_code("print(df.head().to_string())")
print(r.stdout)
# Turn 3 — even a file is still there
sbx.run_code("df.head(100).to_csv('/tmp/sample.csv')")
sbx.run_code("import os; print(os.listdir('/tmp'))")
sbx.close()
Set a generous idle_timeout_seconds so the sandbox survives
between user messages. Paid tiers allow idle timeouts of up to
two hours, enough for long interactive sessions.
Pitfalls
- Don’t share a sandbox across users. One sandbox per session is the
  right granularity; fork if you need parallel branches within one
  user’s session.
- Install packages once, reuse many times. A pip install is cheap but
  not free (~100–500 ms). Cache the sandbox and install on first run;
  subsequent turns hit the already-installed version.
- Close sandboxes when done. The idle timeout catches forgotten ones,
  but you’ll burn budget until it fires. The SDK’s with (Python) /
  using (TypeScript) context semantics do this automatically.