Integrating an AI agent

MIOSA provides the substrate: sandboxes, computers, file I/O, exec, previews, and snapshots. You provide the agent loop: the LLM calls, screenshot decoding, planning, and turn management. This guide shows how to wire the two together.

The pattern

Your agent code owns the loop: it chooses the LLM, formats tool calls, interprets results, and decides when to stop. MIOSA executes the actions your agent requests and streams back results.

Sandbox integration

Sandboxes are the preferred substrate for code-generation agents. The agent writes files, runs build commands, and opens preview URLs - all through MIOSA primitives.

The sandbox persists between turns - files written in turn 1 are still there in turn 5. The agent does not need to re-send context on every call.
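The persistence described above can be illustrated with a small sketch. The method names `write_file`, `read_file`, and `exec` are assumptions made for illustration and may not match the real SDK; `InMemorySandbox` is a local stand-in so the example runs without a MIOSA account.

```python
from typing import Protocol


class SandboxLike(Protocol):
    """Hypothetical minimal sandbox surface; real SDK method names may differ."""

    def write_file(self, path: str, content: str) -> None: ...
    def read_file(self, path: str) -> str: ...
    def exec(self, command: str) -> str: ...


class InMemorySandbox:
    """Local stand-in used only to make this sketch runnable without MIOSA."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.commands: list[str] = []

    def write_file(self, path: str, content: str) -> None:
        self.files[path] = content

    def read_file(self, path: str) -> str:
        return self.files[path]

    def exec(self, command: str) -> str:
        self.commands.append(command)
        return ""  # a real sandbox would return the command output


def apply_turn(sbx: SandboxLike, files: dict[str, str], commands: list[str]) -> list[str]:
    """Apply one agent turn: write the requested files, then run the commands."""
    for path, content in files.items():
        sbx.write_file(path, content)
    return [sbx.exec(cmd) for cmd in commands]


sbx = InMemorySandbox()
# Turn 1: the agent scaffolds a page and builds it.
apply_turn(sbx, {"index.html": "<h1>Hello</h1>"}, ["npm run build"])
# Turn 2: the agent sends only the new file; index.html is still in the sandbox.
apply_turn(sbx, {"style.css": "h1 { color: teal; }"}, ["npm run build"])
```

Because each turn sends only what changed, the per-turn payload stays small even as the project grows.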

Computer integration

Computers are full Linux desktop VMs with screenshot, click, type, and keyboard APIs. Use them when your agent needs to operate a graphical interface.
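One way to connect agent output to those APIs is a small dispatcher that maps each agent action to a single computer call. This is a sketch under assumptions: `left_click` appears elsewhere in this guide, but `type_text` and `press_key` are assumed method names, and the `Action` shape is a hypothetical type owned by your agent, not by MIOSA. `RecordingComputer` is a local stand-in that records calls instead of driving a VM.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical action shape produced by your agent, not a MIOSA type."""
    kind: str        # "click", "type", or "key"
    x: int = 0
    y: int = 0
    text: str = ""   # text to type, or the key name for "key" actions


class RecordingComputer:
    """Local stand-in that records calls so the sketch runs without a VM."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def left_click(self, x: int, y: int) -> None:
        self.calls.append(("left_click", x, y))

    def type_text(self, text: str) -> None:   # assumed method name
        self.calls.append(("type_text", text))

    def press_key(self, key: str) -> None:    # assumed method name
        self.calls.append(("press_key", key))


def execute(computer, action: Action) -> None:
    """Translate one agent action into a single computer call."""
    if action.kind == "click":
        computer.left_click(action.x, action.y)
    elif action.kind == "type":
        computer.type_text(action.text)
    elif action.kind == "key":
        computer.press_key(action.text)
    else:
        # Fail loudly so a malformed LLM response never becomes a silent no-op.
        raise ValueError(f"unknown action kind: {action.kind!r}")


computer = RecordingComputer()
for act in [Action("click", x=120, y=48), Action("type", text="hello"), Action("key", text="Enter")]:
    execute(computer, act)
```

Keeping the dispatcher separate from the planning code makes it easy to swap the stand-in for a real computer handle or to log every call.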

State tracking

Your agent code is responsible for keeping turn history, screenshots, and decisions. MIOSA executes discrete actions - it does not maintain an agent loop or conversation state on your behalf.

A minimal turn-log pattern:

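turns = []    # turn history lives in your process, not in MIOSA
done = False

while not done:
    png = computer.screenshot()
    action = agent.next_action(png, history=turns)
    turns.append({"screenshot": png, "action": action})

    # Execute action on the computer, then set done = True
    # once the agent reports that the task is finished.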

Approval workflows

If your workflow requires human review before an action executes, your agent pauses and asks the user. MIOSA does not gate actions - all approval logic lives in your application.

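while not done:
    png = computer.screenshot()
    action = agent.next_action(png)

    if action.requires_approval:
        approved = ask_user(f"Approve: {action.description}?")
        if not approved:
            agent.cancel(action)  # tell the agent the action was rejected
            continue              # skip execution and plan the next turn

    computer.left_click(action.x, action.y)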

Streaming events

Subscribe to the sandbox or computer event stream to receive real-time progress without polling.

See the Events (SSE) reference for the full event catalog.
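Since the stream is delivered as Server-Sent Events, a client has to split the incoming lines into discrete events. The sketch below parses the standard SSE wire format (`event:` and `data:` fields, blank line as terminator, `:` comment lines). It does not show how to open the stream; the endpoint, authentication, and the event names in the sample (`exec.output`, `exec.exit`) are hypothetical placeholders, not taken from the event catalog.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"event": ..., "data": ...} dicts from an iterable of SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; events without data are dropped.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format allows one optional leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
            # Other fields (id, retry) are ignored in this sketch.


# Hypothetical sample stream; real event names come from the event catalog.
sample = [
    "event: exec.output",
    'data: {"chunk": "building..."}',
    "",
    ": keep-alive",
    "event: exec.exit",
    'data: {"code": 0}',
    "",
]
events = list(parse_sse(sample))
```

With an HTTP client that exposes the response body line by line, the same function can consume the live stream directly.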

Best practices

Use Idempotency-Key on every action. If your agent retries a failed request, the idempotency key prevents duplicate execution. Generate a key tied to your turn ID: "turn-{turn_number}-{action_type}".
Cap max_turns in your agent loop. A runaway agent loop can exhaust credits. Set a hard ceiling and surface the failure cleanly to the user.
Record screenshots and decisions. Persist each turn’s screenshot and the action taken. When an agent makes a bad decision, this record lets you replay and diagnose without re-running the whole task.
Use snapshots between turns. Call sbx.snapshots.create() at stable checkpoints. If a later turn corrupts state, restore from snapshot rather than starting from scratch.
Prefer sandbox exec over computer desktop for code tasks. File I/O and shell commands through the sandbox API are more reliable and faster than typing into a terminal window on a desktop.
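The first two practices can be combined in one loop, sketched below. The callback signatures `next_action` and `execute` are hypothetical and exist only to keep the example self-contained; the key format is the one suggested above, and how the key is attached to a request depends on the SDK or HTTP client you use.

```python
class TurnLimitExceeded(RuntimeError):
    """Raised when the agent fails to finish within max_turns."""


def idempotency_key(turn_number: int, action_type: str) -> str:
    """Build the per-turn key format suggested above."""
    return f"turn-{turn_number}-{action_type}"


def run_agent(next_action, execute, max_turns: int = 20) -> int:
    """Run until next_action returns None; return the number of turns executed.

    next_action(turn) returns an action type string, or None when the task
    is finished (hypothetical contract).
    execute(action_type, key) performs the action, sending `key` as the
    Idempotency-Key so a retry of the same turn is not executed twice.
    """
    for turn in range(1, max_turns + 1):
        action_type = next_action(turn)
        if action_type is None:
            return turn - 1
        execute(action_type, idempotency_key(turn, action_type))
    # Hard ceiling reached: surface a clear error instead of spending more credits.
    raise TurnLimitExceeded(f"agent did not finish within {max_turns} turns")


sent = []
plan = {1: "write_file", 2: "exec"}  # turn 3 is absent, so the agent stops there
turns_used = run_agent(plan.get, lambda action, key: sent.append(key), max_turns=5)
```

If a single turn can issue two actions of the same type, this key format would collide, so adding a per-action counter to the key is worth considering.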

Examples

An agent that generates and deploys a static site end-to-end.

View on GitHub →

Full Python and TypeScript SDK method signatures for sandboxes and computers.

SDK reference →