Benchmarks
Production sandbox benchmark
100 sandbox lifecycles, measured through the production API path.
These numbers include API authentication, admission checks, scheduling, VM readiness, a command probe inside the sandbox, and cleanup. They are not kernel-only boot timings. The production lane uses `POST /api/v1/sandboxes/run`, which combines create, server-side readiness, and first exec into one durable production request.
Latest verified run
| Parameter | Value |
|---|---|
| Samples | 100 |
| Client concurrency | 10 |
| Batch pace | 0.5s |
| Template | miosa-sandbox |
| Size | xs |
| Mode | run (POST /api/v1/sandboxes/run) |
| Regions requested | us-mia, us-east, us-west |
| Successful lifecycles | 100 / 100 |
| Admission rejections | 0 |
| Total elapsed | 24s |
Object storage smoke benchmark
MIOSA also verifies the tenant object-storage API through the same public API surface customers use:
POST /api/v1/storage/buckets
PUT /api/v1/storage/buckets/:id/objects/:key
GET /api/v1/storage/buckets/:id/objects/:key
DELETE /api/v1/storage/buckets/:id/objects/:key

The June 9, 2026 smoke used a scoped API key with `storage:read` and `storage:write`, created a temporary private bucket, uploaded one object, downloaded it back, verified the SHA-256 digest, then deleted the object and bucket.
| Object size | Samples | Result | Upload | Download | Throughput | Composite |
|---|---|---|---|---|---|---|
| 1 KB | 1 | 1 / 1 | 212ms | 178ms | 0.05 Mbps | 94.39 |
| 1 MB | 1 | 1 / 1 | 1.155s | 506ms | 16.59 Mbps | 92.61 |
| 5 MB | 10 | 10 / 10 | 4.975s | 682ms | 61.48 Mbps | 87.01 |
| 8 MB | 1 | 0 / 1 | failed | failed | failed | 0.00 |
| 10 MB | 1 | 0 / 1 | failed | failed | failed | 0.00 |
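The Throughput column is not defined on this page. The reported values are consistent with download-side throughput computed from binary object sizes (1 MB = 1,048,576 bytes) and decimal megabits per second. That reading is an inference from the numbers, not a documented formula, and the function name below is illustrative.

```typescript
// Assumption: throughput = object bits / download time, using binary MB and
// decimal Mbps. Inferred from the table values, not from the benchmark source.
const BYTES_PER_MB = 1024 * 1024;

function downloadThroughputMbps(sizeBytes: number, downloadSeconds: number): number {
  if (downloadSeconds <= 0) {
    throw new RangeError("downloadSeconds must be positive");
  }
  const bits = sizeBytes * 8;
  return bits / downloadSeconds / 1e6;
}

// 5 MB row: 682ms download median.
const fiveMbThroughput = downloadThroughputMbps(5 * BYTES_PER_MB, 0.682);
// 1 MB row: 506ms download.
const oneMbThroughput = downloadThroughputMbps(1 * BYTES_PER_MB, 0.506);
```

Under that assumption the 5 MB row comes out near 61.5 Mbps and the 1 MB row near 16.6 Mbps, which is within rounding of the reported 61.48 and 16.59 Mbps. Upload time does not appear to enter the figure.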
Storage comparison against the ComputeSDK reference
The ComputeSDK storage benchmark shown in the reference screenshots uses 10 MB files and 100 iterations. That exact test is not passing on MIOSA yet because
the current direct raw-upload path fails at 8-10 MB. The table below compares
the reference values with MIOSA’s latest passing live storage run so the gap is
visible instead of hidden.
| Provider | Benchmark shape | Success | Upload median | Download median | Throughput median | Composite |
|---|---|---|---|---|---|---|
| Tigris | 10 MB, 100 iterations | 100% | 319ms | 277ms | 303 Mbps | 95.4 |
| Cloudflare R2 | 10 MB, 100 iterations | 100% | 628ms | 276ms | 303 Mbps | 94.8 |
| MIOSA Storage API | 5 MB, 10 iterations | 100% | 4.975s | 682ms | 61.48 Mbps | 87.01 |
| MIOSA Storage API | 10 MB, 1 probe | 0% | failed | failed | failed | 0.00 |
Comments:
- MIOSA is not leaderboard-comparable to Tigris or Cloudflare R2 until the 10 MB direct upload path passes consistently.
- The current gap is upload-side. MIOSA’s 5 MB download median is usable but still slower than the ComputeSDK storage leaders; upload median is much slower.
- The next backend fix is to move object upload off the default request-body read path and onto the streaming or presigned upload path, then rerun the same 10 MB, 100-iteration benchmark.
- Until that fix ships, customer docs should treat direct API uploads as a small-object path and recommend presigned/direct storage uploads for larger artifacts.
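The small-object guidance above can be sketched as a client-side size check. The threshold is an assumption: 5 MB is the largest size that passed in the smoke run and 8 MB failed, so the real limit sits somewhere in between and is not documented here. The type and function names are hypothetical, not part of the MIOSA SDK.

```typescript
type UploadPath = "direct" | "presigned";

// Assumption: use the largest size that passed (5 MB) as a conservative cutoff.
const DIRECT_UPLOAD_MAX_BYTES = 5 * 1024 * 1024;

function chooseUploadPath(sizeBytes: number): UploadPath {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    throw new RangeError("sizeBytes must be a non-negative finite number");
  }
  // Small objects go through the direct API; larger artifacts use a presigned upload.
  return sizeBytes <= DIRECT_UPLOAD_MAX_BYTES ? "direct" : "presigned";
}

const smallObjectPath = chooseUploadPath(1024);              // "direct"
const largeArtifactPath = chooseUploadPath(10 * 1024 * 1024); // "presigned"
```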
MIOSA latency split
Production run path:
| Phase | p50 | p95 | p99 | Min | Max |
|---|---|---|---|---|---|
| Command-ready TTI | 512ms | 0.992s | 1.300s | 295ms | 1.331s |
| VM boot slice | 95ms | 253ms | 253ms | 30ms | 312ms |
Previous standard public path, kept as a baseline:
| Phase | p50 | p95 | p99 | Min | Max |
|---|---|---|---|---|---|
| Create request | 287ms | 348ms | 353ms | 218ms | 356ms |
| Ready/running | 589ms | 1.005s | 1.246s | 477ms | 1.246s |
| Command-ready TTI | 947ms | 1.333s | 1.348s | 762ms | 1.610s |
| VM boot slice | 101ms | 253ms | 524ms | 31ms | 524ms |
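The page does not state how the benchmark script computes p50, p95, and p99. A minimal sketch, assuming the nearest-rank method, looks like this; the function name is illustrative.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are at or below it. Assumption: the benchmark may use another method.
function nearestRankPercentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new RangeError("at least one sample is required");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Integer arithmetic first, so 95 * 100 / 100 is exactly 95 before rounding up.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  const value = sorted[rank - 1];
  if (value === undefined) {
    throw new RangeError("rank out of bounds");
  }
  return value;
}
```

With 100 samples under this method, p99 is the 99th sorted value and Max is the 100th. That is why p99 and Max can differ by a single outlier (1.300s against 1.331s on the production path) or coincide when the top two samples are equal.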
What is making the number slower?
The VM path is fast. In the production run, all 100 / 100 samples used the warm
path and the reported boot slice had a 95ms median. The standard path was slower
because it used three public round trips: create, poll readiness, then exec.
| Component | Median | What it includes |
|---|---|---|
| Fused command-ready path | 512ms | Auth, workspace admission, scheduling, server-side wait, first command, response |
| Previous standard path | 947ms | Create response, external readiness polling, separate exec request |
| Round-trip removed by fusion | ~434ms | Public polling plus second public exec call |
| Reported warm boot | 95ms | The actual VM boot slice reported by the fleet |
So the first optimization target was not raw boot. It was create/status/exec round
trips. The run endpoint removes that waste while keeping the same durable
sandbox lifecycle underneath.
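The arithmetic behind that conclusion can be checked from the rounded medians in the table. Medians of separate phases are not strictly additive, so treat this as an approximation rather than an exact decomposition.

```typescript
// Rounded p50 values from the tables above, in milliseconds.
const fusedP50Ms = 512;
const standardP50Ms = 947;
const warmBootP50Ms = 95;

// Time removed by fusing create, wait, and exec into one request.
const savedByFusionMs = standardP50Ms - fusedP50Ms;          // 435
const savedByFusionFraction = savedByFusionMs / standardP50Ms; // about 0.46

// Portion of the fused path that is not VM boot (auth, admission, scheduling, exec).
const nonBootMs = fusedP50Ms - warmBootP50Ms;                // 417
const bootShareOfFused = warmBootP50Ms / fusedP50Ms;         // about 0.19
```

The rounded medians give 435ms, in line with the ~434ms reported above; the 1ms difference presumably comes from unrounded inputs. Boot accounts for under a fifth of the fused median, which is why the round trips were the first target.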
Optimization path
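The benchmark exposes three separate public lanes: VM-ready, API-ready, and command-ready, with command-ready measured on both the standard and fused paths. MIOSA should publish all of them, because each answers a different buyer question.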
| Lane | Current p50 | What it proves | Immediate target |
|---|---|---|---|
| VM-ready | 95ms | warm sandbox runtime is assigned and ready | <100ms sustained |
| Standard API-ready | 589ms | public API create has produced a running sandbox | <400ms |
| Standard command-ready | 947ms | first command succeeds through the legacy multi-request baseline | <800ms |
| Fused command-ready | 512ms | first command succeeds through one durable request | <500ms |
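To make the distance to each target explicit, the table can be expressed as data and checked mechanically. This is a sketch with illustrative names; the numbers are copied from the table, and each target is read as a strict upper bound.

```typescript
interface LaneTarget {
  lane: string;
  currentP50Ms: number;
  targetBelowMs: number; // strict upper bound, e.g. "<100ms"
}

const laneTargets: LaneTarget[] = [
  { lane: "VM-ready", currentP50Ms: 95, targetBelowMs: 100 },
  { lane: "Standard API-ready", currentP50Ms: 589, targetBelowMs: 400 },
  { lane: "Standard command-ready", currentP50Ms: 947, targetBelowMs: 800 },
  { lane: "Fused command-ready", currentP50Ms: 512, targetBelowMs: 500 },
];

// Returns the lanes that have not met their target, with how far over they are.
function lanesOverTarget(targets: readonly LaneTarget[]): { lane: string; overMs: number }[] {
  return targets
    .filter((t) => t.currentP50Ms >= t.targetBelowMs)
    .map((t) => ({ lane: t.lane, overMs: t.currentP50Ms - t.targetBelowMs }));
}

const openLanes = lanesOverTarget(laneTargets);
```

On the current numbers only VM-ready meets its target. Fused command-ready is 12ms over, standard command-ready 147ms over, and standard API-ready 189ms over.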
The engineering cuts are concrete:
| Cut | Expected impact | Why it works |
|---|---|---|
| `create_and_wait` API/SDK path | 50-150ms | removes external GET polling and returns only when the server has committed running state |
| `create_and_exec` benchmark/API path | 434ms measured | removes external polling and the separate public exec POST |
| Admission-path caching | 100-180ms | avoids repeat template, policy, plan, credit-balance, and scheduler reads for hot API keys |
| Region-local control plane / benchmark routing | 200-350ms in far regions | avoids us-west control-plane round trips when the VM already boots locally |
| Warm-pool guardrails | tail reduction | keeps the 99/100 warm hit rate at 100/100 and removes the cold 5.322s outlier |
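The admission-path caching cut can be sketched as a short-TTL cache keyed by API key and workspace. Everything here is hypothetical: the `AdmissionDecision` shape, the class name, and the injected clock are illustrative, not MIOSA's implementation.

```typescript
// Hypothetical result of the template, policy, plan, and credit checks.
interface AdmissionDecision {
  allowed: boolean;
}

interface CachedAdmission {
  decision: AdmissionDecision;
  expiresAtMs: number;
}

class AdmissionCache {
  private readonly entries = new Map<string, CachedAdmission>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns a cached decision while it is fresh; otherwise runs the full
  // admission check via `load` and stores the result for ttlMs.
  resolve(apiKeyId: string, workspaceId: string, load: () => AdmissionDecision): AdmissionDecision {
    const key = `${apiKeyId}:${workspaceId}`;
    const nowMs = this.now();
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAtMs > nowMs) {
      return hit.decision;
    }
    const decision = load();
    this.entries.set(key, { decision, expiresAtMs: nowMs + this.ttlMs });
    return decision;
  }
}
```

The TTL bounds how stale a cached credit or policy result can be, so it should stay short enough that a revoked key or exhausted balance is noticed within an acceptable window.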
Fast without fragile
The fastest version of MIOSA should not skip billing, policy, cleanup, or durable placement. It should move those checks to the right boundary and avoid repeating them on every hot sandbox.
| Layer | Keep durable | Make faster |
|---|---|---|
| Admission | API key, workspace, plan, policy, credits, idempotency all remain enforced | cache the effective policy/template/credit admission result for short TTLs per API key and workspace |
| Placement | persist sandbox row and node_id before exposing a route | allocate from an in-memory reservation ledger first, then write-through to Postgres/outbox |
| Boot | warm-pool claim remains the default, cold boot remains fallback | keep per-region warm pools ready for burst traffic |
| Readiness | only mark running after route registration and command health pass | push readiness over PubSub/SSE instead of external 50ms GET polling |
| First command | command still runs with auth and timeout | fuse create -> wait -> exec inside the selected host so the first command is not a second public API round trip |
| Cleanup | destroy, release reservation, stop billing, and revoke routes stay mandatory | make cleanup idempotent and janitor-backed so failed requests do not hold resources |
| Scale | each host keeps local runtime truth; control plane keeps durable truth | route creates to region-local controllers and replicate fleet state asynchronously |
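The Cleanup row is the one most worth illustrating, because idempotency is what lets a janitor retry safely. The sketch below is a simplified in-memory model under stated assumptions: the step names come from the table, while the record shape and function name are hypothetical, and a real system would persist the completed set durably.

```typescript
type CleanupStep = "destroy" | "release_reservation" | "stop_billing" | "revoke_routes";

const CLEANUP_STEPS: readonly CleanupStep[] = [
  "destroy",
  "release_reservation",
  "stop_billing",
  "revoke_routes",
];

// Hypothetical per-sandbox record of which mandatory steps have finished.
interface CleanupRecord {
  completed: Set<CleanupStep>;
}

// Runs every step that has not completed yet. A failing step does not block
// the others; it stays pending so a later janitor pass can retry it.
// Returns true only when all mandatory steps are done.
function runCleanupPass(record: CleanupRecord, perform: (step: CleanupStep) => void): boolean {
  let allDone = true;
  for (const step of CLEANUP_STEPS) {
    if (record.completed.has(step)) {
      continue; // already done, so re-running the pass is a no-op for this step
    }
    try {
      perform(step);
      record.completed.add(step);
    } catch {
      allDone = false;
    }
  }
  return allDone;
}
```

Calling `runCleanupPass` repeatedly on the same record never repeats a finished step, so a failed request and a janitor sweep can both invoke it without double-releasing a reservation.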
Target shape: the public API has the run endpoint today. The SDK wrapper
should expose the same fast lane next:
    await miosa.sandboxes.createAndWait({ template: "miosa-sandbox", region: "auto" })
    await miosa.sandboxes.run({ template: "miosa-sandbox", command: "echo ok" })

`createAndWait` publishes API-ready latency. `run` publishes command-ready latency. Both use the same durable sandbox lifecycle underneath; the difference is that the server owns the wait loop and can execute the first command node-local.
Region split
| Region | Samples | Success | TTI p50 | TTI p95 | TTI p99 |
|---|---|---|---|---|---|
| us-east | 33 | 100% | 328ms | 644ms | 684ms |
| us-mia | 34 | 100% | 506ms | 815ms | 859ms |
| us-west | 33 | 100% | 934ms | 1.257s | 1.331s |
Command-ready leaderboard
The external provider values below are from the supplied ComputeSDK-style benchmark view. MIOSA is inserted by measured p50 command-ready TTI from the production run path, not isolated as a vanity row. Lower is better.
Benchmark placements
The benchmark screenshots expose separate tabs for median, P95, P99, and composite score. MIOSA’s raw latency placement is measured. The composite score below is labeled as an estimate because the external benchmark app does not publish its exact scoring formula; the estimate is anchored against the supplied provider score table and should be treated as directional until MIOSA is added to their official dataset.
Composite score is estimated because the external benchmark does not publish the exact formula. MIOSA's measured inputs are 0.512s median, 0.992s p95, 1.300s p99, and 100% success.
Detailed metrics
| Provider | Score | Median TTI | P95 TTI | P99 TTI | Success |
|---|---|---|---|---|---|
| Declaw | 94.9 | 0.49s | 0.54s | 0.54s | 100% |
| MIOSA production path | ~92.4 est. | 0.512s | 0.992s | 1.300s | 100% |
| Northflank | 94.4 | 0.54s | 0.59s | 0.61s | 100% |
| Daytona | 74.3 | 0.58s | 5.52s | 5.58s | 100% |
| E2B | 92.7 | 0.64s | 0.83s | 0.94s | 100% |
| Modal | 92.8 | 0.67s | 0.78s | 0.79s | 100% |
| Vercel | 90.7 | 0.72s | 1.20s | 1.35s | 100% |
| Archil | 91.8 | 0.75s | 0.90s | 0.94s | 100% |
| Runloop | 84.6 | 0.81s | 2.64s | 2.64s | 100% |
| Legacy create / poll / exec baseline | n/a | 0.947s | 1.333s | 1.348s | 100% |
| Cloudflare | 78.3 | 1.84s | 2.62s | 2.72s | 100% |
| Blaxel | 80.1 | 1.87s | 2.07s | 2.35s | 100% |
| CodeSandbox | 16.4 | 7.32s | 9.90s | 10.54s | 100% |
| Tensorlake | 0.0 | 15.22s | 15.76s | 15.81s | 100% |
| Upstash | 0.0 | 17.01s | 23.71s | 23.98s | 100% |
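MIOSA's placement on each latency tab can be recomputed from the table by sorting on the chosen column. The sketch below copies the rows above, leaving out the legacy baseline row because it is MIOSA's own earlier path rather than a separate provider. The type and function names are illustrative.

```typescript
interface TtiRow {
  provider: string;
  medianS: number;
  p95S: number;
  p99S: number;
}

type TtiMetric = "medianS" | "p95S" | "p99S";

const ttiRows: TtiRow[] = [
  { provider: "Declaw", medianS: 0.49, p95S: 0.54, p99S: 0.54 },
  { provider: "MIOSA", medianS: 0.512, p95S: 0.992, p99S: 1.3 },
  { provider: "Northflank", medianS: 0.54, p95S: 0.59, p99S: 0.61 },
  { provider: "Daytona", medianS: 0.58, p95S: 5.52, p99S: 5.58 },
  { provider: "E2B", medianS: 0.64, p95S: 0.83, p99S: 0.94 },
  { provider: "Modal", medianS: 0.67, p95S: 0.78, p99S: 0.79 },
  { provider: "Vercel", medianS: 0.72, p95S: 1.2, p99S: 1.35 },
  { provider: "Archil", medianS: 0.75, p95S: 0.9, p99S: 0.94 },
  { provider: "Runloop", medianS: 0.81, p95S: 2.64, p99S: 2.64 },
  { provider: "Cloudflare", medianS: 1.84, p95S: 2.62, p99S: 2.72 },
  { provider: "Blaxel", medianS: 1.87, p95S: 2.07, p99S: 2.35 },
  { provider: "CodeSandbox", medianS: 7.32, p95S: 9.9, p99S: 10.54 },
  { provider: "Tensorlake", medianS: 15.22, p95S: 15.76, p99S: 15.81 },
  { provider: "Upstash", medianS: 17.01, p95S: 23.71, p99S: 23.98 },
];

// 1-based placement of a provider when rows are sorted ascending by the metric.
function placementBy(rows: readonly TtiRow[], provider: string, metric: TtiMetric): number {
  const sorted = [...rows].sort((a, b) => a[metric] - b[metric]);
  const index = sorted.findIndex((row) => row.provider === provider);
  if (index < 0) {
    throw new Error(`unknown provider: ${provider}`);
  }
  return index + 1;
}

const miosaMedianPlace = placementBy(ttiRows, "MIOSA", "medianS");
const miosaP95Place = placementBy(ttiRows, "MIOSA", "p95S");
const miosaP99Place = placementBy(ttiRows, "MIOSA", "p99S");
```

On these rows MIOSA places 2nd of 14 by median and 6th by both p95 and p99, so the tail is where it trails the median leaders.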
Capability matrix
Speed is only one axis. MIOSA’s product surface is broader than “spawn a headless sandbox and exec a command.”
Runtime lane
Headless sandbox, files, previews, snapshots, and first command execution.
Platform lane
Desktop VM, deploy/release plane, managed data, white-label embedding, BYOC.
Enterprise lane
Compliance posture, GPU/H100 options, and mature enterprise procurement story.
| Capability | MIOSA | Declaw | Northflank | Modal | E2B | Vercel | Daytona | CodeSandbox | Upstash |
|---|---|---|---|---|---|---|---|---|---|
| Headless sandbox create/exec | YES | YES | YES | YES | YES | YES | YES | YES | YES |
| Filesystem API | YES | YES | YES | PARTIAL | YES | YES | YES | YES | YES |
| Port preview URLs | YES | YES | YES | PARTIAL | YES | YES | YES | YES | PARTIAL |
| Snapshot/fork/resume | YES | YES | PARTIAL | PARTIAL | YES | YES | YES | YES | PARTIAL |
| Full desktop/browser VM | YES | NO | NO | NO | PARTIAL | NO | PARTIAL | PARTIAL | NO |
| Managed deploy/release plane | YES | NO | YES | PARTIAL | NO | YES | NO | NO | NO |
| Managed Postgres/Redis/storage | YES | NO | YES | NO | NO | PARTIAL | NO | NO | Redis/Vector |
| White-label tenant embedding | YES | NO | PARTIAL | NO | NO | NO | NO | NO | NO |
| BYOC / customer-owned fleet | YES | NO | NO | NO | NO | NO | PARTIAL | NO | NO |
| MCP/agent tool surface | YES | NO | NO | NO | PARTIAL | NO | PARTIAL | NO | PARTIAL |
| Multi-language SDKs | 5 | TS/Py | TS | Py | TS/Py | TS/Py | TS/Py/Go/Java/Ruby | TS | TS |
| GPU story | H100 | NO | NO | YES | NO | NO | NO | NO | NO |
| Compliance public posture | Compliant | Limited | Enterprise | Enterprise | SOC2 | SOC2 | Enterprise | Enterprise | Enterprise |
What this means
- If a buyer only cares about raw median headless sandbox TTI, the category is tight.
- If a buyer needs desktops, browser automation, white-label embedding, deploys, data, and BYOC in the same platform, MIOSA is no longer competing on a one-column sandbox table.
- If a buyer needs GPU today, MIOSA can support GPU/H100 options while keeping the same platform surface.
Provider coverage
This page tracks the providers shown in the benchmark screenshots plus the providers exposed by ComputeSDK’s current provider list. Some vendors are broader platforms, some are narrow sandbox APIs, and some expose sandboxes as one feature in a larger developer-cloud product.
| Provider | Category | Strongest public angle | MIOSA comparison note |
|---|---|---|---|
| MIOSA | Sandbox + desktop + deploy + data platform | Full lifecycle platform for agents and white-label SaaS | Broader platform surface than a headless sandbox-only provider |
| Declaw | Security-oriented sandbox | Fast TTI plus policy/security positioning | Strong security story; no public desktop/deploy/data plane equivalent |
| Northflank | Developer cloud with sandboxes | Persistent app/runtime platform plus sandbox execution | Strong deploy platform; less agent/desktop-specific |
| Modal | Serverless compute/GPU | GPU and Python-function workflow | Strong GPU story; sandbox is not a white-label desktop platform |
| E2B | AI code execution sandbox | Mature AI-agent sandbox API | Strong headless agent sandbox; limited platform breadth |
| Archil | Sandbox/storage-oriented provider | Fast benchmark row and storage-first positioning | Less public breadth than MIOSA’s computers/deploy/data surface |
| Vercel | Frontend platform plus sandbox | Distribution, OIDC, polished DX | Strong existing-account funnel; sandbox is headless and region-limited in public docs |
| Runloop | Devbox/sandbox provider | Long-lived devboxes and snapshots | Strong devbox framing; no comparable white-label/data plane |
| Blaxel | Agent platform and sandbox | Agent hosting, batch jobs, sandbox console | Strong agent platform/compliance posture; narrower managed data/deploy surface |
| Cloudflare | Edge platform sandbox | Edge distribution and developer ecosystem | Strong edge ecosystem; sandbox is one product within Cloudflare |
| Daytona | OSS/open sandbox platform | Open-source breadth and fast code-to-exec positioning | Strong OSS story; MIOSA adds managed data, deploys, desktops, white-label |
| CodeSandbox | Cloud dev environment | Browser IDE, previews, devbox UX | Strong interactive IDE; weaker benchmark row in supplied data |
| Tensorlake | AI-native sandbox | AI/RL tooling and sandbox filesystem benchmarks | Strong AI-lab framing; no public desktop/deploy/data platform equivalent |
| Upstash | Serverless data plus Box | Redis/Vector/QStash adjacency and built-in agent tooling | Strong data brand; Box is newer and JS/TS-centered |
| HopX | Cloud sandbox API | Multi-language code execution and desktop automation docs | Lower benchmark visibility; overlaps sandbox APIs more than platform plane |
| Namespace | Build/devbox platform | Builders, devboxes, macOS/CI style workloads | Strong CI/build niche; different buyer motion from MIOSA agent platform |
Benchmark notes
The published MIOSA result is the clean 100/100 production lifecycle run after
deploying POST /api/v1/sandboxes/run on 2026-06-09 UTC. The older standard
path is also shown so the optimization is auditable instead of hidden. Setup
failures from under-scoped keys or insufficient plan concurrency are not counted
as fleet performance.
How to reproduce
Use a workspace and API key with enough sandbox concurrency for the test shape:
    export MIOSA_API_KEY="msk_..."
    export API_URL="https://api.miosa.ai"
    export BENCH_WORKSPACE_ID="your-workspace-uuid"
    ./scripts/bench-continuous.sh \
      --samples 100 \
      --concurrent 10 \
      --pace 0.5 \
      --template miosa-sandbox \
      --size xs \
      --mode run \
      --output bench-results/MIOSA-100-sandbox.tsv \
      --html bench-results/MIOSA-100-sandbox.html

Use `--mode standard` to reproduce the legacy create / poll / exec baseline. Use `--mode run` to reproduce the production command-ready lane. The benchmark deletes successful samples unless `--keep` is passed.
Sources and current research
- ComputeSDK introduction for the provider set and abstraction shape.
- Daytona docs for Daytona’s current sandbox positioning and SDK breadth.
- Tensorlake homepage for Tensorlake’s published sandbox/filesystem benchmark positioning.
- HopX docs for HopX sandbox/code-execution capabilities.
- Blaxel docs for Blaxel sandbox/agent platform positioning.
- Northflank sandboxes docs for Northflank sandbox behavior.
- Internal competitive notes under `docs/audits/providers/` for the first-pass capability matrix.