Starship
An agentic engineering platform where co-builder agents run software tasks in ephemeral sandboxes. I built the agent runtime, the live console, and the EKS workspace layer that makes sandboxes fast and resumable.
The problem
Engineering teams want to hand work to agents — "fix this flaky test", "open a PR that migrates this module" — but agents need somewhere safe and fast to run. A real workspace with the repo, the toolchain, and live execution. Spinning that up per task is slow, and tearing it down loses the context an agent built up.
Starship is the platform for that: co-builder agents that run real software engineering tasks inside ephemeral, resumable sandboxes, orchestrated from a control plane and observable from a single console.
What I built
- The agent runtime + supervisor that runs inside each sandbox
- The live, durable chat/transcript surface engineers drive agents from
- The EKS + S3 layer that makes sandboxes fast to start and resume
- Production telemetry over every agent run — cost, success, artifacts
Agent console
The console is where a person and an agent meet. It is a realtime surface over WebSockets with durable transcript replay — you can close the tab mid-run and come back to the full history, including approvals, user-input prompts, attached images, terminal context, and the diff of changed files. I built the socket state machine, the replay model, and the runtime-event plumbing that keeps the UI honest about what the agent is actually doing.
Fast, resumable sandboxes
A sandbox that takes a minute to boot kills the loop. I worked on the workspace runtime on EKS with S3-backed file resume and runtime leases, so a workspace can be paused and brought back with its filesystem intact instead of rebuilt from scratch.
Review automation & triggers
Beyond the interactive loop, I built GitHub PR review automation — structured review prompts, a validator, maintainability signals, and durable review actions that survive restarts — plus workflow triggers across GitHub, Jira, and Slack: mention an agent, capture context, run, report back.
Impact
I built the production reporting layer that the team runs on, covering cost, success/failure, missing artifacts, and review modes across every workflow run.
- 1,486
- 98
- 661
What I took from it
Agent products live or die on trust and latency. Trust comes from making the agent's work legible — durable transcripts, visible diffs, explicit approvals. Latency comes from the unglamorous infrastructure underneath: how fast a sandbox starts, how cheaply it resumes. I got to own both ends.
Next — Spectra