The inference layer
between agents and models.
Layer X1 sits beneath your coding agent and runs every model call through a learned, cache-aware runtime — faster, cheaper, and resilient when the agent would otherwise stall.


An engine of layers.
A request-level gateway sees one call at a time. Layer X1 sees the whole trajectory — and runs it through a stack of layers, each one a decision your agent never has to make.
- 01Cache
Keep the prompt prefix warm across the entire run, not one call.
- 02Route
Pick the least-cost path through the trajectory, not the cheapest single call.
- 03Branch
Pin the main trunk to one model; fork cheap branches for sub-tasks.
- 04Escalate
Climb to a stronger model only when the work actually demands it.
One API. Any agent. Any model.
Point your agent at a single endpoint. Layer X1 speaks 70+ models across every major provider — plus open-source — so you switch models without touching a line of code.
- One API
A single drop-in endpoint. One environment variable to switch it on.
- Any agent
Claude Code, Cursor, or your own loop — anything that speaks the API.
- Any model
70+ models across every major provider, plus open-source, selected per step of the run.

Any agent. Any model.
Point any agent at one endpoint and reach 70+ models across every major provider — proprietary and open-source alike. Switch models mid-run without touching a line of your agent.

Put the layer under your agent.
One environment variable. Your agent doesn't change — everything underneath does.