Inference layer · for AI agents

The inference layer
between agents and models.

Layer X1 sits beneath your coding agent and runs every model call through a learned, cache-aware runtime — faster, cheaper, and resilient when the agent would otherwise stall.

Drop-in · Anthropic-compatible · Cache-aware routing · Learned per-trajectory

The engine

An engine of layers.

A request-level gateway sees one call at a time. Layer X1 sees the whole trajectory — and runs it through a stack of layers, each one a decision your agent never has to make.

01 · Cache
Keep the prompt prefix warm across the entire run, not one call.

02 · Route
Pick the least-cost path through the trajectory, not the cheapest single call.

03 · Branch
Pin the main trunk to one model; fork cheap branches for sub-tasks.

04 · Escalate
Climb to a stronger model only when the work actually demands it.
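
To make the Route idea concrete, here is a minimal sketch of why the least-cost path through a trajectory can differ from picking the cheapest model on every call. It is a toy illustration, not Layer X1's actual routing algorithm: the model names, per-step costs, and the flat "re-warm" penalty charged when a switch discards a warm prompt cache are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-step costs (arbitrary units) for two made-up models.
STEPS: List[Dict[str, float]] = [
    {"small": 1.0, "large": 2.0},
    {"small": 3.0, "large": 2.5},
    {"small": 1.0, "large": 2.0},
]
# Hypothetical cost of rebuilding the prompt cache after switching to a model.
REWARM: Dict[str, float] = {"small": 2.0, "large": 2.0}


def greedy_plan(steps: List[Dict[str, float]],
                rewarm: Dict[str, float]) -> Tuple[List[str], float]:
    """Choose the cheapest model per call; pay the re-warm cost on each switch."""
    plan: List[str] = []
    total = 0.0
    for costs in steps:
        model = min(costs, key=costs.get)
        total += costs[model]
        if plan and plan[-1] != model:
            total += rewarm[model]  # the cache was cold on the new model
        plan.append(model)
    return plan, total


def least_cost_plan(steps: List[Dict[str, float]],
                    rewarm: Dict[str, float]) -> Tuple[List[str], float]:
    """Dynamic programme over the whole trajectory, switches included."""
    # best[m] = (cost, plan) of the cheapest plan so far that ends on model m
    best = {m: (c, [m]) for m, c in steps[0].items()}
    for costs in steps[1:]:
        nxt = {}
        for model, cost in costs.items():
            candidates = [
                (prev_cost + (0.0 if prev == model else rewarm[model]) + cost,
                 prev_plan + [model])
                for prev, (prev_cost, prev_plan) in best.items()
            ]
            nxt[model] = min(candidates, key=lambda t: t[0])
        best = nxt
    total, plan = min(best.values(), key=lambda t: t[0])
    return plan, total
```

With these made-up numbers, the per-call greedy choice hops small, large, small and pays 8.5 once the two cache re-warms are counted, while the trajectory-level plan stays on the small model throughout and pays 5.0.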

Services

One API. Any agent. Any model.

Point your agent at a single endpoint. Layer X1 speaks 70+ models across every major provider — plus open-source — so you switch models without touching a line of code.

One API
A single drop-in endpoint. One environment variable to switch it on.
Any agent
Claude Code, Cursor, or your own loop — anything that speaks the API.
Any model
70+ models across every major provider, plus open-source, selected per step of the run.
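
As a rough sketch of what "one environment variable" could look like in practice: the Anthropic SDKs and Claude Code read `ANTHROPIC_BASE_URL` to decide where requests go, so an Anthropic-compatible gateway can plausibly be enabled by overriding it. That Layer X1 uses this exact variable is an assumption, and the gateway URL below is a placeholder, not a real endpoint.

```python
import os
from typing import Dict, Optional

# Placeholder only; the real Layer X1 endpoint is not given in this page.
HYPOTHETICAL_GATEWAY_URL = "https://gateway.example.invalid"


def agent_env(base_url: str,
              base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the agent pointed at base_url.

    Assumes the agent honours ANTHROPIC_BASE_URL, as the Anthropic SDKs do.
    The original mapping is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = base_url
    return env


# Example: launch an agent process with the override applied.
#   import subprocess
#   subprocess.run(["claude"], env=agent_env(HYPOTHETICAL_GATEWAY_URL))
```

Building a fresh mapping rather than mutating `os.environ` keeps the override scoped to the agent process, so removing it is as simple as launching the agent without it.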

[Figure: one gateway routing many agents to many models]

Compatibility

Any agent. Any model.

Point any agent at one endpoint and reach 70+ models across every major provider — proprietary and open-source alike. Switch models mid-run without touching a line of your agent.

Works with your agent · drop-in