What Is an LLM Proxy?
An LLM proxy sits between your application and LLM providers. Instead of calling OpenAI, Anthropic, or Google directly, your code calls a single endpoint that handles routing, authentication, and logging.
Think of it like a reverse proxy for AI. Nginx routes HTTP traffic to the right backend server. An LLM proxy routes inference requests to the right model provider. Your application does not need to know which provider it is talking to.
Why You Need One
Running an AI agent without a proxy means managing every provider integration yourself. That gets messy fast:
- Key management — Each provider has its own API key format, rotation policy, and rate limits. A proxy centralizes key storage so your agent never touches raw credentials.
- Cost tracking — Without a proxy, you need to parse each provider's billing dashboard separately. A proxy logs every request with token counts and cost, giving you a single view of spend.
- Model switching — Want to test Claude instead of GPT-4? Without a proxy, that means changing API calls, auth headers, and response parsing. With a proxy, you change one parameter.
- Reliability — If OpenAI is down, a proxy can automatically fall back to another provider. Your agent keeps working.
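The failover behavior in the last bullet can be sketched in a few lines. This is a minimal illustration of proxy-style fallback, not Kilo's actual implementation; the provider functions and the ProviderError type are hypothetical stand-ins for real upstream calls.

```python
# Minimal sketch of provider failover: try each provider in order and
# return the first successful response. Providers here are simulated.

class ProviderError(Exception):
    pass

def call_openai(prompt: str) -> str:
    # Simulate an outage at the primary provider.
    raise ProviderError("OpenAI unavailable")

def call_anthropic(prompt: str) -> str:
    return f"[anthropic] answer to: {prompt}"

def complete_with_fallback(prompt: str, providers) -> tuple[str, str]:
    """Try each (name, call) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

provider_chain = [("openai", call_openai), ("anthropic", call_anthropic)]
name, reply = complete_with_fallback("ping", provider_chain)
print(name)  # anthropic
```

Because the agent only ever talks to the proxy, this retry chain is invisible to application code.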
How OmniClaw's Proxy Works
OmniClaw uses Kilo, OmniRun's purpose-built LLM proxy. Here is the request flow:
Your Agent (OpenClaw)
        |
        v
Kilo LLM Proxy (inside OmniRun)
        |
        |-- Authenticates the request
        |-- Resolves model -> provider mapping
        |-- Injects provider API key from vault
        |-- Forwards to provider
        |
        +---> OpenAI API
        +---> Anthropic API
        +---> Google AI API
        |
        v
Response streamed back to agent
        |
        v
Kilo logs tokens, cost, latency
The proxy exposes an OpenAI-compatible endpoint. Any library or tool that works with the OpenAI API works with Kilo. This is important because OpenClaw's default configuration already speaks the OpenAI protocol — no changes needed.
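To make "OpenAI-compatible" concrete, here is a sketch of the request body the endpoint accepts. The endpoint URL is the one shown in the curl example on this page; the build_chat_request helper is illustrative, not part of any SDK.

```python
# Sketch of the OpenAI-format chat request that Kilo accepts.
# build_chat_request is a hypothetical helper for illustration.

KILO_ENDPOINT = "https://kilo.omnirun.io/v1/chat/completions"

def build_chat_request(model: str, user_message: str) -> dict:
    """Build a chat-completions body in the standard OpenAI format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# Switching providers is a one-field change; the proxy routes on "model".
gpt_body = build_chat_request("gpt-4o", "Hello")
claude_body = build_chat_request("claude-3.5-sonnet", "Hello")
print(gpt_body["model"], claude_body["model"])
```

Any client library that can POST this shape, including the official OpenAI SDKs pointed at a custom base URL, should work unchanged.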
Supported Models
The proxy currently supports these models through OmniClaw:
- GPT-4o (OpenAI) — Multimodal flagship. Strong at instruction following, coding, and analysis. Supports images and files.
- GPT-4o Mini (OpenAI) — Fastest and cheapest OpenAI model. Best for quick answers and simple tasks where latency matters.
- Claude 3.5 Sonnet (Anthropic) — Excellent at nuanced conversation, long-form writing, and careful analysis. Strong safety alignment.
- Claude 3.5 Haiku (Anthropic) — Fast and affordable. Good balance of quality and speed for everyday agent tasks.
- Gemini 1.5 Pro (Google) — Strong reasoning and multilingual support. Excellent at long-context tasks with up to 1M tokens.
See the full list with pricing on the models page.
BYOK: Bring Your Own Keys
OmniClaw includes LLM credit out of the box, but you are not limited to using ours. If you have your own API keys from OpenAI, Anthropic, or Google, you can store them in the credential vault and the proxy will use them instead.
This is useful if you have a negotiated enterprise rate, need to use a specific organization account, or just want to use your existing billing. The proxy still handles routing, logging, and failover — it just uses your key for authentication with the upstream provider.
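The key-selection logic can be sketched as a simple preference order: use the customer's vaulted key when one exists, otherwise fall back to platform credit. This is a hypothetical illustration of the behavior described above; the vault structure and key values are invented for the example.

```python
# Hypothetical sketch of BYOK key resolution: prefer a user-supplied key
# from the vault, otherwise fall back to the platform's own key.

PLATFORM_KEYS = {"openai": "platform-openai-key"}  # illustrative values

def resolve_api_key(provider: str, vault: dict) -> tuple[str, str]:
    """Return (source, key): the user's vaulted key if present, else platform credit."""
    if provider in vault:
        return "byok", vault[provider]
    return "platform", PLATFORM_KEYS[provider]

user_vault = {"openai": "sk-user-own-key"}
source, key = resolve_api_key("openai", user_vault)
print(source)  # byok
```

Either way, the upstream provider sees a normal authenticated request; only the billing account behind it changes.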
Spend Tracking and Free Credit
Every OmniClaw account starts with $5 free credit. The proxy tracks usage per request:
- Input tokens — What your agent sends (prompt, context, conversation history)
- Output tokens — What the model generates (the response)
- Cost per request — Calculated from the model's per-token pricing
- Cumulative spend — Running total visible in your dashboard
For most users, $5 covers weeks of normal usage. A typical WhatsApp conversation costs a fraction of a cent per message. You can track your spend in real time from the dashboard.
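The per-request arithmetic is straightforward. The rates below are placeholder numbers for illustration, not OmniClaw's actual pricing; see the models page for real rates.

```python
# Illustrative spend calculation for a single request. The rates are
# assumed placeholders (USD per 1M tokens), not OmniClaw's real pricing.

PRICING = {  # model: (input rate, output rate), USD per 1M tokens
    "gpt-4o": (2.50, 10.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request from token counts and per-token pricing."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A short chat turn: ~400 prompt tokens in, ~150 reply tokens out.
cost = request_cost("gpt-4o", 400, 150)
print(f"${cost:.6f}")  # $0.002500
```

At these assumed rates a message costs about a quarter of a cent, which is the "fraction of a cent" scale described above.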
Code Example
The proxy exposes a standard OpenAI-compatible chat completions endpoint. If you are building custom skills or want to understand what happens under the hood, here is what a request looks like:
curl -X POST https://kilo.omnirun.io/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OMNIRUN_TOKEN" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
That is it. Same format as a direct OpenAI call. To switch to Claude, change "model": "gpt-4o" to "model": "claude-3.5-sonnet". The proxy handles the rest — translating the request to Anthropic's format, authenticating with the right key, and normalizing the response back.
Try the proxy yourself
Deploy an OpenClaw agent and get $5 free LLM credit. Switch models with a single parameter change.
Deploy now