Tokens API · OpenAI-compatible
One endpoint. Any OpenAI client.
The Hermes Tokens API speaks the OpenAI Chat Completions protocol. Point any client — the official OpenAI SDKs, an agent framework, even another hermes-agent VM — at the base URL below and authenticate with a hermes_pk_… key. Billing comes out of your µUSD credit balance in real time.
Base URL & auth
Base URL: https://www.maiavm.com/api/v1
Auth header: Authorization: Bearer hermes_pk_…
Keys are scoped to your account and bill against your credit ledger. Keep them server-side: CORS is denied, so they can't leak from a browser.
Quickstart
curl
curl -N https://www.maiavm.com/api/v1/chat/completions \
  -H "Authorization: Bearer $HERMES_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": true,
    "messages": [{"role":"user","content":"Say hi in five words."}]
  }'
Python (openai SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://www.maiavm.com/api/v1",
    api_key="hermes_pk_...",
)
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    stream=True,
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
Node.js (openai SDK)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://www.maiavm.com/api/v1",
  apiKey: process.env.HERMES_KEY,
});
const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  stream: true,
  messages: [{ role: "user", content: "Say hi in five words." }],
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}
Models
Generated from the live catalog. Prices are USD per 1M tokens and include the platform's markup; you're billed in µUSD against your credit balance.
| Model ID | Display | Context | In / 1M | Out / 1M | Tools |
|---|---|---|---|---|---|
| deepseek-v4-flash | DeepSeek V4 Flash | 1M | $0.140 | $0.280 | supported |
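As a rough illustration of how the per-1M prices in the table translate into a µUSD charge, the sketch below estimates the cost of a single request. The helper name `estimate_cost_micro_usd` is hypothetical, and rounding up to the next whole µUSD is an assumption; the actual ledger may round differently.

```python
# Table prices converted to µUSD per 1M tokens (1 USD = 1,000,000 µUSD).
PRICES_MICRO_USD_PER_1M = {
    "deepseek-v4-flash": {"in": 140_000, "out": 280_000},
}

def estimate_cost_micro_usd(model: str, prompt_tokens: int, completion_tokens: int) -> int:
    """Estimate a request's charge in µUSD (hypothetical helper, not part of the API)."""
    price = PRICES_MICRO_USD_PER_1M[model]
    total = prompt_tokens * price["in"] + completion_tokens * price["out"]
    # Integer ceiling division; rounding up is an assumption about the ledger.
    return -(-total // 1_000_000)

# 1,000 input tokens and 500 output tokens on deepseek-v4-flash.
print(estimate_cost_micro_usd("deepseek-v4-flash", 1_000, 500))  # 280
```

Working in integer µUSD rather than floating-point dollars keeps small per-request amounts exact.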
Tool / function calling
Tool calls pass through to providers that support them — see the “Tools” column above. Request shape and tool_calls response shape match the OpenAI spec exactly.
curl -N https://www.maiavm.com/api/v1/chat/completions \
  -H "Authorization: Bearer $HERMES_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {"type":"object","properties":{"city":{"type":"string"}}}
      }
    }],
    "messages": [{"role":"user","content":"Weather in Tokyo?"}]
  }'
Tool-bearing requests bypass the response cache (tool turns rarely repeat) and are rejected up front on models marked text-only.
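Once the model answers with tool_calls, the client is expected to run each tool and send the results back as `tool` messages. The sketch below shows one way to do that step, assuming the standard OpenAI message shape. The `run_tool_calls` helper, the `TOOLS` registry, and the stub `get_weather` body are all hypothetical, and the example message is hand-written rather than captured from the API.

```python
import json

def get_weather(city: str) -> dict:
    # Stand-in implementation; a real tool would query a weather service.
    return {"city": city, "forecast": "unknown"}

# Hypothetical registry mapping tool names to local Python callables.
TOOLS = {"get_weather": get_weather}

def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Run every tool call in an assistant message and build the `tool` replies."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"] or "{}")
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results

# Hand-written assistant message in the OpenAI tool_calls shape (illustrative only).
example = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
    }],
}
print(run_tool_calls(example))
```

The returned messages would be appended to the conversation, after the assistant message, before the next call to /v1/chat/completions.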
Endpoints
- /v1/chat/completions: Streaming + non-streaming. Tools, vision (DeepSeek), and JSON mode supported.
- /v1/models: OpenAI-shape catalog. Each model carries supports_tools so clients can branch on capability.
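Branching on supports_tools could look like the sketch below. It assumes the catalog follows the usual OpenAI list shape (a top-level `data` array) and that supports_tools is a boolean on each entry; the sample payload is hand-written, and "text-only-example" is a made-up model ID used only for contrast.

```python
def tool_capable_models(models_response: dict) -> list[str]:
    """Return the IDs of models whose catalog entry has a truthy supports_tools."""
    return [
        m["id"]
        for m in models_response.get("data", [])
        if m.get("supports_tools")
    ]

# Hand-written payload in the assumed /v1/models shape (not real API output).
sample = {
    "object": "list",
    "data": [
        {"id": "deepseek-v4-flash", "object": "model", "supports_tools": True},
        {"id": "text-only-example", "object": "model", "supports_tools": False},
    ],
}
print(tool_capable_models(sample))  # ['deepseek-v4-flash']
```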
Power another hermes-agent
Any hermes-agent VM can use this API as its inference backend. Set two env vars in that agent's dashboard, restart its gateway, and it's done.
- In that agent's dashboard, go to Keys → Secrets and set the two vars below.
- Click “Restart agent” on that agent's page so the gateway re-reads /opt/data/.env.
- Confirm it switched over: usage will start showing up in this dashboard's Tokens page within seconds of the first call.
# In the agent's dashboard → Keys → Secrets, set:
OPENAI_BASE_URL=https://www.maiavm.com/api/v1
OPENAI_API_KEY=hermes_pk_...
The agent's OpenAI SDK reads OPENAI_BASE_URL automatically. Switch back any time by clearing both vars and restarting the gateway.
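Before restarting the gateway, it can help to confirm that both variables hold what you expect. The following is a minimal sanity-check sketch, not part of hermes-agent; `check_backend_env` is a hypothetical helper that only inspects the two values locally and makes no network call.

```python
import os
from typing import Mapping

EXPECTED_BASE_URL = "https://www.maiavm.com/api/v1"

def check_backend_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means both vars look right."""
    problems = []
    base_url = env.get("OPENAI_BASE_URL", "")
    api_key = env.get("OPENAI_API_KEY", "")
    # Tolerate a trailing slash when comparing the base URL.
    if base_url.rstrip("/") != EXPECTED_BASE_URL:
        problems.append(f"OPENAI_BASE_URL is {base_url!r}, expected {EXPECTED_BASE_URL!r}")
    if not api_key.startswith("hermes_pk_"):
        problems.append("OPENAI_API_KEY does not start with 'hermes_pk_'")
    return problems

for problem in check_backend_env(os.environ):
    print(problem)
```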
Limits & errors
- Request body capped at 1 MB → 413.
- 60 requests per minute and 1,000 per day per key → 429 with retry-after.
- Credit balance exhausted → 402 insufficient_credits. Top up at /dashboard/billing.
- Model not in this key's allowlist → 403 model_not_allowed.
- Per-key monthly cap reached → 429 monthly_cap_reached; resets on the 1st of each calendar month (UTC).
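A client can use these codes to decide whether a retry is worthwhile. The sketch below is one possible policy, under stated assumptions: `next_action` is a hypothetical helper, the caller is assumed to have already extracted the error code string from the response body (whose exact layout is not specified here), and the 1-second fallback delay is an arbitrary choice.

```python
from typing import Optional

def next_action(
    status: int,
    error_code: Optional[str] = None,
    retry_after: Optional[str] = None,
) -> tuple[str, float]:
    """Return ("retry", delay_seconds) or ("stop", 0.0) for a failed request."""
    # Rate-limit 429s are worth retrying; the monthly cap only resets next month.
    if status == 429 and error_code != "monthly_cap_reached":
        try:
            delay = float(retry_after) if retry_after is not None else 1.0
        except ValueError:
            # retry-after was not a plain number of seconds; use the fallback.
            delay = 1.0
        return ("retry", max(delay, 0.0))
    # 402, 403, 413 and the monthly cap need a change on the caller's side.
    return ("stop", 0.0)

print(next_action(429, None, "7"))
print(next_action(402, "insufficient_credits"))
```

Retrying a 402, 403, or 413 unchanged would fail again, so the sketch stops and leaves the fix (topping up, switching models, shrinking the body) to the caller.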