Tokens API · OpenAI-compatible

One endpoint. Any OpenAI client.

The Hermes Tokens API speaks the OpenAI Chat Completions protocol. Point any client (the official OpenAI SDKs, an agent framework, even another hermes-agent VM) at the base URL below and authenticate with a hermes_pk_… key. Usage is deducted in real time from your credit balance, which is held in µUSD (millionths of a US dollar).

Base URL & auth

Base URL      https://www.maiavm.com/api/v1
Auth header   Authorization: Bearer hermes_pk_...

Keys are scoped to your account and bill against your credit ledger. Keep them server-side: the API denies cross-origin (CORS) requests, so calls made directly from a browser will fail.

Quickstart

curl

curl -N https://www.maiavm.com/api/v1/chat/completions \
  -H "Authorization: Bearer $HERMES_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": true,
    "messages": [{"role":"user","content":"Say hi in five words."}]
  }'

Python (openai SDK)

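import os

from openai import OpenAI

# Read the key from the environment rather than hard-coding it.
client = OpenAI(
    base_url="https://www.maiavm.com/api/v1",
    api_key=os.environ["HERMES_KEY"],  # your hermes_pk_... key
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    stream=True,
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
for chunk in stream:
    # Skip chunks that carry no choices (for example, a trailing usage chunk).
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)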

Node.js (openai SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://www.maiavm.com/api/v1",
  apiKey: process.env.HERMES_KEY,
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  stream: true,
  messages: [{ role: "user", content: "Say hi in five words." }],
});

for await (const chunk of stream) {
  // Optional chaining guards against chunks with an empty choices array.
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Models

Generated from the live catalog. Prices are USD per 1M tokens and include the platform's markup; you're billed in µUSD against your credit balance.

Model ID            Display             Context   In / 1M   Out / 1M   Tools
deepseek-v4-flash   DeepSeek V4 Flash   1M        $0.140    $0.280     supported
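
As a worked example of the pricing arithmetic: 1 USD is 1,000,000 µUSD, so a price of $0.140 per 1M tokens works out to 0.14 µUSD per token. The sketch below estimates the charge for one call from the table rates. It assumes strictly linear per-token billing with no rounding or cache discounts, which the live ledger may handle differently; estimate_cost_micro_usd is a hypothetical helper, not part of the API.

```python
from fractions import Fraction

# Rates copied from the table above, in USD per 1M tokens.
PRICES_USD_PER_1M = {
    "deepseek-v4-flash": {"in": Fraction("0.140"), "out": Fraction("0.280")},
}

def estimate_cost_micro_usd(model, prompt_tokens, completion_tokens):
    """Estimate the charge in µUSD (hypothetical helper; assumes linear billing)."""
    price = PRICES_USD_PER_1M[model]
    # USD per 1M tokens is numerically equal to µUSD per token.
    return prompt_tokens * price["in"] + completion_tokens * price["out"]

# 1,200 input tokens and 300 output tokens: 168 + 84 µUSD.
cost = estimate_cost_micro_usd("deepseek-v4-flash", 1200, 300)
print(float(cost))  # 252.0
```

Under these assumptions the call costs 252 µUSD, or $0.000252.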

Tool / function calling

Tool calls pass through to providers that support them — see the “Tools” column above. Request shape and tool_calls response shape match the OpenAI spec exactly.

curl -N https://www.maiavm.com/api/v1/chat/completions \
  -H "Authorization: Bearer $HERMES_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {"type":"object","properties":{"city":{"type":"string"}}}
      }
    }],
    "messages": [{"role":"user","content":"Weather in Tokyo?"}]
  }'

Tool-bearing requests bypass the response cache (tool turns rarely repeat) and are rejected up front on models marked text-only.
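
To illustrate the client side of the round trip, here is a minimal sketch of handling a tool_calls response. It relies only on the OpenAI-spec shape (each call has an id and a function with a name and a JSON-encoded arguments string). The get_weather body, the run_tool_calls helper, and the sample message are hypothetical placeholders, not output captured from this API.

```python
import json

def get_weather(city):
    # Hypothetical stand-in for a real weather lookup.
    return {"city": city, "forecast": "unknown"}

TOOLS = {"get_weather": get_weather}

def run_tool_calls(assistant_message, tools=TOOLS):
    """Execute each requested tool and return the `tool` messages to send back."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = tools[call["function"]["name"]]
        # Per the OpenAI spec, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"] or "{}")
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return replies

# Illustrative assistant message in the OpenAI tool_calls shape.
sample = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    }],
}
print(run_tool_calls(sample))
```

Append the assistant message and the returned tool messages to messages, then call the endpoint again to get the model's final answer.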

Endpoints

POST /v1/chat/completions

Streaming + non-streaming. Tools, vision (DeepSeek), and JSON mode supported.
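
For clients without an SDK, a non-streaming JSON-mode request can be sketched with the Python standard library. This assumes JSON mode is selected the OpenAI way, through a response_format of type json_object; build_chat_request is a hypothetical helper and the key shown is a placeholder.

```python
import json
import urllib.request

BASE_URL = "https://www.maiavm.com/api/v1"

def build_chat_request(api_key, messages, model="deepseek-v4-flash", json_mode=False):
    """Build (but do not send) a non-streaming chat completion request."""
    payload = {"model": model, "stream": False, "messages": messages}
    if json_mode:
        # Assumption: JSON mode uses the OpenAI `response_format` field.
        payload["response_format"] = {"type": "json_object"}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "hermes_pk_...",  # placeholder; load the real key from server-side config
    [{"role": "user", "content": "Reply with a JSON object containing a greeting."}],
    json_mode=True,
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```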

GET /v1/models

OpenAI-shape catalog. Each model carries supports_tools so clients can branch on capability.
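
Branching on capability might look like the sketch below. It assumes the catalog follows the OpenAI list shape (a data array of model objects) and that the endpoint accepts the same Bearer header as chat completions; fetch_catalog and tool_capable_models are hypothetical helpers, and the sample payload (including the example-text-only model) is made up for illustration.

```python
import json
import urllib.request

BASE_URL = "https://www.maiavm.com/api/v1"

def fetch_catalog(api_key):
    """GET /v1/models and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def tool_capable_models(catalog):
    """Return the IDs of models whose `supports_tools` flag is true."""
    return [m["id"] for m in catalog.get("data", []) if m.get("supports_tools")]

# Illustrative payload; a real one would come from fetch_catalog(key).
sample_catalog = {
    "object": "list",
    "data": [
        {"id": "deepseek-v4-flash", "object": "model", "supports_tools": True},
        {"id": "example-text-only", "object": "model", "supports_tools": False},
    ],
}
print(tool_capable_models(sample_catalog))  # ['deepseek-v4-flash']
```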

Power another hermes-agent

Any hermes-agent VM can use this API as its inference backend. Set two env vars in that agent's dashboard, restart its gateway, and it's done.

  1. In that agent's dashboard, go to Keys → Secrets and set the two vars below.
  2. Click “Restart agent” on that agent's page so the gateway re-reads /opt/data/.env.
  3. Confirm it switched over — usage will start showing up in this dashboard's Tokens page within seconds of the first call.

# In the agent's dashboard → Keys → Secrets, set:
OPENAI_BASE_URL=https://www.maiavm.com/api/v1
OPENAI_API_KEY=hermes_pk_...

The agent's OpenAI SDK reads OPENAI_BASE_URL automatically. Switch back any time by clearing both vars and restarting the gateway.
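
Before restarting the gateway, the two values can be sanity-checked with a short script. This is only a sketch: check_hermes_env is a hypothetical helper, and it checks nothing beyond the base URL and key prefix documented on this page.

```python
EXPECTED_BASE_URL = "https://www.maiavm.com/api/v1"

def check_hermes_env(env):
    """Return a list of problems with the two variables; an empty list means OK."""
    problems = []
    base_url = env.get("OPENAI_BASE_URL", "")
    if base_url.rstrip("/") != EXPECTED_BASE_URL:
        problems.append(
            f"OPENAI_BASE_URL is {base_url!r}, expected {EXPECTED_BASE_URL!r}"
        )
    if not env.get("OPENAI_API_KEY", "").startswith("hermes_pk_"):
        problems.append("OPENAI_API_KEY does not start with hermes_pk_")
    return problems

# In practice, pass os.environ instead of a literal dict.
example_env = {
    "OPENAI_BASE_URL": "https://www.maiavm.com/api/v1",
    "OPENAI_API_KEY": "hermes_pk_...",  # placeholder key
}
print(check_hermes_env(example_env))  # []
```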

Limits & errors

  • Request body capped at 1 MB → 413.
  • 60 requests per minute and 1,000 per day per key → 429 with retry-after.
  • Credit balance exhausted → 402 insufficient_credits. Top up at /dashboard/billing.
  • Model not in this key's allowlist → 403 model_not_allowed.
  • Per-key monthly cap reached → 429 monthly_cap_reached; resets on the 1st of each calendar month (UTC).
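
The limits above suggest retrying only rate-limit 429s and failing fast on everything else. A minimal sketch of that policy follows. It assumes retry-after arrives as a response header holding a number of seconds, and that the caller has already extracted the error code (such as monthly_cap_reached) from the response; next_action, call_with_retry, and the send callable are hypothetical names.

```python
import time

def next_action(status, headers=None, error_code=None):
    """Map a response to ("ok" | "retry" | "fail", detail)."""
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    if status < 400:
        return ("ok", None)
    if status == 429 and error_code != "monthly_cap_reached":
        # Rate limited: wait for retry-after (assumed to be in seconds).
        try:
            delay = float(headers.get("retry-after", 1))
        except ValueError:
            delay = 1.0
        return ("retry", delay)
    reasons = {
        402: "credit balance exhausted; top up at /dashboard/billing",
        403: "model not in this key's allowlist",
        413: "request body exceeds 1 MB",
        429: "monthly cap reached; resets on the 1st of the month (UTC)",
    }
    return ("fail", reasons.get(status, f"unexpected status {status}"))

def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() -> (status, headers, error_code, body), retrying rate limits."""
    for attempt in range(max_attempts):
        status, headers, error_code, body = send()
        action, detail = next_action(status, headers, error_code)
        if action == "ok":
            return body
        if action == "fail":
            raise RuntimeError(f"{status}: {detail}")
        if attempt < max_attempts - 1:
            sleep(detail)  # detail is the delay in seconds
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the retry loop testable without real waiting; a monthly-cap 429 is treated as a hard failure because retrying cannot succeed before the reset.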