openrouter_client

Async LLM API client with automatic tool-call loop.

Uses any OpenAI-compatible chat-completions endpoint (configurable via base_url). Embeddings use the native Google Gemini API only, via the shared key pool in gemini_embed_pool.

class openrouter_client.OpenRouterClient(api_key, model='x-ai/grok-4.1-fast', temperature=1.0, max_tokens=60000, tool_registry=None, max_tool_rounds=10, base_url='http://localhost:3000/openai', gemini_api_key='', gemini_count_tokens_model='gemini-3.1-flash-lite-preview', max_tool_output_chars=150000)[source]

Bases: object

Thin async wrapper around an OpenAI-compatible chat-completions endpoint.

Implements the full tool-call loop: when the model responds with tool_calls, each tool is executed via the ToolRegistry, the results are appended, and the API is called again. This repeats until the model produces a final text response (or a safety limit is reached).

Parameters:
  • api_key (str)

  • model (str)

  • temperature (float)

  • max_tokens (int)

  • tool_registry (ToolRegistry | None)

  • max_tool_rounds (int)

  • base_url (str)

  • gemini_api_key (str)

  • gemini_count_tokens_model (str)

  • max_tool_output_chars (int)
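The tool-call loop can be sketched as follows. This is a hedged illustration, not the module's implementation: the real client talks to an HTTP endpoint and a ToolRegistry, so `fake_model` and the dict-based registry below are stand-ins, and names like `run_tool_loop` are illustrative.

```python
def run_tool_loop(model, registry, messages, max_tool_rounds=10):
    """Call the model repeatedly until it returns plain text or the cap is hit."""
    for _ in range(max_tool_rounds):
        reply = model(messages)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]              # final text response
        messages.append(reply)                   # assistant turn with tool_calls
        for call in tool_calls:                  # execute each requested tool
            result = registry[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": str(result)})
    return "(tool-round limit reached)"          # safety limit (max_tool_rounds)

# Stub model: requests one tool call, then answers once a tool result exists.
def fake_model(messages):
    if not any(m.get("role") == "tool" for m in messages):
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "add", "arguments": {"a": 2, "b": 3}}]}
    return {"role": "assistant", "content": "2 + 3 = 5"}

registry = {"add": lambda a, b: a + b}
print(run_tool_loop(fake_model, registry,
                    [{"role": "user", "content": "add 2 and 3"}]))  # 2 + 3 = 5
```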

__init__(api_key, model='x-ai/grok-4.1-fast', temperature=1.0, max_tokens=60000, tool_registry=None, max_tool_rounds=10, base_url='http://localhost:3000/openai', gemini_api_key='', gemini_count_tokens_model='gemini-3.1-flash-lite-preview', max_tool_output_chars=150000)[source]

Initialize the instance.

Parameters:
  • api_key (str) – API key for the chat-completions endpoint.

  • model (str) – Model identifier sent with each chat request.

  • temperature (float) – Sampling temperature.

  • max_tokens (int) – Maximum number of tokens the model may generate per response.

  • tool_registry (ToolRegistry | None) – Registry of tools exposed to the model; None disables tool calling.

  • max_tool_rounds (int) – Safety limit on consecutive tool-call rounds before the loop is stopped.

  • base_url (str) – Base URL of the OpenAI-compatible chat-completions endpoint.

  • gemini_api_key (str) – API key for the native Google Gemini API.

  • gemini_count_tokens_model (str) – Model id for Gemini countTokens API.

  • max_tool_output_chars (int) – Max tool result string length before truncation in the chat loop; <= 0 disables the cap.

Return type:

None
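As a rough illustration of the max_tool_output_chars semantics (a non-positive value disables the cap), here is a minimal sketch; the helper name and truncation marker are assumptions, not the module's internals.

```python
def cap_tool_output(text: str, max_chars: int = 150_000) -> str:
    """Truncate a tool result string; a cap of <= 0 disables truncation."""
    if max_chars <= 0 or len(text) <= max_chars:
        return text
    return text[:max_chars] + " …[truncated]"

print(cap_tool_output("x" * 10, max_chars=4))  # xxxx …[truncated]
print(cap_tool_output("x" * 10, max_chars=0))  # xxxxxxxxxx
```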

async count_input_tokens(messages, *, gemini_model=None)[source]

Public wrapper for Gemini countTokens on OpenAI-shaped messages.

gemini_model overrides gemini_count_tokens_model for this call.

Return type:

int | None

Parameters:
  • messages – The OpenAI-shaped chat messages to count tokens for.

  • gemini_model (str | None) – Per-call override of gemini_count_tokens_model.

async chat(messages, user_id='', ctx=None, tool_names=None, validate_header=False, token_count=None, on_intermediate_text=None)[source]

Send messages to the LLM and return the final assistant text.

If the model requests tool calls, they are executed automatically and the conversation is continued until a text response is produced.

user_id is forwarded to ToolRegistry.call() for permission checking. ctx is forwarded to tools that opt-in to receiving it.

tool_names, when provided, restricts which tools the LLM sees to the given subset of registered tool names.

token_count, when provided, is used directly instead of calling _count_tokens — allows the caller to pre-compute the count concurrently with other work.

on_intermediate_text, when provided, is called with any text content the model produces alongside tool calls. Without this callback, such text is silently carried in the conversation history but never surfaced to the user.

Return type:

str

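Since token_count lets the caller supply a pre-computed count, one pattern is to overlap the counting call with other async preparation work. A minimal asyncio sketch, with count_tokens and prepare_context as hypothetical stand-ins for the real coroutines:

```python
import asyncio

async def count_tokens(messages):
    # stand-in for count_input_tokens(); pretend this is a network call
    await asyncio.sleep(0)
    return sum(len(m["content"].split()) for m in messages)

async def prepare_context():
    # stand-in for whatever other async prep work the caller does
    await asyncio.sleep(0)
    return {"user_id": "u1"}

async def main():
    messages = [{"role": "user", "content": "hello there"}]
    # run both concurrently; the count would then be passed as
    # chat(messages, token_count=tokens) to skip the internal counting call
    tokens, ctx = await asyncio.gather(count_tokens(messages), prepare_context())
    return tokens

print(asyncio.run(main()))  # 2
```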
async embed(text, model)[source]

Generate an embedding vector for text via the Gemini API.

Uses the shared key pool for rate-limit distribution. Retries with exponential back-off (capped at MAX_EMBED_DELAY) up to MAX_EMBED_RETRIES times before raising.

Raises ValueError immediately if text is empty or whitespace-only (the embedding API rejects such input with 400).

Parameters:
  • text (str) – The text to embed.

  • model (str) – The embedding model identifier (e.g. "google/gemini-embedding-001").

Returns:

The embedding vector.

Return type:

list[float]
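The retry schedule amounts to capped exponential back-off. The constants and base delay below are illustrative, not the module's actual MAX_EMBED_RETRIES / MAX_EMBED_DELAY values:

```python
MAX_EMBED_RETRIES = 5   # illustrative values, not the module's constants
MAX_EMBED_DELAY = 8.0

def backoff_delays(base: float = 1.0) -> list[float]:
    """Delay before each retry attempt: doubles each round, capped at the max."""
    return [min(base * 2 ** attempt, MAX_EMBED_DELAY)
            for attempt in range(MAX_EMBED_RETRIES)]

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 8.0]
```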

async embed_batch(texts, model)[source]

Generate embedding vectors for multiple texts via the Gemini API.

Uses the shared key pool. Empty or whitespace-only texts are filtered out; their positions are filled with zero vectors. Retries with exponential back-off up to MAX_EMBED_RETRIES times.

Parameters:
  • texts (list[str]) – List of texts to embed.

  • model (str) – The embedding model identifier.

Returns:

One embedding vector per input text, in the same order.

Return type:

list[list[float]]
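The blank-input handling can be pictured like this: blank texts are skipped for the API call, and their positions in the output are filled with zero vectors. fill_batch and its arguments are illustrative stand-ins, not the module's internals.

```python
def fill_batch(texts: list[str], embeddings: list[list[float]],
               dim: int) -> list[list[float]]:
    """Merge embeddings for non-blank texts with zero vectors for blank ones.

    `embeddings` holds one vector per non-blank text, in order.
    """
    it = iter(embeddings)
    return [next(it) if t.strip() else [0.0] * dim for t in texts]

vecs = fill_batch(["hi", "  ", "yo"], [[0.1, 0.2], [0.3, 0.4]], dim=2)
print(vecs)  # [[0.1, 0.2], [0.0, 0.0], [0.3, 0.4]]
```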

async close()[source]

Close the client and release the underlying network resources.

Return type:

None