openrouter_client
Async LLM API client with automatic tool-call loop.
Uses any OpenAI-compatible chat-completions endpoint (configurable via
base_url). Embeddings use the native Google Gemini API only, via the
shared key pool in gemini_embed_pool.
- class openrouter_client.OpenRouterClient(api_key, model='x-ai/grok-4.1-fast', temperature=1.0, max_tokens=60000, tool_registry=None, max_tool_rounds=10, base_url='http://localhost:3000/openai', gemini_api_key='', gemini_count_tokens_model='gemini-3.1-flash-lite-preview', max_tool_output_chars=150000)[source]
  Bases: object
  Thin async wrapper around an OpenAI-compatible chat-completions endpoint.
  Implements the full tool-call loop: when the model responds with tool_calls, each tool is executed via the ToolRegistry, the results are appended, and the API is called again. This repeats until the model produces a final text response (or a safety limit is reached).
- __init__(api_key, model='x-ai/grok-4.1-fast', temperature=1.0, max_tokens=60000, tool_registry=None, max_tool_rounds=10, base_url='http://localhost:3000/openai', gemini_api_key='', gemini_count_tokens_model='gemini-3.1-flash-lite-preview', max_tool_output_chars=150000)[source]
Initialize the instance.
- Parameters:
  - api_key (str) – API key sent to the chat-completions endpoint.
  - model (str) – Model identifier to request.
  - temperature (float) – Sampling temperature.
  - max_tokens (int) – Maximum number of tokens in a completion.
  - tool_registry (ToolRegistry | None) – Registry of tools the model may call.
  - max_tool_rounds (int) – Safety limit on consecutive tool-call rounds.
  - base_url (str) – Base URL of the OpenAI-compatible endpoint.
  - gemini_api_key (str) – API key for the native Gemini API (embeddings and token counting).
  - gemini_count_tokens_model (str) – Model id for the Gemini countTokens API.
  - max_tool_output_chars (int) – Max tool result string length before truncation in the chat loop; <= 0 disables the cap.
- Return type:
  None
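A minimal sketch of the max_tool_output_chars behaviour described above: tool results longer than the cap are cut before being appended to the conversation, and a non-positive cap disables truncation. The helper name and the truncation marker are assumptions for illustration, not the client's actual implementation.

```python
def cap_tool_output(text: str, max_chars: int = 150_000) -> str:
    """Truncate a tool result string to at most max_chars characters.

    A cap <= 0 disables truncation entirely (mirroring the documented
    semantics of max_tool_output_chars). The "[truncated]" suffix is a
    hypothetical marker, not necessarily what the real client appends.
    """
    if max_chars <= 0 or len(text) <= max_chars:
        return text
    return text[:max_chars] + "[truncated]"


# A 200k-character tool result is cut down to the 150k cap (plus marker);
# with the cap disabled, it passes through untouched.
long_result = "x" * 200_000
print(len(cap_tool_output(long_result)))
print(len(cap_tool_output(long_result, max_chars=0)))
```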
- async count_input_tokens(messages, *, gemini_model=None)[source]
  Public wrapper for Gemini countTokens on OpenAI-shaped messages.
  gemini_model overrides gemini_count_tokens_model for this call.
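Counting OpenAI-shaped messages with Gemini's countTokens requires converting between the two message shapes: Gemini's `contents` use role "user"/"model" with text `parts`, while OpenAI-shaped messages use role "assistant" and a plain content string. The converter below is a hypothetical sketch of that mapping, not the client's actual code; in particular, folding "system" into "user" is a simplification (Gemini also accepts a separate system instruction).

```python
def openai_to_gemini_contents(messages: list[dict]) -> list[dict]:
    """Convert OpenAI-shaped chat messages to Gemini `contents`.

    Assumptions (illustrative only):
    - "assistant" maps to Gemini's "model" role; every other role,
      including "system", is folded into "user".
    - None content (e.g. a tool-call-only turn) becomes an empty string.
    """
    contents = []
    for m in messages:
        role = "model" if m["role"] == "assistant" else "user"
        text = m.get("content") or ""
        contents.append({"role": role, "parts": [{"text": text}]})
    return contents


msgs = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
for c in openai_to_gemini_contents(msgs):
    print(c["role"], c["parts"][0]["text"])
```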
- async chat(messages, user_id='', ctx=None, tool_names=None, validate_header=False, token_count=None, on_intermediate_text=None)[source]
Send messages to the LLM and return the final assistant text.
If the model requests tool calls, they are executed automatically and the conversation is continued until a text response is produced.
  user_id is forwarded to ToolRegistry.call() for permission checking. ctx is forwarded to tools that opt in to receiving it. tool_names, when provided, restricts which tools the LLM sees to the given subset of registered tool names.
  token_count, when provided, is used directly instead of calling _count_tokens, which lets the caller pre-compute the count concurrently with other work. on_intermediate_text, when provided, is called with any text content the model produces alongside tool calls. Without this callback, such text is silently carried in the conversation history but never surfaced to the user.
- async embed(text, model)[source]
Generate an embedding vector for text via the Gemini API.
  Uses the shared key pool for rate-limit distribution. Retries with exponential back-off (capped at MAX_EMBED_DELAY) up to MAX_EMBED_RETRIES times before raising.
  Raises ValueError immediately if text is empty or whitespace-only (the embedding API rejects such input with 400).
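A sketch of the retry policy described above, with the network call and the back-off constants stubbed out. The constant values, the starting delay, and the injectable `sleep` hook are assumptions for illustration; only the overall shape (immediate ValueError on blank input, capped exponential back-off, raise after the last attempt) follows the documented behaviour.

```python
# Stand-in values; the real MAX_EMBED_RETRIES / MAX_EMBED_DELAY live in
# the client module and may differ.
MAX_EMBED_RETRIES = 5
MAX_EMBED_DELAY = 8.0


def embed_with_retry(text, call_api, sleep=lambda s: None):
    """Call call_api(text), retrying on failure with capped back-off."""
    if not text.strip():
        # The embedding API rejects blank input with 400, so fail fast.
        raise ValueError("text is empty or whitespace-only")
    delay = 1.0  # assumed starting delay
    for attempt in range(MAX_EMBED_RETRIES):
        try:
            return call_api(text)
        except Exception:
            if attempt == MAX_EMBED_RETRIES - 1:
                raise  # out of retries: propagate the last error
            sleep(min(delay, MAX_EMBED_DELAY))
            delay *= 2


# Demo: an "API" that fails twice, then succeeds.
calls = {"n": 0}
def flaky(text):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return [0.1, 0.2]

slept = []
print(embed_with_retry("hello", flaky, sleep=slept.append))
print(slept)  # delays used between the failed attempts
```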