platforms.media_common module

Shared media-to-content-part conversion for all platforms.

Converts raw bytes + MIME type into the multimodal content-part format expected by the OpenRouter chat-completions API. Platform-specific download logic lives in each adapter; this module only handles the format conversion.

Office / ODF / EPUB documents whose MIME types are not supported by the downstream LLM are automatically converted to PDF via LibreOffice headless before being embedded as content parts.

GIF and animated WebP images are automatically re-encoded as MP4 (H.264 baseline) so the Gemini API receives a well-supported video format instead of GIF/animated-WebP.

async platforms.media_common.download_with_retry(downloader, *, attempts=3, base_delay=0.5, max_delay=8.0, label='')[source]

Call downloader with bounded retry on transient CDN failures.

downloader is the same async callable the platform adapters already pass to MediaCache.get_or_download() — it returns (data, mimetype, filename). A single transient Discord/Matrix CDN blip otherwise drops the attachment for that one message (the cache fix only stops the failure from becoming permanent); a few retries close that one-shot loss.

Retries on:

  • any raised exception that is not a permanent HTTP status (see _is_permanent_download_error()), and

  • an empty-bytes result (not data) — treated as a failed download, consistent with MediaCache.get_or_download which refuses to cache it.

Permanent errors (404/403/410/…) re-raise immediately. After the final attempt the last result is returned as-is (possibly empty) so the existing empty-media handling downstream stays in control: get_or_download will not cache it and media_to_content_parts emits a text note instead of a blank image.

Backoff is exponential (base_delay * 2**n) capped at max_delay, with uniform jitter to avoid thundering-herd re-fetches.

Return type:

tuple[bytes, str, str]

Parameters:
async platforms.media_common.maybe_reencode_gif(data, mimetype, filename)[source]

Re-encode GIF or animated WebP as MP4 for the Gemini API.

For animated WebP: Pillow converts WebP->GIF (no ffmpeg libwebp needed), then the GIF is converted to MP4 via ffmpeg.

For GIF: directly converted to MP4 via ffmpeg.

Returns (data, mimetype, filename) – either the converted MP4 or the original inputs unchanged if conversion fails, the input is not a GIF, or the WebP is not animated (static WebP passes through as a normal image).

Return type:

tuple[bytes, str, str]

Parameters:
platforms.media_common.detect_image_mimetype_from_bytes(data)[source]

Best-effort image MIME from raw bytes (magic + Pillow).

Returns a lowercase image/* type, or None if unknown.

Return type:

str | None

Parameters:

data (bytes)

platforms.media_common.shrink_image_under_max_bytes(data, declared_mimetype='', *, max_bytes=4194304)[source]

Re-encode and optionally downscale raster image bytes to fit under max_bytes.

Used for API providers with a hard per-image size limit. Returns None if the image cannot be opened as a raster, is not shrinkable (e.g. SVG), or remains above max_bytes after resize attempts.

If data is already at or below max_bytes, returns data unchanged.

Return type:

bytes | None

Parameters:
platforms.media_common.reconcile_image_mimetype_sync(data, declared)[source]

Correct a declared image/* MIME type against the type detected from bytes.

Platforms and CDNs frequently mislabel images (e.g. a JPEG served as image/png), which makes downstream providers reject or mis-handle the data: URI. This sniffs the real type from the leading bytes and returns the detected value when it disagrees with the declared one, stripping any charset/parameter suffix in the process. Non-image declarations are passed through untouched, and when detection fails the bare declared type is returned.

Delegates the actual sniffing to detect_image_mimetype_from_bytes() (magic-number checks plus a Pillow fallback) and logs an info line at the module logger when a reconciliation actually changes the type; it performs no I/O of its own. Called by the async wrapper reconcile_image_mimetype() and directly by url_content_extractor when sanitizing fetched images.

Parameters:
  • data (bytes) – The raw image bytes to sniff.

  • declared (str) – The MIME type claimed by the source (may include parameters).

Returns:

The reconciled MIME type — the detected type when it differs, else the parameter-stripped declared type.

Return type:

str

async platforms.media_common.reconcile_image_mimetype(data, declared)[source]

Async wrapper for reconcile_image_mimetype_sync() off the event loop.

MIME sniffing can fall back to a Pillow decode, which is CPU-bound and would otherwise block the asyncio event loop on a large image. This offloads the synchronous reconcile to a worker thread via asyncio.to_thread so callers can await it inline. The detection logic, logging, and return contract are identical to the sync variant.

Called by media_to_content_parts() here and by url_content_extractor when preparing fetched images for the model.

Parameters:
  • data (bytes) – The raw image bytes to sniff.

  • declared (str) – The MIME type claimed by the source.

Returns:

The reconciled MIME type (see reconcile_image_mimetype_sync()).

Return type:

str

async platforms.media_common.media_to_content_parts(data, mimetype, filename, body_text=None)[source]

Build an OpenRouter multimodal content-parts list from raw media.

Office / ODF documents are transparently converted to PDF via LibreOffice so the LLM never sees an unsupported MIME type.

Parameters:
  • data (bytes) – The raw file bytes.

  • mimetype (str) – MIME type of the file (e.g. "image/png").

  • filename (str) – Human-readable filename.

  • body_text (str | None) – Optional caption / message body text to include alongside the media. When present it is prepended as a text content part.

Returns:

A list of content-part dicts suitable for the content field of an OpenRouter user message.

Return type:

list[dict[str, Any]]