media_cache

Disk-backed LRU media cache.

Caches downloaded media (images, audio, video, files) so that a URL is not fetched again while its entry remains in the cache. An in-memory index provides fast lookups, while a configurable disk directory persists data across restarts.

Each cached entry is stored on disk as two files:

{sha256_of_url}.dat   – raw media bytes
{sha256_of_url}.json  – sidecar metadata (mimetype, filename, url, ts, size)
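
The two-file layout above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper names entry_paths and write_entry are hypothetical, and the sketch assumes the URL is hashed as UTF-8 and the sidecar is plain JSON with the five fields listed.

```python
import hashlib
import json
from pathlib import Path


def entry_paths(cache_dir: Path, url: str) -> tuple:
    """Return the (.dat, .json) paths for a URL (hypothetical helper)."""
    # Assumption: the key is the hex SHA-256 digest of the UTF-8 encoded URL.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{key}.dat", cache_dir / f"{key}.json"


def write_entry(cache_dir: Path, url: str, data: bytes,
                mimetype: str, filename: str, ts: float) -> None:
    """Write the raw bytes and the sidecar metadata for one entry."""
    dat_path, meta_path = entry_paths(cache_dir, url)
    dat_path.write_bytes(data)
    meta = {
        "mimetype": mimetype,
        "filename": filename,
        "url": url,
        "ts": ts,
        "size": len(data),
    }
    meta_path.write_text(json.dumps(meta), encoding="utf-8")
```

Hashing the URL gives every entry a fixed-length, filesystem-safe name regardless of what characters the URL contains.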

On startup the disk directory is scanned to rebuild the in-memory index without loading all bytes into RAM.
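
A startup scan of this kind might look like the sketch below. The function name scan_index is hypothetical, and skipping sidecars whose .dat file is missing is an assumed policy, not documented behaviour. Only the small .json files are read, so no media bytes are loaded.

```python
import json
from pathlib import Path


def scan_index(cache_dir: Path) -> dict:
    """Rebuild a url -> metadata index from sidecar files only."""
    index = {}
    for meta_path in sorted(cache_dir.glob("*.json")):
        # Assumed policy: ignore a sidecar whose data file is missing.
        if not meta_path.with_suffix(".dat").exists():
            continue
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        index[meta["url"]] = meta
    return index
```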

class media_cache.MediaCache(cache_dir='media_cache', max_size_mb=500, max_memory_items=64)[source]

Bases: object

Two-tier (memory + disk) LRU media cache.

Parameters:
  • cache_dir (str | Path) – Directory for persistent storage. Created automatically.

  • max_size_mb (int) – Approximate cap on total disk usage in megabytes. Oldest entries are evicted when the limit is exceeded.

  • max_memory_items (int) – Maximum number of entries whose bytes are kept in RAM. Entries beyond this limit are still indexed (metadata only) and will be read back from disk on the next access.
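
The two limits can be illustrated with a small sketch. It is not the class's implementation: MemoryTier and evict_oldest are hypothetical names, the memory tier is assumed to be a plain LRU over an OrderedDict, and disk eviction is assumed to order entries by the ts field of their metadata.

```python
from collections import OrderedDict
from typing import List, Optional


class MemoryTier:
    """Keep at most max_items payloads in RAM, dropping the least recently used."""

    def __init__(self, max_items: int = 64) -> None:
        self.max_items = max_items
        self._items = OrderedDict()  # key -> bytes, oldest first

    def get(self, key: str) -> Optional[bytes]:
        data = self._items.get(key)
        if data is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data
        self._items.move_to_end(key)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict the least recently used


def evict_oldest(index: dict, max_bytes: int) -> List[str]:
    """Remove the oldest entries from index until total size fits max_bytes."""
    total = sum(meta["size"] for meta in index.values())
    evicted = []
    for url in sorted(index, key=lambda u: index[u]["ts"]):
        if total <= max_bytes:
            break
        total -= index[url]["size"]
        evicted.append(url)
    for url in evicted:
        del index[url]
    return evicted
```

In a real cache the disk eviction step would also delete the matching .dat and .json files; the sketch only updates the index.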

__init__(cache_dir='media_cache', max_size_mb=500, max_memory_items=64)[source]

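Initialize the cache. The parameters have the same meaning as on the class.

Parameters:
  • cache_dir (str | Path) – Directory for persistent storage. Created automatically.

  • max_size_mb (int) – Approximate cap on total disk usage in megabytes.

  • max_memory_items (int) – Maximum number of entries whose bytes are kept in RAM.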

Return type:

None

async ensure_loaded()[source]

Load the in-memory index from disk (non-blocking).

Called during async startup so that the synchronous disk scan does not block the event loop. Idempotent: safe to call multiple times.
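
One common way to get this behaviour is to run the blocking scan in a worker thread and guard it with a flag and a lock. The sketch below assumes that approach; LazyIndex, _scan_disk and scan_count are hypothetical names, and the real method may be implemented differently.

```python
import asyncio


class LazyIndex:
    """Load an index once, off the event loop, even under concurrent calls."""

    def __init__(self) -> None:
        self._loaded = False
        self._lock = asyncio.Lock()
        self.scan_count = 0  # for demonstration only

    def _scan_disk(self) -> None:
        # Placeholder for the blocking directory scan.
        self.scan_count += 1

    async def ensure_loaded(self) -> None:
        if self._loaded:
            return
        async with self._lock:
            if self._loaded:  # another task finished while this one waited
                return
            # Run the synchronous scan in a thread so the loop stays responsive.
            await asyncio.to_thread(self._scan_disk)
            self._loaded = True
```

The second check inside the lock is what makes concurrent callers share a single scan instead of each starting their own.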

Return type:

None

async get(url)[source]

Return (data, mimetype, filename) if url is cached, else None.

Return type:

tuple[bytes, str, str] | None

Parameters:

url (str)

async put(url, data, mimetype, filename)[source]

Store media bytes under url, writing through to disk.

Return type:

None

Parameters:
  • url (str)

  • data (bytes)

  • mimetype (str)

  • filename (str)

async get_or_download(url, downloader)[source]

Return cached media or call downloader and cache the result.

downloader is an async callable returning (data, mimetype, filename).

GIF images are automatically re-encoded as MP4 before being stored, so a cached GIF URL always yields the MP4 version.
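
The cache-or-download pattern described here can be sketched as below. This is an illustration under stated assumptions, not the method's source: DictCache is a hypothetical in-memory stand-in for the cache, the downloader is assumed to take no arguments (the documentation only says it is an async callable returning the tuple), and the GIF-to-MP4 step is omitted.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple

Media = Tuple[bytes, str, str]  # (data, mimetype, filename)


class DictCache:
    """Hypothetical in-memory stand-in exposing async get/put."""

    def __init__(self) -> None:
        self._store: Dict[str, Media] = {}

    async def get(self, url: str) -> Optional[Media]:
        return self._store.get(url)

    async def put(self, url: str, data: bytes, mimetype: str, filename: str) -> None:
        self._store[url] = (data, mimetype, filename)


async def get_or_download(cache: DictCache, url: str,
                          downloader: Callable[[], Awaitable[Media]]) -> Media:
    """Return the cached entry, or download it once and store the result."""
    cached = await cache.get(url)
    if cached is not None:
        return cached
    # Assumption: the downloader is a zero-argument async callable.
    data, mimetype, filename = await downloader()
    await cache.put(url, data, mimetype, filename)
    return data, mimetype, filename
```

Passing the downloader as a callable keeps the cache independent of any particular HTTP client: the caller decides how the bytes are fetched, and the cache only decides whether a fetch is needed.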

Return type:

tuple[bytes, str, str]

Parameters:
  • url (str)

  • downloader – Async callable returning (data, mimetype, filename).

stats()[source]

Return a snapshot of cache statistics.

Return type:

dict[str, Any]