media_cache
Disk-backed LRU media cache.
Caches downloaded media (images, audio, video, files) so that the same URL is never fetched twice. An in-memory index provides fast lookups while a configurable disk directory persists data across restarts.
Each cached entry is stored on disk as two files:
{sha256_of_url}.dat – raw media bytes
{sha256_of_url}.json – sidecar metadata (mimetype, filename, url, ts, size)
On startup the disk directory is scanned to rebuild the in-memory index without loading all bytes into RAM.
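The on-disk layout described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper names `entry_paths`, `write_entry`, and `scan_index` are hypothetical, and the sidecar is assumed to be a flat JSON object with the five fields listed above.

```python
import hashlib
import json
import time
from pathlib import Path


def entry_paths(cache_dir: Path, url: str) -> tuple[Path, Path]:
    """Return the (.dat, .json) paths for a URL, keyed by its SHA-256 hex digest."""
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{key}.dat", cache_dir / f"{key}.json"


def write_entry(cache_dir: Path, url: str, data: bytes, mimetype: str, filename: str) -> None:
    """Write the raw bytes and the sidecar metadata for one entry (hypothetical helper)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    dat_path, meta_path = entry_paths(cache_dir, url)
    dat_path.write_bytes(data)
    meta = {
        "mimetype": mimetype,
        "filename": filename,
        "url": url,
        "ts": time.time(),
        "size": len(data),
    }
    meta_path.write_text(json.dumps(meta), encoding="utf-8")


def scan_index(cache_dir: Path) -> dict[str, dict]:
    """Rebuild a URL -> metadata index from the sidecars only; .dat files are never opened."""
    index = {}
    for meta_path in cache_dir.glob("*.json"):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        index[meta["url"]] = meta
    return index
```

Because the scan reads only the small `.json` sidecars, startup cost scales with the number of entries rather than with the total size of the cached media.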
- class media_cache.MediaCache(cache_dir='media_cache', max_size_mb=500, max_memory_items=64)[source]
Bases: object
Two-tier (memory + disk) LRU media cache.
- Parameters:
cache_dir (str | Path) – Directory for persistent storage. Created automatically.
max_size_mb (int) – Approximate cap on total disk usage in megabytes. Oldest entries are evicted when the limit is exceeded.
max_memory_items (int) – Maximum number of entries whose bytes are kept in RAM. Entries beyond this limit are still indexed (metadata only) and will be read back from disk on the next access.
- __init__(cache_dir='media_cache', max_size_mb=500, max_memory_items=64)[source]
Set up an empty cache rooted at cache_dir and create the directory.
Configures the disk-byte budget and in-memory item cap, allocates the LRU index (an OrderedDict keyed by URL), the hit/miss counters, and the asyncio.Lock that serializes index mutations. The actual disk scan is deferred to ensure_loaded() so that constructing a MediaCache never blocks the event loop; only the mkdir (touching the filesystem) happens here.
Constructed by the gateway and web services from configured values; see gateway_main.py and web_main.py, both of which pass cfg.media_cache_dir and cfg.media_cache_max_mb.
- Parameters:
cache_dir (str | Path) – Directory used for persistent storage of the .dat/.json files; created automatically.
max_size_mb (int) – Approximate cap on total on-disk usage in megabytes; oldest entries are evicted once exceeded.
max_memory_items (int) – Maximum number of entries whose raw bytes are kept resident in RAM. Excess entries stay indexed by metadata only and reload from disk on next access.
- Return type:
None
- async ensure_loaded()[source]
Load the in-memory index from disk (non-blocking).
Called during async startup so the sync disk scan does not block the event loop. Idempotent — safe to call multiple times.
- Return type:
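One way to get the idempotent, non-blocking load described for ensure_loaded() is to run the synchronous scan in a worker thread behind a "loaded" flag. The sketch below assumes that approach; `LazyIndex`, its `scans` counter, and the use of `asyncio.to_thread` are illustrative choices, not details confirmed by the source.

```python
import asyncio
from pathlib import Path


class LazyIndex:
    """Toy stand-in for the cache's deferred index load (hypothetical class)."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.names = []          # sidecar file names found by the scan
        self.scans = 0           # how many times the disk scan actually ran
        self._loaded = False
        self._lock = asyncio.Lock()

    def _scan(self):
        # Synchronous directory walk; runs in a worker thread, not on the event loop.
        self.scans += 1
        return sorted(p.name for p in self.cache_dir.glob("*.json"))

    async def ensure_loaded(self):
        async with self._lock:
            if self._loaded:     # a repeated call returns without rescanning
                return
            self.names = await asyncio.to_thread(self._scan)
            self._loaded = True
```

Holding the lock across the scan means concurrent callers wait for the first load to finish instead of starting a second one.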
- async get(url)[source]
Look up url in the cache and return its media triple, or None on a miss.
On a hit the entry is promoted to most-recently-used and its last_access timestamp refreshed so the LRU ordering stays accurate. If the entry's bytes are not resident in RAM (it was loaded by a disk scan or shed by the memory budget) they are re-read from disk via _read_disk(); should that file have vanished, the stale index entry is evicted and a miss is reported. Increments the hit counter and logs on success. All work happens under the shared asyncio.Lock.
Reached indirectly through get_or_download(), which is the entry point used by the platform adapters (platforms/discord.py, platforms/matrix.py, platforms/discord_self.py, platforms/emoji_resolver.py).
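The hit path of get() can be sketched with a plain OrderedDict. This is a simplified, synchronous illustration: the `Entry` dataclass, the `lookup` function, and the `read_disk` callback are hypothetical, the triple is assumed to be `(data, mimetype, filename)`, and the lock, counters, and logging are omitted.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    mimetype: str
    filename: str
    data: Optional[bytes]        # None when the bytes are not resident in RAM
    last_access: float = 0.0


def lookup(index, url, read_disk):
    """Return (data, mimetype, filename) for url, or None on a miss."""
    entry = index.get(url)
    if entry is None:
        return None
    if entry.data is None:
        data = read_disk(url)    # stand-in for re-reading the .dat file
        if data is None:
            del index[url]       # file vanished: drop the stale index entry
            return None
        entry.data = data
    entry.last_access = time.time()
    index.move_to_end(url)       # promote to most-recently-used
    return entry.data, entry.mimetype, entry.filename
```

With the most-recently-used entry always at the end of the OrderedDict, the eviction candidate is always the first key.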
- async put(url, data, mimetype, filename)[source]
Insert media bytes for url, writing through to disk and enforcing limits.
If the URL is already indexed this only bumps its LRU position and access time (the existing bytes are kept). For a new entry it derives the disk key, persists the bytes plus a JSON metadata sidecar via _write_disk(), records the entry in the in-memory index, then evicts the oldest entries while over the disk budget (_enforce_limits()) and sheds resident bytes for entries past the memory cap (_shed_memory()). Touches the filesystem and runs under the shared asyncio.Lock.
Called by get_or_download() after a successful, non-empty download; not invoked directly elsewhere in the repo.
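The two limits enforced after an insert behave differently: the disk budget removes entries outright, while the memory cap only drops resident bytes and keeps the metadata. A minimal sketch of both, assuming the index is an OrderedDict of plain dicts with hypothetical `size` and `data` keys (a real implementation would also delete the evicted entry's files from disk):

```python
from collections import OrderedDict


def enforce_limits(index, max_bytes):
    """Evict least-recently-used entries until the total size fits; return evicted URLs."""
    evicted = []
    total = sum(e["size"] for e in index.values())
    while index and total > max_bytes:
        url, entry = index.popitem(last=False)   # oldest entry sits at the front
        total -= entry["size"]
        evicted.append(url)
    return evicted


def shed_memory(index, max_memory_items):
    """Drop resident bytes from the oldest entries; their metadata stays indexed."""
    resident = [url for url, e in index.items() if e["data"] is not None]
    excess = len(resident) - max_memory_items
    for url in resident[:max(excess, 0)]:
        index[url]["data"] = None
```

A shed entry still counts against the disk budget and can be served again after a disk read, whereas an evicted entry has to be downloaded again.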
- async get_or_download(url, downloader)[source]
Return cached media or call downloader and cache the result.
downloader is an async callable returning (data, mimetype, filename).
GIF images are automatically re-encoded as MP4 before being stored so that the cache always contains the video format.
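The cache-or-fetch flow can be illustrated with an in-memory stand-in. `TinyCache` below is hypothetical and omits disk persistence and the GIF-to-MP4 step; it also assumes the downloader is called with the URL as its only argument, which the source does not state.

```python
import asyncio


class TinyCache:
    """Hypothetical in-memory stand-in; the real class also persists to disk."""

    def __init__(self):
        self._store = {}

    async def get(self, url):
        return self._store.get(url)

    async def put(self, url, data, mimetype, filename):
        # Keep the existing bytes if the URL is already present.
        self._store.setdefault(url, (data, mimetype, filename))

    async def get_or_download(self, url, downloader):
        cached = await self.get(url)
        if cached is not None:
            return cached
        # Assumption: the downloader takes the URL as its only argument.
        data, mimetype, filename = await downloader(url)
        if data:                 # only non-empty downloads are stored
            await self.put(url, data, mimetype, filename)
        return data, mimetype, filename
```

A caller would pass its own coroutine, for example `await cache.get_or_download(url, fetch)`, where `fetch` is whatever HTTP helper the adapter already uses.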
- stats()[source]
Return a snapshot of cache statistics for monitoring.
Reports the total and in-memory entry counts, on-disk byte and megabyte totals, the configured size cap, the running hit/miss counters, and a derived hit rate. A pure read of in-memory state — it acquires no lock and touches neither disk nor network, so it is cheap and safe to poll from an admin endpoint.
Called by the bot admin status handler in web/bot_admin.py, which exposes the result under the media_cache key of its JSON response.
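A snapshot of this shape might be assembled as shown below. The function name and every dictionary key are assumptions made for illustration; the source lists which quantities are reported but not how they are named.

```python
def cache_stats(total_entries, memory_entries, disk_bytes, max_size_mb, hits, misses):
    """Build a stats snapshot from counters already held in memory (hypothetical keys)."""
    lookups = hits + misses
    return {
        "entries": total_entries,
        "memory_entries": memory_entries,
        "disk_bytes": disk_bytes,
        "disk_mb": disk_bytes / (1024 * 1024),
        "max_size_mb": max_size_mb,
        "hits": hits,
        "misses": misses,
        # Guard against division by zero before the first lookup.
        "hit_rate": hits / lookups if lookups else 0.0,
    }
```

Since the snapshot is computed from plain counters, polling it repeatedly adds no disk or network load.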