ncm_local_embeddings

class ncm_local_embeddings.NCMSemanticPreprocessor(expansions=None)[source]

Bases: object

Parameters:

expansions (Dict[str, str])

__init__(expansions=None)[source]

Build a sigil-expansion preprocessor and compile its match pattern.

Seeds the instance with a copy of the module-level BASE_ABBREV_EXPANSIONS map (NCM acronym -> human-readable semantic definition), optionally overlays caller-supplied overrides, and pre-compiles a single word-boundary regex that alternates over every known key so expand() can substitute in one pass.

This constructor performs no I/O; it only mutates self. It is invoked by EnhancedLocalNCMEmbedder.__init__() (which constructs one with no overrides) and may be instantiated directly by any caller needing standalone sigil expansion. No other internal callers were found in the repo.

Parameters:

expansions (Dict[str, str]) – Optional acronym-to-definition pairs that are merged on top of BASE_ABBREV_EXPANSIONS, overriding any base entries with the same key. Defaults to None (base map only).
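The merge-then-compile behaviour described for this constructor can be sketched roughly as follows. This is a minimal illustration, not the module's actual source: the map contents, the helper name build_pattern, and the longest-key-first ordering are assumptions made for the example.

```python
import re
from typing import Dict, Optional, Pattern, Tuple

# Hypothetical stand-in for the module-level BASE_ABBREV_EXPANSIONS map.
# The real definitions are longer (the docs elide them) and are not reproduced here.
BASE_ABBREV_EXPANSIONS: Dict[str, str] = {
    "D1": "dopamine D1 receptor, focused drive",
    "DEMO": "placeholder definition for illustration",
}


def build_pattern(
    expansions: Optional[Dict[str, str]] = None,
) -> Tuple[Dict[str, str], Pattern[str]]:
    """Merge overrides onto a copy of the base map and compile one regex."""
    merged = dict(BASE_ABBREV_EXPANSIONS)  # copy, so the base map is never mutated
    if expansions:
        merged.update(expansions)  # caller-supplied entries win on key collisions
    # Assumption: try longer keys first so a long sigil is not shadowed by a prefix.
    keys = sorted(merged, key=len, reverse=True)
    # One alternation over every key, anchored on word boundaries.
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return merged, pattern


merged, pattern = build_pattern({"D1": "custom override"})
```

Because the overrides are applied to a copy, repeated instantiation with different overrides would not leak entries between instances.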

expand(text)[source]

Annotate NCM sigils/acronyms with their full semantic definitions. The definition is appended in parentheses after the sigil rather than substituted for it, so the original context is preserved. Example: “High D1 state” -> “High D1 (dopamine D1 receptor, focused drive…) state”

Return type:

str

Parameters:

text (str)
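A minimal sketch of the append-in-parentheses substitution, assuming a single-entry map and a module-level pattern (both hypothetical; the real method uses the instance's compiled pattern and the full expansion map):

```python
import re

# Hypothetical one-entry map; the real D1 definition is longer and elided in the docs.
EXPANSIONS = {"D1": "dopamine D1 receptor, focused drive"}
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, EXPANSIONS)) + r")\b")


def expand(text: str) -> str:
    """Keep each matched sigil and append its definition in parentheses."""

    def annotate(match: "re.Match[str]") -> str:
        sigil = match.group(1)
        return f"{sigil} ({EXPANSIONS[sigil]})"

    # re.sub scans the input once and does not rescan inserted text,
    # so definitions are never expanded recursively.
    return PATTERN.sub(annotate, text)


print(expand("High D1 state"))
# High D1 (dopamine D1 receptor, focused drive) state
```

Text without any known sigil passes through unchanged.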

class ncm_local_embeddings.EnhancedLocalNCMEmbedder(model_name='all-mpnet-base-v2')[source]

Bases: EmbeddingFunction

Parameters:

model_name (str)

__init__(model_name='all-mpnet-base-v2')[source]

Construct the NCM-aware embedding function over a SentenceTransformer.

Wraps ChromaDB’s SentenceTransformerEmbeddingFunction (loading the named local model, e.g. all-mpnet-base-v2) together with an NCMSemanticPreprocessor, so that documents have their NCM sigils expanded before being embedded by the underlying model.

Constructing the SentenceTransformerEmbeddingFunction loads the sentence-transformers model into memory (and may download weights on first use); the preprocessor is built with no overrides. This class implements ChromaDB’s EmbeddingFunction protocol and is intended to be passed to a Chroma collection. No internal callers instantiate it directly; it is listed only as a known local module in scripts/collect_tool_imports.py.

Parameters:

model_name (str) – Name of the SentenceTransformer model to load. Defaults to "all-mpnet-base-v2".

name()[source]

Return the stable identifier for this embedding function.

Satisfies ChromaDB’s EmbeddingFunction protocol; Chroma uses this string to record which embedding function produced a collection’s vectors and to guard against mixing incompatible embedders. It is invoked by ChromaDB internally rather than by any code in this repo.

Returns:

The constant identifier "ncm_enhanced_mpnet".

Return type:

str

__call__(input)[source]

Expand NCM sigils in each document, then embed the expanded text.

Implements the core of ChromaDB’s EmbeddingFunction protocol: every input document is first rewritten by NCMSemanticPreprocessor.expand() (so the model sees e.g. "D1 (dopamine D1 receptor, focused drive...)" instead of a bare "D1"), and the expanded batch is handed to the wrapped SentenceTransformerEmbeddingFunction to produce vectors.

Calls self.preprocessor.expand once per document, then calls self.ef on the whole expanded list, which runs inference on the local SentenceTransformer model. This method is invoked by ChromaDB whenever a collection bound to this embedder adds or queries documents; no code in this repo calls it directly.

Parameters:

input (Documents) – The batch of raw document strings to embed.

Returns:

One embedding vector per input document, in order, as produced by the underlying SentenceTransformer model.

Return type:

Embeddings
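The expand-then-embed flow of __call__ can be illustrated without ChromaDB or a model download. In this sketch the class SketchEmbedder, toy_expand, and toy_ef are all hypothetical stand-ins: toy_ef replaces the wrapped SentenceTransformerEmbeddingFunction with a trivial length-based "embedding" so the control flow is runnable on its own.

```python
from typing import Callable, List

Documents = List[str]
Embeddings = List[List[float]]


class SketchEmbedder:
    """Stand-in showing the call order only; not the real EnhancedLocalNCMEmbedder."""

    def __init__(
        self,
        expand: Callable[[str], str],
        ef: Callable[[Documents], Embeddings],
    ) -> None:
        self.expand = expand  # plays the role of preprocessor.expand
        self.ef = ef          # plays the role of the wrapped embedding function

    def name(self) -> str:
        return "ncm_enhanced_mpnet"  # identifier documented for name()

    def __call__(self, input: Documents) -> Embeddings:
        # Expand once per document, then embed the whole batch in one call.
        expanded = [self.expand(doc) for doc in input]
        return self.ef(expanded)


def toy_expand(text: str) -> str:
    # Naive replacement for illustration; the real preprocessor uses a regex.
    return text.replace("D1", "D1 (dopamine D1 receptor, focused drive)")


def toy_ef(docs: Documents) -> Embeddings:
    # Fake one-dimensional "embedding": the character count of each document.
    return [[float(len(d))] for d in docs]


embedder = SketchEmbedder(toy_expand, toy_ef)
vectors = embedder(["High D1 state", "no sigils here"])
```

The fake embedding makes the effect of expansion visible: the first vector reflects the longer, annotated text, while the second document is embedded as written.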