ncm_local_embeddings
- class ncm_local_embeddings.NCMSemanticPreprocessor(expansions=None)
Bases: object
- __init__(expansions=None)
Build a sigil-expansion preprocessor and compile its match pattern.
Seeds the instance with a copy of the module-level BASE_ABBREV_EXPANSIONS map (NCM acronym -> human-readable semantic definition), optionally overlays the caller-supplied overrides passed as expansions, and pre-compiles a single word-boundary regex that alternates over every known key so that expand() can substitute all of them in one pass.
This constructor performs no I/O; it only mutates self. It is invoked by EnhancedLocalNCMEmbedder.__init__() (which constructs one with no overrides) and may be instantiated directly by any caller that needs standalone sigil expansion. No other internal callers were found in the repo.
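The one-pass, word-boundary substitution described above can be sketched as follows. This is a minimal illustration, not the module's code: the class name SigilExpander and both map entries are hypothetical placeholders, and the "KEY (definition)" output format is an assumption modeled on the D1 example given for __call__().

```python
import re
from typing import Dict, Optional

# Hypothetical placeholder entries; the real BASE_ABBREV_EXPANSIONS map lives in
# ncm_local_embeddings and its contents are not reproduced here.
BASE_ABBREV_EXPANSIONS: Dict[str, str] = {
    "D1": "dopamine D1 receptor",
    "D2": "dopamine D2 receptor",
}


class SigilExpander:
    """Hypothetical stand-in for NCMSemanticPreprocessor (sketch only)."""

    def __init__(self, expansions: Optional[Dict[str, str]] = None) -> None:
        # Copy the module-level map so overrides never mutate the shared default.
        self.expansions = dict(BASE_ABBREV_EXPANSIONS)
        if expansions:
            self.expansions.update(expansions)
        # Longest keys first, so overlapping keys resolve to the longer match
        # regardless of dict order.
        keys = sorted(self.expansions, key=len, reverse=True)
        alternation = "|".join(re.escape(k) for k in keys)
        # One word-boundary pattern covering every key, so expand() is a single pass.
        self.pattern = re.compile(rf"\b({alternation})\b")

    def expand(self, text: str) -> str:
        # Assumed output format "KEY (definition)".
        return self.pattern.sub(
            lambda m: f"{m.group(1)} ({self.expansions[m.group(1)]})", text
        )
```

Because re.sub scans the input once, the "D1" inside an inserted definition is not expanded again, and the \b anchors keep a key such as D1 from matching inside a longer token like D10.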
- class ncm_local_embeddings.EnhancedLocalNCMEmbedder(model_name='all-mpnet-base-v2')
Bases: EmbeddingFunction
- Parameters:
model_name (str)
- __init__(model_name='all-mpnet-base-v2')
Construct the NCM-aware embedding function over a SentenceTransformer model.
Wraps ChromaDB’s SentenceTransformerEmbeddingFunction (which loads the named local model, e.g. all-mpnet-base-v2) together with an NCMSemanticPreprocessor, so that documents have their NCM sigils expanded before the underlying model embeds them.
Constructing the SentenceTransformerEmbeddingFunction loads the sentence-transformers model into memory (and may download its weights on first use); the preprocessor is built with no overrides. This class implements ChromaDB’s EmbeddingFunction protocol and is intended to be passed to a Chroma collection. No internal callers instantiate it directly; it is listed only as a known local module in scripts/collect_tool_imports.py.
- Parameters:
model_name (str) – Name of the SentenceTransformer model to load. Defaults to "all-mpnet-base-v2".
- name()
Return the stable identifier for this embedding function.
Satisfies ChromaDB’s EmbeddingFunction protocol; Chroma uses this string to record which embedding function produced a collection’s vectors and to guard against mixing incompatible embedders. It is invoked by ChromaDB internally rather than by any code in this repo.
- Returns:
The constant identifier "ncm_enhanced_mpnet".
- Return type:
str
- __call__(input)
Expand NCM sigils in each document, then embed the expanded text.
Implements the core of ChromaDB’s EmbeddingFunction protocol: every input document is first rewritten by NCMSemanticPreprocessor.expand() (so the model sees e.g. "D1 (dopamine D1 receptor, focused drive...)" instead of a bare "D1"), and the expanded batch is handed to the wrapped SentenceTransformerEmbeddingFunction to produce vectors.
Calls self.preprocessor.expand once per document and then self.ef once on the whole expanded list, which runs inference with the local SentenceTransformer model. This method is invoked by ChromaDB whenever a collection bound to this embedder adds or queries documents; no code in this repo calls it directly.
- Parameters:
input (Documents) – The batch of raw document strings to embed.
- Returns:
One embedding vector per input document, in order, as produced by the underlying SentenceTransformer model.
- Return type:
Embeddings
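The expand-then-embed flow can be sketched with injected callables so it runs without downloading a model. ExpandThenEmbed, toy_embed, and the lambda expander are hypothetical stand-ins for NCMSemanticPreprocessor.expand() and the wrapped SentenceTransformerEmbeddingFunction; the real class composes those two objects in its constructor instead.

```python
from typing import Callable, List, Sequence

Vector = List[float]


class ExpandThenEmbed:
    """Hypothetical sketch of the expand-then-embed pipeline (not the module's code)."""

    def __init__(
        self,
        expand: Callable[[str], str],
        embed_batch: Callable[[List[str]], List[Vector]],
    ) -> None:
        self.expand = expand  # stands in for preprocessor.expand
        self.embed_batch = embed_batch  # stands in for the wrapped self.ef

    def __call__(self, input: Sequence[str]) -> List[Vector]:  # name mirrors the documented signature
        # Rewrite each document individually, keeping input order.
        expanded = [self.expand(doc) for doc in input]
        # Hand the whole expanded batch to the model in a single call.
        return self.embed_batch(expanded)


def toy_embed(batch: List[str]) -> List[Vector]:
    # Toy "model": (character count, word count) per document, in order.
    return [[float(len(text)), float(len(text.split()))] for text in batch]


# Toy expander: plain str.replace, without the word-boundary handling of a real preprocessor.
ef = ExpandThenEmbed(lambda s: s.replace("D1", "D1 (dopamine D1 receptor)"), toy_embed)
vectors = ef(["D1", "no sigil"])  # one vector per input document
```

Batching the model call, rather than embedding each document as it is expanded, keeps the expensive inference step to one invocation per __call__.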