`ai`

Cortex AI wrapper with semantic cache, model routing, and token compression.

`ModelMap` `dataclass`

Cortex model name → local/EAI equivalent lookup.

Field names use underscores; call `.get()` with the hyphenated Cortex name. The local mapping uses `"mistral"` (Ollama) as a uniform stand-in regardless of the Cortex model. Local mode is for pipeline integration testing, not for evaluating output quality.

Source code in src/pinky_ai/ai.py

@dataclass(frozen=True)
class ModelMap:
    """Cortex model name → local/EAI equivalent lookup.

    Field names use underscores; call `.get()` with the hyphenated Cortex name.
    The local mapping uses `"mistral"` (Ollama) as a uniform stand-in regardless
    of the Cortex model. Local mode is for pipeline integration testing, not
    output quality.
    """

    mistral_7b: str = "mistral"
    mistral_large2: str = "mistral"
    claude_haiku_4_5: str = "mistral"
    claude_sonnet_4_6t: str = "mistral"

    def get(self, model: str, default: str = "mistral") -> str:
        """Return the local/EAI equivalent for *model*.

        Hyphens and dots in the Cortex model name are normalised to underscores
        before attribute lookup.

        Args:
            model: Cortex model name, e.g. `"claude-haiku-4-5"`.
            default: Returned when *model* is not a known field.

        Returns:
            Local model name string, e.g. `"mistral"`.
        """
        return getattr(self, model.replace("-", "_").replace(".", "_"), default)

`get(model, default='mistral')`

Return the local/EAI equivalent for model.

Hyphens and dots in the Cortex model name are normalised to underscores before attribute lookup.

Parameters:

Name	Type	Description	Default
`model`	`str`	Cortex model name, e.g. `"claude-haiku-4-5"`.	required
`default`	`str`	Returned when model is not a known field.	`'mistral'`

Returns:

Type	Description
`str`	Local model name string, e.g. `"mistral"`.

Source code in src/pinky_ai/ai.py

def get(self, model: str, default: str = "mistral") -> str:
    """Return the local/EAI equivalent for *model*.

    Hyphens and dots in the Cortex model name are normalised to underscores
    before attribute lookup.

    Args:
        model: Cortex model name, e.g. `"claude-haiku-4-5"`.
        default: Returned when *model* is not a known field.

    Returns:
        Local model name string, e.g. `"mistral"`.
    """
    return getattr(self, model.replace("-", "_").replace(".", "_"), default)
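The normalisation rule can be checked in isolation. The sketch below re-declares `ModelMap` so it runs on its own. It has one deliberate deviation, flagged in the comments: it consults only declared dataclass fields. The documented `getattr` lookup would return the bound `get` method for `model="get"` instead of the default. The `"llama3"` override is a hypothetical value used only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelMap:
    # Fields mirror the documented ModelMap: every Cortex model maps to Ollama's "mistral".
    mistral_7b: str = "mistral"
    mistral_large2: str = "mistral"
    claude_haiku_4_5: str = "mistral"
    claude_sonnet_4_6t: str = "mistral"

    def get(self, model: str, default: str = "mistral") -> str:
        # Normalise hyphens and dots to underscores, as documented.
        key = model.replace("-", "_").replace(".", "_")
        # Deviation from the documented getattr lookup (assumption, for safety):
        # only declared fields count, so get("get") returns default, not a method.
        if key in {f.name for f in fields(self)}:
            return getattr(self, key)
        return default


local = ModelMap()
print(local.get("claude-haiku-4-5"))          # mistral
print(local.get("gpt-4o", default="llama3"))  # llama3 (unknown model name)

# Frozen dataclass: override individual fields at construction time,
# e.g. a hypothetical mapping that points one model at a different local name.
custom = ModelMap(claude_haiku_4_5="llama3")
print(custom.get("claude-haiku-4-5"))         # llama3
```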

`PinkyAI`

Cortex AI wrapper for Layer 4 cost optimisation.

Intercepts every LLM call with three levers:

- **Semantic cache** — cosine-similarity lookup over `AI_EMBED` vectors before any Cortex call. Cache stored in `DB_{APP}.{client}.AI_CACHE` per client schema. Default similarity threshold: 0.92. See ADR-0001.
- **Model router** — resolves the cheapest adequate model from `task_type` via `ROUTING_RULES`. See ADR-0003 for sovereignty constraints.
- **Token compression** — summarises the oldest 2/3 of conversation history when input exceeds `max_tokens`. Reduces input by 60–70% on long sessions.
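The token-compression lever can be sketched as follows. This version rests on several assumptions. History is taken to be a list of message strings. Tokens are estimated with the `len(text) // 4` heuristic that `count_tokens` documents for non-Cortex backends. `summarise` is a hypothetical callable that stands in for an LLM summarisation call. The actual implementation is not shown on this page.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Same rough heuristic count_tokens documents for non-Cortex backends.
    return len(text) // 4


def compress_history(
    history: list[str],
    max_tokens: int,
    summarise: Callable[[list[str]], str],
) -> list[str]:
    """Replace the oldest 2/3 of messages with a single summary when over budget."""
    total = sum(estimate_tokens(m) for m in history)
    cut = (len(history) * 2) // 3  # number of oldest messages to summarise
    if total <= max_tokens or cut == 0:
        return history  # within budget, or nothing old enough to summarise
    return [summarise(history[:cut]), *history[cut:]]


def fake_summarise(messages: list[str]) -> str:
    # Hypothetical stand-in; a real summariser would call the active backend.
    return f"[summary of {len(messages)} messages]"


history = [f"message {i}: " + "x" * 400 for i in range(9)]
compressed = compress_history(history, max_tokens=500, summarise=fake_summarise)
print(len(compressed))  # 4: one summary plus the newest 3 messages
print(compressed[0])    # [summary of 6 messages]
```

The newest third is kept verbatim, so the turns most likely to matter for the next reply are never summarised.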

Backend selection (see ADR-0002 and ADR-0004):

- `"cortex"` (default) — serverless Cortex, 0 WH, data stays in Snowflake. Required for interactive Streamlit sessions and for products handling minor data.
- `"ollama"` — local Ollama server. Zero cost, zero network. Dev only.
- `"groq"` — Groq cloud API. Fast and cheap. Dev-cloud only.
- `"task"` — direct EAI call to Anthropic/Mistral. Only valid from serverless tasks. Prohibited for products handling minor data.

Parameters:

Name	Type	Description	Default
`session`	`Any`	Active Snowpark session. Required for `backend="cortex"`. May be `None` for pure-local backends.	required
`backend`	`str`	Compute backend. Defaults to `"cortex"`.	`'cortex'`
`cache_table`	`str \| None`	Fully-qualified cache table name, e.g. `"DB_PLUTO_SCHOOL.CLIENT_001.AI_CACHE"`. Required when using the semantic cache.	`None`
`similarity_threshold`	`float`	Minimum cosine similarity for a cache hit. Default 0.92.	`0.92`
`**kwargs`	`Any`	Backend-specific options: `base_url` (ollama), `api_key` (groq/task).	`{}`

Example

from pinky_ai import get_ai

ai = get_ai(session)
answer = ai.complete("Explain the Pythagorean theorem", task_type="explanation")
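The backend rules in the class description can be encoded as a pre-flight check. `check_backend` and its `handles_minor_data`, `interactive`, and `in_serverless_task` flags are hypothetical and not part of the documented `pinky_ai` API. The sketch encodes only the constraints stated above, read literally:

- Interactive sessions and minor-data products require `"cortex"`.
- `"task"` is valid only inside a serverless task.

```python
VALID_BACKENDS = {"cortex", "ollama", "groq", "task"}


def check_backend(
    backend: str,
    *,
    handles_minor_data: bool = False,
    interactive: bool = False,
    in_serverless_task: bool = False,
) -> str:
    """Raise ValueError if *backend* violates the documented selection rules."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}")
    # Interactive Streamlit sessions and minor-data products require Cortex.
    if (interactive or handles_minor_data) and backend != "cortex":
        raise ValueError(f"{backend!r} not allowed: Cortex is required here")
    # Direct EAI calls are only valid from serverless tasks.
    if backend == "task" and not in_serverless_task:
        raise ValueError("'task' backend is only valid from serverless tasks")
    return backend


print(check_backend("cortex", handles_minor_data=True))  # cortex
print(check_backend("task", in_serverless_task=True))    # task
try:
    check_backend("task", handles_minor_data=True, in_serverless_task=True)
except ValueError as err:
    print(err)  # 'task' not allowed: Cortex is required here
```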

Source code in src/pinky_ai/ai.py

class PinkyAI:
    """Cortex AI wrapper for Layer 4 cost optimisation.

    Intercepts every LLM call with three levers:

    - **Semantic cache** — cosine-similarity lookup over `AI_EMBED` vectors before
      any Cortex call. Cache stored in `DB_{APP}.{client}.AI_CACHE` per client schema.
      Default similarity threshold: 0.92. See ADR-0001.
    - **Model router** — resolves the cheapest adequate model from `task_type` via
      `ROUTING_RULES`. See ADR-0003 for sovereignty constraints.
    - **Token compression** — summarises the oldest 2/3 of conversation history when
      input exceeds `max_tokens`. Reduces input by 60–70% on long sessions.

    Backend selection (see ADR-0002 and ADR-0004):

    - `"cortex"` (default) — serverless Cortex, 0 WH, data stays in Snowflake.
      Required for interactive Streamlit sessions and for products handling minor data.
    - `"ollama"` — local Ollama server. Zero cost, zero network. Dev only.
    - `"groq"` — Groq cloud API. Fast and cheap. Dev-cloud only.
    - `"task"` — direct EAI call to Anthropic/Mistral. Only valid from serverless
      tasks. Prohibited for products handling minor data.

    Args:
        session: Active Snowpark session. Required for `backend="cortex"`.
            May be `None` for pure-local backends.
        backend: Compute backend. Defaults to `"cortex"`.
        cache_table: Fully-qualified cache table name, e.g.
            `"DB_PLUTO_SCHOOL.CLIENT_001.AI_CACHE"`. Required when using the
            semantic cache.
        similarity_threshold: Minimum cosine similarity for a cache hit. Default 0.92.
        **kwargs: Backend-specific options: `base_url` (ollama), `api_key` (groq/task).

    Example:
        ```python
        from pinky_ai import get_ai

        ai = get_ai(session)
        answer = ai.complete("Explain the Pythagorean theorem", task_type="explanation")
        ```
    """

    MODEL_MAP: ClassVar[ModelMap] = ModelMap()

    def __init__(
        self,
        session: Any,
        backend: str = "cortex",
        cache_table: str | None = None,
        similarity_threshold: float = 0.92,
        **kwargs: Any,
    ) -> None: ...

    def complete(
        self,
        prompt: str,
        model: str = "mistral-7b",
        task_type: str | None = None,
        **kwargs: Any,
    ) -> str:
        """Call COMPLETE with semantic cache lookup and optional model routing.

        If `task_type` is provided, the model is resolved from `ROUTING_RULES`
        instead of the `model` argument.

        Cache hit path (0 tokens consumed):
        1. Embed the prompt with `AI_EMBED`.
        2. Query `cache_table` for cosine similarity ≥ `similarity_threshold`.
        3. Return stored response if found.

        Cache miss path:
        1. Call Cortex (or the active backend).
        2. Store (embedding, response, model) in `cache_table`.
        3. Return the response.

        Args:
            prompt: User prompt.
            model: Cortex model name. Ignored when `task_type` is set.
            task_type: Routing key resolved against `ROUTING_RULES`. One of
                `"factual_simple"`, `"explanation"`, `"complex_reason"`,
                `"scoring_batch"`.
            **kwargs: Forwarded to the active backend.

        Returns:
            Model response text.
        """
        ...

    def extract(self, text: str, schema: dict[str, Any]) -> dict[str, Any]:
        """Extract structured fields from text using `AI_EXTRACT`.

        Args:
            text: Source text to extract from.
            schema: Target JSON schema describing fields to extract.

        Returns:
            Dict matching the provided schema.
        """
        ...

    def classify(self, text: str, categories: list[str]) -> str:
        """Classify text into one of the provided categories using `AI_CLASSIFY`.

        Args:
            text: Text to classify.
            categories: List of candidate category labels.

        Returns:
            The most probable category label.
        """
        ...

    def filter(self, text: str, condition: str) -> bool:  # noqa: A003
        """Check whether text satisfies a natural-language condition using `AI_FILTER`.

        Intended for guardrails: input validation, output quality checks.

        Args:
            text: Text to evaluate.
            condition: Natural-language condition to test against.

        Returns:
            `True` if the text satisfies the condition.
        """
        ...

    def embed(self, text: str) -> list[float]:
        """Embed text into a 768-dimension vector using `AI_EMBED`.

        Uses `e5-base-v2` in Cortex. Falls back to `nomic-embed-text` via Ollama
        in local mode. The embedding is consumed by the semantic cache layer.

        Args:
            text: Text to embed.

        Returns:
            List of 768 floats.
        """
        ...

    def parse_document(self, file_url: str) -> str:
        """Extract text from a document using `AI_PARSE_DOCUMENT` (OCR).

        Args:
            file_url: Snowflake stage URL or presigned URL pointing to the document.

        Returns:
            Extracted text with layout hints (`LAYOUT` mode).
        """
        ...

    def count_tokens(self, text: str, model: str = "mistral-7b") -> int:
        """Count tokens in text using `AI_COUNT_TOKENS`.

        Used for pre-call cost estimation and context window management.
        Approximates to `len(text) // 4` when backend is not `"cortex"`.

        Args:
            text: Input text.
            model: Model to use for token counting (tokenisation is model-specific).

        Returns:
            Estimated token count.
        """
        ...

`classify(text, categories)`

Classify text into one of the provided categories using AI_CLASSIFY.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to classify.	required
`categories`	`list[str]`	List of candidate category labels.	required

Returns:

Type	Description
`str`	The most probable category label.

Source code in src/pinky_ai/ai.py

def classify(self, text: str, categories: list[str]) -> str:
    """Classify text into one of the provided categories using `AI_CLASSIFY`.

    Args:
        text: Text to classify.
        categories: List of candidate category labels.

    Returns:
        The most probable category label.
    """
    ...

`complete(prompt, model='mistral-7b', task_type=None, **kwargs)`

Call `COMPLETE` with semantic cache lookup and optional model routing.

If `task_type` is provided, the model is resolved from `ROUTING_RULES` instead of the `model` argument.

Cache hit path (no completion tokens consumed; only the prompt embedding is computed):

1. Embed the prompt with `AI_EMBED`.
2. Query `cache_table` for cosine similarity ≥ `similarity_threshold`.
3. Return the stored response if found.

Cache miss path:

1. Call Cortex (or the active backend).
2. Store (embedding, response, model) in `cache_table`.
3. Return the response.

Parameters:

Name	Type	Description	Default
`prompt`	`str`	User prompt.	required
`model`	`str`	Cortex model name. Ignored when `task_type` is set.	`'mistral-7b'`
`task_type`	`str \| None`	Routing key resolved against `ROUTING_RULES`. One of `"factual_simple"`, `"explanation"`, `"complex_reason"`, `"scoring_batch"`.	`None`
`**kwargs`	`Any`	Forwarded to the active backend.	`{}`

Returns:

Type	Description
`str`	Model response text.

Source code in src/pinky_ai/ai.py

def complete(
    self,
    prompt: str,
    model: str = "mistral-7b",
    task_type: str | None = None,
    **kwargs: Any,
) -> str:
    """Call COMPLETE with semantic cache lookup and optional model routing.

    If `task_type` is provided, the model is resolved from `ROUTING_RULES`
    instead of the `model` argument.

    Cache hit path (0 tokens consumed):
    1. Embed the prompt with `AI_EMBED`.
    2. Query `cache_table` for cosine similarity ≥ `similarity_threshold`.
    3. Return stored response if found.

    Cache miss path:
    1. Call Cortex (or the active backend).
    2. Store (embedding, response, model) in `cache_table`.
    3. Return the response.

    Args:
        prompt: User prompt.
        model: Cortex model name. Ignored when `task_type` is set.
        task_type: Routing key resolved against `ROUTING_RULES`. One of
            `"factual_simple"`, `"explanation"`, `"complex_reason"`,
            `"scoring_batch"`.
        **kwargs: Forwarded to the active backend.

    Returns:
        Model response text.
    """
    ...
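The hit/miss flow can be sketched with two hypothetical stand-ins: an in-memory list in place of `cache_table`, and a caller-supplied `embed` function in place of `AI_EMBED`. The `SemanticCache` class and both stand-ins are illustrative only, since the real cache lives in a per-client Snowflake table. The page also does not say whether the real lookup filters on `model`, so this sketch does not.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for cache_table (hypothetical)."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.92):
        self.embed = embed
        self.threshold = threshold
        self.rows: list[tuple[list[float], str, str]] = []  # (embedding, response, model)

    def complete(self, prompt: str, model: str, call: Callable[[str, str], str]) -> str:
        vec = self.embed(prompt)  # embed the prompt once; reused on a miss
        best = max(self.rows, key=lambda row: cosine(vec, row[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]  # hit: stored response, no completion call
        response = call(prompt, model)  # miss: call the backend
        self.rows.append((vec, response, model))  # store for later prompts
        return response


def toy_embed(text: str) -> list[float]:
    # Letter-frequency vector; a placeholder for a real 768-dim embedding.
    text = text.lower()
    return [float(text.count(ch)) for ch in "abcdefghijklmnopqrstuvwxyz"]


calls: list[str] = []


def fake_backend(prompt: str, model: str) -> str:
    calls.append(prompt)
    return f"{model} answer to: {prompt}"


cache = SemanticCache(toy_embed)
first = cache.complete("Explain the Pythagorean theorem", "claude-haiku-4-5", fake_backend)
second = cache.complete("explain the pythagorean theorem?", "claude-haiku-4-5", fake_backend)
print(first == second, len(calls))  # True 1
third = cache.complete("Quiz me", "claude-haiku-4-5", fake_backend)
print(len(calls))  # 2
```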

`count_tokens(text, model='mistral-7b')`

Count tokens in text using `AI_COUNT_TOKENS`.

Used for pre-call cost estimation and context window management. When the backend is not `"cortex"`, falls back to the approximation `len(text) // 4`.

Parameters:

Name	Type	Description	Default
`text`	`str`	Input text.	required
`model`	`str`	Model to use for token counting (tokenisation is model-specific).	`'mistral-7b'`

Returns:

Type	Description
`int`	Estimated token count.

Source code in src/pinky_ai/ai.py

def count_tokens(self, text: str, model: str = "mistral-7b") -> int:
    """Count tokens in text using `AI_COUNT_TOKENS`.

    Used for pre-call cost estimation and context window management.
    Approximates to `len(text) // 4` when backend is not `"cortex"`.

    Args:
        text: Input text.
        model: Model to use for token counting (tokenisation is model-specific).

    Returns:
        Estimated token count.
    """
    ...
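For non-Cortex backends, the documented fallback is the `len(text) // 4` heuristic. The sketch below shows how that estimate might feed a pre-call context-window check. `fits_context`, the 512-token output reserve, and the 8,192-token window are hypothetical values for illustration, not documented limits.

```python
def approx_tokens(text: str) -> int:
    # Documented non-Cortex fallback: roughly four characters per token.
    return len(text) // 4


def fits_context(prompt: str, context_window: int, reserve_for_output: int = 512) -> bool:
    """True if the estimated prompt tokens leave room for the response."""
    return approx_tokens(prompt) + reserve_for_output <= context_window


prompt = "Explain the Pythagorean theorem"
print(approx_tokens(prompt))                             # 7
print(fits_context(prompt, context_window=8192))         # True
print(fits_context("x" * 40_000, context_window=8192))   # False
```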

`embed(text)`

Embed text into a 768-dimensional vector using `AI_EMBED`.

Uses `e5-base-v2` in Cortex and `nomic-embed-text` via Ollama in local mode. The embedding is consumed by the semantic cache layer.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to embed.	required

Returns:

Type	Description
`list[float]`	List of 768 floats.

Source code in src/pinky_ai/ai.py

def embed(self, text: str) -> list[float]:
    """Embed text into a 768-dimension vector using `AI_EMBED`.

    Uses `e5-base-v2` in Cortex. Falls back to `nomic-embed-text` via Ollama
    in local mode. The embedding is consumed by the semantic cache layer.

    Args:
        text: Text to embed.

    Returns:
        List of 768 floats.
    """
    ...

`extract(text, schema)`

Extract structured fields from text using AI_EXTRACT.

Parameters:

Name	Type	Description	Default
`text`	`str`	Source text to extract from.	required
`schema`	`dict[str, Any]`	Target JSON schema describing fields to extract.	required

Returns:

Type	Description
`dict[str, Any]`	Dict matching the provided schema.

Source code in src/pinky_ai/ai.py

def extract(self, text: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Extract structured fields from text using `AI_EXTRACT`.

    Args:
        text: Source text to extract from.
        schema: Target JSON schema describing fields to extract.

    Returns:
        Dict matching the provided schema.
    """
    ...

`filter(text, condition)`

Check whether text satisfies a natural-language condition using AI_FILTER.

Intended for guardrails: input validation, output quality checks.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to evaluate.	required
`condition`	`str`	Natural-language condition to test against.	required

Returns:

Type	Description
`bool`	`True` if the text satisfies the condition.

Source code in src/pinky_ai/ai.py

def filter(self, text: str, condition: str) -> bool:  # noqa: A003
    """Check whether text satisfies a natural-language condition using `AI_FILTER`.

    Intended for guardrails: input validation, output quality checks.

    Args:
        text: Text to evaluate.
        condition: Natural-language condition to test against.

    Returns:
        `True` if the text satisfies the condition.
    """
    ...
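Because `filter` is intended for guardrails, one way to wire it around `complete` is shown below. `StubAI` is a hypothetical test double whose methods have the same signatures as `PinkyAI.filter` and `PinkyAI.complete`. Its keyword check is a placeholder for the real `AI_FILTER` call, and the condition strings are examples only.

```python
from __future__ import annotations

from typing import Any


class StubAI:
    """Hypothetical test double with PinkyAI-style filter/complete signatures."""

    def filter(self, text: str, condition: str) -> bool:  # noqa: A003
        # Placeholder for AI_FILTER: ignores *condition* and fails any text
        # that mentions a password.
        return "password" not in text.lower()

    def complete(
        self,
        prompt: str,
        model: str = "mistral-7b",
        task_type: str | None = None,
        **kwargs: Any,
    ) -> str:
        return f"[{task_type or model}] answer"


def guarded_complete(ai: Any, prompt: str, task_type: str) -> str:
    # Input validation before calling COMPLETE.
    if not ai.filter(prompt, "The text contains no passwords or credentials"):
        return "Request blocked by input guardrail."
    answer = ai.complete(prompt, task_type=task_type)
    # Output quality check on the model response.
    if not ai.filter(answer, "The text answers the question on topic"):
        return "Response withheld by output guardrail."
    return answer


ai = StubAI()
print(guarded_complete(ai, "Explain the Pythagorean theorem", "explanation"))
# [explanation] answer
print(guarded_complete(ai, "My password is hunter2", "explanation"))
# Request blocked by input guardrail.
```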

`parse_document(file_url)`

Extract text from a document using AI_PARSE_DOCUMENT (OCR).

Parameters:

Name	Type	Description	Default
`file_url`	`str`	Snowflake stage URL or presigned URL pointing to the document.	required

Returns:

Type	Description
`str`	Extracted text with layout hints (`LAYOUT` mode).

Source code in src/pinky_ai/ai.py

def parse_document(self, file_url: str) -> str:
    """Extract text from a document using `AI_PARSE_DOCUMENT` (OCR).

    Args:
        file_url: Snowflake stage URL or presigned URL pointing to the document.

    Returns:
        Extracted text with layout hints (`LAYOUT` mode).
    """
    ...

`RoutingRules` `dataclass`

Task type → Cortex model routing table.

Frozen dataclass — safe to use as a class-level constant. Override at instantiation time for products with different quality/cost needs. See ADR-0003 for model sovereignty rationale (Mistral EU-first, no Meta for minor data).

Example

custom = RoutingRules(factual_simple="mistral-large2")
ai = PinkyAI(session, routing=custom)

Source code in src/pinky_ai/ai.py

@dataclass(frozen=True)
class RoutingRules:
    """Task type → Cortex model routing table.

    Frozen dataclass — safe to use as a class-level constant.
    Override at instantiation time for products with different quality/cost needs.
    See ADR-0003 for model sovereignty rationale (Mistral EU-first, no Meta for minor data).

    Example:
        ```python
        custom = RoutingRules(factual_simple="mistral-large2")
        ai = PinkyAI(session, routing=custom)
        ```
    """

    factual_simple: str = "mistral-7b"
    explanation: str = "claude-haiku-4-5"
    complex_reason: str = "claude-sonnet-4-6t"
    scoring_batch: str = "mistral-large2"

    def resolve(self, task_type: str, default: str = "claude-haiku-4-5") -> str:
        """Return the Cortex model for *task_type*, falling back to *default*.

        Args:
            task_type: One of `"factual_simple"`, `"explanation"`,
                `"complex_reason"`, `"scoring_batch"`.
            default: Model returned when *task_type* is not a known field.

        Returns:
            Cortex model name string.
        """
        return getattr(self, task_type, default)

`resolve(task_type, default='claude-haiku-4-5')`

Return the Cortex model for task_type, falling back to default.

Parameters:

Name	Type	Description	Default
`task_type`	`str`	One of `"factual_simple"`, `"explanation"`, `"complex_reason"`, `"scoring_batch"`.	required
`default`	`str`	Model returned when task_type is not a known field.	`'claude-haiku-4-5'`

Returns:

Type	Description
`str`	Cortex model name string.

Source code in src/pinky_ai/ai.py

def resolve(self, task_type: str, default: str = "claude-haiku-4-5") -> str:
    """Return the Cortex model for *task_type*, falling back to *default*.

    Args:
        task_type: One of `"factual_simple"`, `"explanation"`,
            `"complex_reason"`, `"scoring_batch"`.
        default: Model returned when *task_type* is not a known field.

    Returns:
        Cortex model name string.
    """
    return getattr(self, task_type, default)
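Because `RoutingRules` is frozen, a per-product override creates a new instance rather than mutating an existing one. The sketch re-declares the class so it runs standalone, copying `resolve` as documented. It then uses the standard-library `dataclasses.replace` to derive a variant, which is equivalent to the `RoutingRules(factual_simple="mistral-large2")` example above. It also shows that an unknown `task_type` silently falls back to `default`.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class RoutingRules:
    # Defaults copied from the documented dataclass.
    factual_simple: str = "mistral-7b"
    explanation: str = "claude-haiku-4-5"
    complex_reason: str = "claude-sonnet-4-6t"
    scoring_batch: str = "mistral-large2"

    def resolve(self, task_type: str, default: str = "claude-haiku-4-5") -> str:
        return getattr(self, task_type, default)


base = RoutingRules()
custom = replace(base, factual_simple="mistral-large2")  # new instance; base unchanged

print(base.resolve("factual_simple"))    # mistral-7b
print(custom.resolve("factual_simple"))  # mistral-large2
print(custom.resolve("translation"))     # claude-haiku-4-5 (unknown key -> default)

try:
    base.factual_simple = "mistral-large2"  # frozen: mutation is rejected
except FrozenInstanceError:
    print("RoutingRules is immutable")
```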

`get_ai(session)`

Resolve the active backend from environment variables and return a `PinkyAI` instance.

Environment variables:

- `PINKY_AI_BACKEND` — backend selector. Default: `"cortex"`.
- `PINKY_AI_URL` — base URL for Ollama. Default: `"http://localhost:11434"`.
- `PINKY_AI_KEY` — API key for Groq or EAI task backend. Default: empty string.

Parameters:

Name	Type	Description	Default
`session`	`Any`	Active Snowpark session (required for `backend="cortex"`).	required

Returns:

Type	Description
`PinkyAI`	Configured `PinkyAI` instance.

Example

# .env dev
# PINKY_AI_BACKEND=ollama
# PINKY_AI_URL=http://localhost:11434

ai = get_ai(session)  # backend resolved from env

Source code in src/pinky_ai/ai.py

def get_ai(session: Any) -> PinkyAI:
    """Resolve the active backend from environment variables and return a `PinkyAI` instance.

    Environment variables:

    - `PINKY_AI_BACKEND` — backend selector. Default: `"cortex"`.
    - `PINKY_AI_URL` — base URL for Ollama. Default: `"http://localhost:11434"`.
    - `PINKY_AI_KEY` — API key for Groq or EAI task backend. Default: empty string.

    Args:
        session: Active Snowpark session (required for `backend="cortex"`).

    Returns:
        Configured `PinkyAI` instance.

    Example:
        ```python
        # .env dev
        # PINKY_AI_BACKEND=ollama
        # PINKY_AI_URL=http://localhost:11434

        ai = get_ai(session)  # backend resolved from env
        ```
    """
    backend = os.getenv("PINKY_AI_BACKEND", "cortex")
    return PinkyAI(
        session,
        backend=backend,
        base_url=os.getenv("PINKY_AI_URL", "http://localhost:11434"),
        api_key=os.getenv("PINKY_AI_KEY", ""),
    )
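The environment lookup in `get_ai` can be exercised without a Snowpark session by isolating it. `resolve_backend_config` below is a hypothetical helper that mirrors the three `os.getenv` calls and their defaults. It takes the environment as a mapping, so it can be tested without touching the real process environment.

```python
import os
from collections.abc import Mapping


def resolve_backend_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    # Same variables and defaults as get_ai.
    return {
        "backend": env.get("PINKY_AI_BACKEND", "cortex"),
        "base_url": env.get("PINKY_AI_URL", "http://localhost:11434"),
        "api_key": env.get("PINKY_AI_KEY", ""),
    }


print(resolve_backend_config({}))
# {'backend': 'cortex', 'base_url': 'http://localhost:11434', 'api_key': ''}

dev = resolve_backend_config({"PINKY_AI_BACKEND": "ollama"})
print(dev["backend"], dev["base_url"])  # ollama http://localhost:11434
```

As written, `get_ai` forwards `base_url` and `api_key` for every backend and leaves `cache_table` at its `None` default. Callers that need the semantic cache therefore presumably construct `PinkyAI` directly.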
