ADR-0002 — Single PinkyAI interface across multiple backends

Status: accepted
Date: 2026-06-01

Context

Three distinct execution contexts need AI calls:

Local dev: no Snowflake connection, no API cost, fast iteration (Ollama).
Interactive Streamlit session: must stay inside Snowflake for GDPR; serverless compute is preferred to avoid the cost of an idle warehouse (WH) (Cortex).
Serverless batch task: no WH available; a direct API call through an External Access Integration (EAI) is roughly 5× cheaper than Cortex plus a WH for non-interactive workloads (EAI → Anthropic/Mistral).

Without an abstraction, application code must branch on environment, and every new context requires a code change in the calling app.

Decision

PinkyAI.__init__ accepts a backend parameter: "cortex" | "ollama" | "groq" | "task". All public methods (complete, extract, classify, filter, embed, parse_document, count_tokens) have the same signature regardless of backend.
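The shape of this decision can be sketched as follows. This is a minimal illustration, not the actual class: the single-argument complete(prompt) signature, the _dispatch helper, and the echo-style return value are assumptions made only to show that the public surface stays the same while the backend varies.

```python
from typing import Literal, get_args

# The four backend names come from the decision text above.
Backend = Literal["cortex", "ollama", "groq", "task"]


class PinkyAI:
    """Illustrative stand-in: one public surface, backend chosen at construction."""

    def __init__(self, backend: str = "cortex") -> None:
        # Reject unknown backends early so typos fail at startup, not mid-call.
        if backend not in get_args(Backend):
            raise ValueError(f"unknown backend: {backend!r}")
        self.backend = backend

    def complete(self, prompt: str) -> str:
        # Hypothetical signature: identical for every backend.
        return self._dispatch(prompt)

    def _dispatch(self, prompt: str) -> str:
        # Placeholder: a real implementation would call the selected service.
        return f"[{self.backend}] {prompt}"
```

Calling code such as PinkyAI("ollama").complete("hi") looks the same for any of the four backend values.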

The active backend is resolved from the PINKY_AI_BACKEND environment variable by the get_ai(session) factory function. Application code calls ai.complete(...) unchanged across all environments.
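A factory along these lines might look like the sketch below. The PinkyAI stub, the constructor argument order, and the fallback to "cortex" when PINKY_AI_BACKEND is unset are all assumptions for illustration; the ADR does not specify a default.

```python
import os


class PinkyAI:
    """Stub standing in for the real class; it only stores its arguments."""

    def __init__(self, session=None, backend: str = "cortex") -> None:
        self.session = session
        self.backend = backend


def get_ai(session=None) -> PinkyAI:
    # Assumption: fall back to "cortex" when the variable is unset.
    backend = os.environ.get("PINKY_AI_BACKEND", "cortex")
    return PinkyAI(session, backend=backend)
```

With this shape, switching environments means changing one environment variable, for example PINKY_AI_BACKEND=ollama on a developer machine, while the application keeps calling get_ai(session).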

Consequences

backend="task" is only valid inside a serverless task. Calling it from a Streamlit session would avoid spinning up a warehouse, but it would send data to external APIs, which is prohibited for products handling data about minors (see ADR-0003).
Local model quality (Ollama mistral) diverges from production quality (Cortex mistral-large2). Local mode is for integration testing of the pipeline shape, not for evaluating output quality.
backend="groq" is a dev-cloud shortcut (fast, cheap) — not for production.
Adding a new backend requires only a new branch in _call_openai_compat; calling code never changes.
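The restrictions above could be enforced in code rather than by convention. The following is a hedged sketch only: check_backend, PRODUCTION_BACKENDS, and the in_serverless_task and is_production flags are hypothetical names, and how the runtime detects either condition is left open here.

```python
# Assumption: only these two backends are meant to run in production,
# based on the consequences listed above.
PRODUCTION_BACKENDS = {"cortex", "task"}


def check_backend(
    backend: str, *, in_serverless_task: bool, is_production: bool
) -> None:
    """Raise RuntimeError if the backend is not allowed in this context."""
    # "task" sends data to external APIs, so it is limited to serverless tasks.
    if backend == "task" and not in_serverless_task:
        raise RuntimeError("backend='task' is only valid inside a serverless task")
    # "ollama" and "groq" are development conveniences only.
    if is_production and backend not in PRODUCTION_BACKENDS:
        raise RuntimeError(f"backend={backend!r} is not permitted in production")
```

A check like this would naturally run once, where the backend is resolved, so that a misconfigured environment fails before any data is sent.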