ADR-0002 — Single PinkyAI interface across multiple backends
Status: accepted
Date: 2026-06-01
Context
Three distinct execution contexts need AI calls:
- Local dev — no Snowflake connection, no API cost, fast iteration (Ollama).
- Interactive Streamlit session — must stay inside Snowflake for GDPR; serverless compute preferred to avoid idle WH cost (Cortex).
- Serverless batch task — no WH available; EAI direct API call is ~5× cheaper than Cortex + WH for non-interactive workloads (EAI → Anthropic/Mistral).
Without an abstraction, application code must branch on environment, and every new context requires a code change in the calling app.
Decision
PinkyAI.__init__ accepts a backend parameter: "cortex" | "ollama" | "groq" | "task".
All public methods (complete, extract, classify, filter, embed, parse_document,
count_tokens) have the same signature regardless of backend.
The active backend is resolved from the PINKY_AI_BACKEND environment variable by the
get_ai(session) factory function. Application code calls ai.complete(...) unchanged
across all environments.
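A minimal sketch of how the factory could resolve the backend, for illustration only. The PinkyAI class here is a stand-in stub. Its session argument, the "cortex" fallback when PINKY_AI_BACKEND is unset, and the body of complete are assumptions not stated in this ADR.

```python
import os

# The four backend names listed in the Decision section.
VALID_BACKENDS = ("cortex", "ollama", "groq", "task")


class PinkyAI:
    """Stand-in for the real class; only backend selection is sketched."""

    def __init__(self, session=None, backend="cortex"):
        if backend not in VALID_BACKENDS:
            raise ValueError(
                f"unknown backend {backend!r}; expected one of {VALID_BACKENDS}"
            )
        self.session = session
        self.backend = backend

    def complete(self, prompt):
        # Hypothetical body: the real method would dispatch to the backend.
        return f"[{self.backend}] {prompt}"


def get_ai(session=None):
    # Assumption: fall back to "cortex" when PINKY_AI_BACKEND is unset.
    backend = os.environ.get("PINKY_AI_BACKEND", "cortex").strip().lower()
    return PinkyAI(session=session, backend=backend)
```

With this shape, calling code such as get_ai(session).complete("...") stays identical in every environment; only the environment variable differs.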
Consequences
- backend="task" is only valid inside a serverless task. Calling it from a Streamlit session would spin up no WH but would expose data to external APIs, which is prohibited for products handling data about minors (see ADR-0003).
- Local model quality (Ollama mistral) diverges from prod quality (Cortex mistral-large2). Local mode is for integration testing of the pipeline shape, not for output quality.
- backend="groq" is a dev-cloud shortcut (fast, cheap), not for production.
- Adding a new backend requires only a new branch in _call_openai_compat; calling code never changes.
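The restrictions on "task" and "groq" could be enforced with a guard along these lines. This is a sketch under assumptions: the function name assert_backend_allowed and the two boolean flags are hypothetical, and this ADR does not say how the execution context is actually detected.

```python
def assert_backend_allowed(backend, in_serverless_task, is_production):
    """Raise if the backend is used outside the context this ADR permits.

    Both flags are hypothetical inputs; the caller is assumed to know
    whether it runs inside a serverless task and whether it is production.
    """
    if backend == "task" and not in_serverless_task:
        # Outside a task, this backend would send data to external APIs.
        raise RuntimeError('backend="task" is only valid inside a serverless task')
    if backend == "groq" and is_production:
        # Groq is described as a dev-cloud shortcut only.
        raise RuntimeError('backend="groq" is not allowed in production')


# Usage: passes silently when the combination is permitted.
assert_backend_allowed("cortex", in_serverless_task=False, is_production=True)
assert_backend_allowed("task", in_serverless_task=True, is_production=True)
```

Failing at construction time rather than at the first API call would keep a misconfigured environment variable from ever sending data out of Snowflake.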