ADR-0004 — Cortex for interactive sessions; EAI for serverless batch

Status: accepted
Date: 2026-06-01

Context

LLM calls in Snowflake fall under two cost structures, depending on whether a warehouse (WH) must stay active for the duration of the call:

Cortex AI_COMPLETE      →  serverless compute, 0 WH required
EAI + Anthropic/Mistral →  WH must be active during the HTTP call
Serverless Task + EAI   →  0 WH (task is serverless), ~5× cheaper than WH + Cortex

A 30-exchange Streamlit session at 3–8 s per LLM call keeps a WH active for 1.5–4 minutes (30 × 3–8 s), or ~3 minutes at the midpoint. At XS WH pricing, that WH cost erases a significant fraction of the token savings a cheaper external API would otherwise provide.
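
The session estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch: it assumes an XS warehouse bills 1 credit per hour, and the credit price of 3.00 USD is a placeholder that does not come from this ADR (substitute the contract rate). It counts only the time spent inside LLM calls, so idle time before auto-suspend and the per-resume billing minimum would push the real figure higher.

```python
XS_CREDITS_PER_HOUR = 1.0      # standard rate for an XS warehouse
ASSUMED_USD_PER_CREDIT = 3.00  # placeholder, NOT taken from this ADR


def wh_active_seconds(exchanges: int, seconds_per_call: float) -> float:
    """Seconds the WH is kept busy by the LLM calls of one session."""
    return exchanges * seconds_per_call


def wh_cost_usd(active_seconds: float,
                credits_per_hour: float = XS_CREDITS_PER_HOUR,
                usd_per_credit: float = ASSUMED_USD_PER_CREDIT) -> float:
    """WH cost for the active time only (no idle or resume overhead)."""
    return active_seconds / 3600 * credits_per_hour * usd_per_credit


if __name__ == "__main__":
    # Low end, midpoint, and high end of the 3-8 s latency range.
    for per_call in (3, 5.5, 8):
        secs = wh_active_seconds(30, per_call)
        print(f"{per_call} s/call: {secs / 60:.2f} min, "
              f"{wh_cost_usd(secs):.4f} USD")
```

Comparing this per-session WH cost with the per-session token savings of the external API shows how much of the savings survives.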

For batch scoring (overnight, no user waiting), a serverless task + EAI call costs ~5× less than calling Cortex with an active WH.

Decision

backend="cortex" is the default and the only valid backend for interactive Streamlit sessions. No WH is consumed. Data never leaves Snowflake.
backend="task" (EAI direct API) is only valid from serverless tasks. It must never be instantiated from a Streamlit app or any context with an active session WH.
Products that process minors' data (e.g. PLUTO_SCHOOL) must use backend="cortex" in all contexts. backend="task" is prohibited for them regardless of cost savings, because that data must not leave Snowflake.
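
The three rules above could be enforced at the point where a backend is chosen. The following is a sketch only: `CallContext`, its fields, and `resolve_backend` are hypothetical names introduced for illustration and are not part of any existing wrapper.

```python
from dataclasses import dataclass

VALID_BACKENDS = ("cortex", "task")


@dataclass(frozen=True)
class CallContext:
    """Hypothetical description of where the LLM call originates."""
    in_serverless_task: bool     # True only inside a serverless task
    processes_minors_data: bool  # e.g. PLUTO_SCHOOL


def resolve_backend(ctx: CallContext, backend: str = "cortex") -> str:
    """Return the backend if the ADR rules allow it, else raise ValueError."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if backend == "task":
        # The data-residency rule wins over any cost argument.
        if ctx.processes_minors_data:
            raise ValueError('backend="task" is prohibited for this product')
        # EAI calls are only allowed where no session WH is held open.
        if not ctx.in_serverless_task:
            raise ValueError('backend="task" is only valid from a serverless task')
    return backend
```

Raising at construction time, rather than at the first API call, keeps a misconfigured Streamlit app from ever opening an external connection.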

The QUERY_TAG rule applies to both backends: every SP wrapper sets a structured JSON tag on the session before each AI call. Without it, cost granularity by product/client/session is lost and SESSION_COST_AVG cannot be populated.
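
A sketch of how a wrapper might build the tag and the statement that sets it. The key names (`product`, `client`, `session`) mirror the granularity named above, but the exact schema of the tag is an assumption, as are the function names. `ALTER SESSION SET QUERY_TAG` is standard Snowflake SQL; the 2000-character limit is the documented maximum for that parameter.

```python
import json

MAX_QUERY_TAG_LENGTH = 2000  # Snowflake limit for the QUERY_TAG parameter


def build_query_tag(product: str, client: str, session_id: str) -> str:
    """Serialize the cost-attribution fields as compact JSON, keys sorted."""
    tag = json.dumps(
        {"product": product, "client": client, "session": session_id},
        separators=(",", ":"),
        sort_keys=True,
    )
    if len(tag) > MAX_QUERY_TAG_LENGTH:
        raise ValueError("query tag exceeds 2000 characters")
    return tag


def query_tag_statement(tag: str) -> str:
    """Render the ALTER SESSION statement for a single-quoted SQL literal."""
    # Escape backslashes first, then double the single quotes.
    escaped = tag.replace("\\", "\\\\").replace("'", "''")
    return f"ALTER SESSION SET QUERY_TAG = '{escaped}'"


if __name__ == "__main__":
    tag = build_query_tag("PLUTO_SCHOOL", "acme", "s-001")
    print(query_tag_statement(tag))
```

Sorting the keys keeps the tag byte-identical for identical inputs, which makes grouping on the raw tag string possible as well as on parsed fields.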

Consequences

EAI setup (security_bindings.yml — secrets, network rules, EAI objects) is only required in repos that use backend="task".
Serverless task SPs must declare EXTERNAL_ACCESS_INTEGRATIONS and SECRETS explicitly.
SESSION_COST_AVG is refreshed hourly by a serverless task that reads SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_QUERY_USAGE_HISTORY. Individual session data stays in {client}.SESSION_HISTORY (PII-scoped, dropped when the client unsubscribes).
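
The requirement that serverless task SPs declare EXTERNAL_ACCESS_INTEGRATIONS and SECRETS explicitly could be checked mechanically, for instance in CI. The sketch below is a text-level check only, not a SQL parser, so a clause that appears only inside a comment would fool it. The function name, the procedure name, and the integration and secret names in the sample DDL are all placeholders.

```python
import re

REQUIRED_CLAUSES = ("EXTERNAL_ACCESS_INTEGRATIONS", "SECRETS")


def missing_clauses(ddl: str) -> list[str]:
    """Return the required clauses that the DDL text does not declare."""
    return [
        clause for clause in REQUIRED_CLAUSES
        if not re.search(rf"\b{clause}\s*=\s*\(", ddl, flags=re.IGNORECASE)
    ]


# Hypothetical DDL for illustration; every object name is a placeholder.
EXAMPLE_DDL = """
CREATE OR REPLACE PROCEDURE score_batch()
  RETURNS STRING
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.11'
  PACKAGES = ('snowflake-snowpark-python', 'requests')
  HANDLER = 'run'
  EXTERNAL_ACCESS_INTEGRATIONS = (llm_api_eai)
  SECRETS = ('api_key' = llm_api_key)
AS $$
def run(session):
    return 'ok'
$$;
"""

if __name__ == "__main__":
    print(missing_clauses(EXAMPLE_DDL))
```

A non-empty result names the clauses to add before the procedure is deployed.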