ADR-0004 — Cortex for interactive sessions; EAI for serverless batch
Status: accepted
Date: 2026-06-01
Context
Two cost structures apply to LLM calls in Snowflake:
Cortex AI_COMPLETE → serverless compute, 0 WH required
EAI + Anthropic/Mistral → WH must be active during the HTTP call
Serverless Task + EAI → 0 WH (task is serverless), ~5× cheaper than WH + Cortex
A 30-exchange Streamlit session with 3–8 s per LLM call keeps a WH active for ~3 minutes. At XS WH pricing, that WH cost erases a significant fraction of the token savings that would come from using a cheaper external API.
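The "~3 minutes" figure can be checked with a few lines of arithmetic. The sketch below assumes an X-Small warehouse bills 1 credit per hour and that every exchange blocks on exactly one LLM call. It ignores auto-suspend idle time and the 60-second billing minimum, so real consumption is likely somewhat higher. The function names are illustrative and not part of any codebase.

```python
# Assumption: an X-Small warehouse consumes 1 credit per hour while running.
XS_CREDITS_PER_HOUR = 1.0


def wh_active_seconds(exchanges: int, seconds_per_call: float) -> float:
    """Seconds the warehouse is busy if each exchange blocks on one LLM call."""
    return exchanges * seconds_per_call


def wh_credits(active_seconds: float,
               credits_per_hour: float = XS_CREDITS_PER_HOUR) -> float:
    """Credits for the busy time only (no idle time, no 60 s minimum)."""
    return active_seconds / 3600.0 * credits_per_hour


low = wh_active_seconds(30, 3)     # lower bound of the 3-8 s range
high = wh_active_seconds(30, 8)    # upper bound of the 3-8 s range
mid = wh_active_seconds(30, 5.5)   # midpoint, close to the "~3 minutes" above

for label, secs in (("low", low), ("mid", mid), ("high", high)):
    print(f"{label}: {secs / 60:.2f} min, {wh_credits(secs):.4f} credits")
```

Multiplying the credit figure by the account's contracted credit price gives the per-session WH cost to set against the token savings.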
For batch scoring (overnight, no user waiting), a serverless task + EAI call costs ~5× less than calling Cortex with an active WH.
Decision
- backend="cortex" is the default and the only valid backend for interactive Streamlit sessions. No WH is consumed. Data never leaves Snowflake.
- backend="task" (EAI direct API) is only valid from serverless tasks. It must never be instantiated from a Streamlit app or any context with an active session WH.
- Products processing data about minors (e.g. PLUTO_SCHOOL) must use backend="cortex" in all contexts. backend="task" is prohibited for them regardless of cost savings, because their data must not leave Snowflake.
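The rules above could be enforced by a small guard at the point where a backend is instantiated. This is a minimal sketch, not the project's actual implementation: CallContext, its fields, and resolve_backend are hypothetical names introduced for illustration, and how the caller detects a serverless task or an active session WH is left open.

```python
from dataclasses import dataclass

VALID_BACKENDS = ("cortex", "task")


@dataclass(frozen=True)
class CallContext:
    # Hypothetical fields; the caller is assumed to know these facts.
    is_serverless_task: bool      # running inside a serverless task
    has_session_warehouse: bool   # a session WH is active (e.g. Streamlit)
    handles_minor_data: bool      # product processes data about minors


def resolve_backend(requested: str, ctx: CallContext) -> str:
    """Return the backend to use, or raise if the request breaks the ADR rules."""
    if requested not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {requested!r}")
    if requested == "cortex":
        return "cortex"  # allowed in every context
    # From here on, requested == "task" (EAI direct API).
    if ctx.handles_minor_data:
        raise PermissionError("backend='task' is prohibited for minor data")
    if not ctx.is_serverless_task or ctx.has_session_warehouse:
        raise PermissionError("backend='task' is only valid from serverless tasks")
    return "task"


# Usage: a nightly batch job running as a serverless task.
batch = CallContext(is_serverless_task=True,
                    has_session_warehouse=False,
                    handles_minor_data=False)
print(resolve_backend("task", batch))
```

Raising instead of silently falling back to "cortex" is a design choice in this sketch: it surfaces misconfiguration early rather than hiding an unexpected cost profile.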
The QUERY_TAG rule applies to both backends: every SP wrapper sets a structured JSON tag
on the session before each AI call. Without it, cost granularity by product/client/session
is lost and SESSION_COST_AVG cannot be populated.
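As an illustration of the QUERY_TAG rule, the sketch below builds a structured JSON tag and the ALTER SESSION statement an SP wrapper could run before each AI call. The key names (product, client, session_id, backend) and both helper functions are assumptions; this ADR does not fix the tag schema. The length check reflects Snowflake's 2000-character limit on QUERY_TAG.

```python
import json

MAX_QUERY_TAG_LEN = 2000  # Snowflake's documented limit for QUERY_TAG


def build_query_tag(product: str, client: str, session_id: str, backend: str) -> str:
    """Serialize the cost-attribution fields as compact JSON with stable key order."""
    tag = json.dumps(
        {"product": product, "client": client,
         "session_id": session_id, "backend": backend},
        sort_keys=True,
        separators=(",", ":"),
    )
    if len(tag) > MAX_QUERY_TAG_LEN:
        raise ValueError("QUERY_TAG exceeds 2000 characters")
    return tag


def query_tag_statement(tag: str) -> str:
    """Build the ALTER SESSION statement, doubling single quotes for SQL."""
    escaped = tag.replace("'", "''")
    return f"ALTER SESSION SET QUERY_TAG = '{escaped}'"


# Example values are hypothetical.
tag = build_query_tag("PLUTO_SCHOOL", "acme", "s-001", "cortex")
print(query_tag_statement(tag))
```

Keeping the keys sorted and the JSON compact makes tags easy to parse back when aggregating costs by product, client, or session.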
Consequences
- EAI setup (security_bindings.yml: secrets, network rules, EAI objects) is only required in repos that use backend="task".
- Serverless task SPs must declare EXTERNAL_ACCESS_INTEGRATIONS and SECRETS explicitly.
- SESSION_COST_AVG is refreshed hourly by a serverless task reading INFORMATION_SCHEMA.CORTEX_FUNCTIONS_QUERY_USAGE. Individual session data stays in {client}.SESSION_HISTORY (PII-scoped, dropped at unsubscription).