How I write specs when the output is probabilistic

A lightweight PRD pattern for RAG and agents: intents, failure modes, and success metrics that aren’t vanity.

For AI features I still write PRDs — but the non-functional section gets as much love as user stories. I capture latency budgets, allowed versus disallowed topics, retention of transcripts, and what happens when the retriever returns nothing useful.

User stories read like: “As an auditor, I want a first-pass summary with citations so I can verify in the source system” — never “As a user, I want AI magic.” Acceptance criteria reference evaluation scenarios and escalation to human support.

This is the same muscle I used on the BYU–I chatbot: intents and flows first, then retrieval, then guardrails. The spec is the contract between policy, engineering, and the user’s trust.