Case study
AuditAI — multi-agent automation at nonprofit scale
IT Project Manager (Jan–May 2026) · Information and Communication Services — global nonprofit · Riverton, UT
- Scale context: 50,000+ volunteer auditors · 31,000+ units on modernized AWS platform
- What shipped: six operational internal AI agents (agentic workflows + human review gates)
- Governance: aligned Auditing, engineering, and General Counsel on compliant releases
- Measured impact (scoped): ~30% automation of common processes · ~50% manual-labor reduction on targeted tasks
Problem
Audit teams were juggling legacy modernization on AWS with rising demand for intelligent assistance. Manual steps in repeatable workflows consumed capacity that should have gone to judgment-heavy review — and any AI assist had to survive scrutiny from Auditing and legal stakeholders, not just engineering demos.
Discovery
I partnered with audit leadership to map high-volume paths where probabilistic assistance could add value without bypassing policy. We prioritized workflows with clear inputs, verifiable outputs, and explicit escalation when confidence or policy fit was unclear — the same pattern I later applied to campus RAG: define “good,” then design fallbacks.
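The "define good, then design fallbacks" pattern can be sketched as a confidence gate: accept an agent's output only when both policy fit and confidence clear an agreed bar, and escalate everything else to a human reviewer. This is an illustrative sketch, not the production system; the threshold, field names, and labels are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical threshold; in practice this would be agreed with audit leadership.
CONFIDENCE_FLOOR = 0.85

@dataclass
class AgentResult:
    output: str
    confidence: float  # self-reported score in [0, 1]
    policy_fit: bool   # did the output pass explicit policy checks?

def route(result: AgentResult) -> str:
    """Return the next step: auto-accept, or escalate to a human reviewer."""
    if result.policy_fit and result.confidence >= CONFIDENCE_FLOOR:
        return "auto-accept"
    # Low confidence OR unclear policy fit both fall back to human review.
    return "escalate-to-reviewer"
```

The key design choice is that escalation is the default path: an output must affirmatively qualify for automation, rather than a human review being the exception.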
What we built
Six internal agents covered specialized steps (intake, classification, drafting support, routing, and related tasks), orchestrated so their outputs fed human reviewers with context rather than replacing them. Tool use and handoffs were designed explicitly as multi-agent orchestration, not as a single monolithic prompt.
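The explicit-handoff idea can be shown as a minimal pipeline where each specialized agent annotates a shared case record and the final step routes it to a human review queue. The step functions and field names here are invented for illustration; they are not the internal agents themselves.

```python
from typing import Callable

# Each "agent" is a specialized step that annotates a shared case record.
Agent = Callable[[dict], dict]

def intake(case: dict) -> dict:
    case["normalized"] = case["raw"].strip().lower()
    return case

def classify(case: dict) -> dict:
    case["category"] = "expense" if "receipt" in case["normalized"] else "general"
    return case

def route_for_review(case: dict) -> dict:
    # The pipeline's output is a routed case with context for a human
    # reviewer, not a final automated decision.
    case["queue"] = f"review-{case['category']}"
    return case

PIPELINE: list[Agent] = [intake, classify, route_for_review]

def run(case: dict) -> dict:
    for step in PIPELINE:
        case = step(case)  # explicit handoff between specialized agents
    return case
```

Because each handoff is a named function rather than a hidden prompt transition, every step can be logged, tested, and replaced independently.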
Trust & compliance: we coordinated with General Counsel on release criteria, logging, and parallel modernization tracks so AI capability shipped on the same two-week cadence as other production work, not as a side experiment.
Outcomes
On the workflows we measured with audit leadership, we saw roughly 30% automation of common processes and ~50% reduction in manual labor time. Those metrics are only meaningful because the denominators were agreed with domain owners — something I highlight in interviews when discussing responsible AI PM work.
What I’d do next
Expand evaluation harnesses (golden sets per agent), tighten observability for regression after model or policy changes, and publish clearer operator playbooks for escalation — the product surface around AI, not just the models.
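A golden-set harness of the kind described can be minimal: per-agent pairs of input and expected output, re-run after any model or policy change, with the pass rate as the regression signal. The sample agent and data here are hypothetical stand-ins.

```python
# Minimal regression harness: each agent gets a "golden set" of
# (input, expected-output) pairs, re-run after model or policy changes.
# The sample classifier and cases are illustrative only.

def classify(text: str) -> str:
    return "expense" if "receipt" in text.lower() else "general"

GOLDEN_SET = [
    ("Receipt from vendor A", "expense"),
    ("Monthly status update", "general"),
]

def evaluate(agent, golden) -> float:
    """Return the pass rate; any drop below 1.0 flags a regression."""
    passed = sum(1 for text, expected in golden if agent(text) == expected)
    return passed / len(golden)
```

Keeping the harness per-agent means a regression points directly at the step that changed, which is what makes post-change observability actionable.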