RAG Failure Maps
A retrieval-first exploration of where RAG systems break, from weak chunk boundaries and noisy evidence to grounding failures and misleading answer confidence.
A working set of experiments in retrieval, agentic workflows, evaluation, and tooling: shorter, more exploratory, and more technical than the main case studies.
The Lab is where I explore systems before they become full case studies. These entries are faster to scan than the main Projects page and more focused on technical patterns, failure modes, and workflow design. The emphasis is not polish for its own sake, but learning through disciplined experimentation.
Experiments in multi-step extraction workflows that combine tool-calling, structured outputs, and evidence checks to make extraction more reliable than one-shot prompting.
Structured DSPy-style experiments for improving retrieval and answer quality through optimization, evaluation-aware prompting, and tighter system feedback loops.
A command-line workflow for fast AI-assisted research and document tasks, designed to make local experimentation, iteration, and tool chaining more practical for serious technical work.