Writing
Thoughts on AI systems, infrastructure, and stuff I work on from time to time.
Grounding a support agent in real product data
The first time I pointed a raw LLM at our toy store’s support inbox, it told a customer our return window was 90 days. It is 30. My wife caught it before it went out, but the fact that the system generated that answer with complete confidence bothered me for days. The model was not lying, exactly. It was just averaging across thousands of return policies it had seen during training and producing something plausible. For a real store with real customers, plausible is dangerous.
Applying GenAI design patterns to a real project
I have been building AI systems professionally for a few years now, and the gap between what works in a demo and what survives in production keeps frustrating me. Every conference talk shows a chatbot answering questions about a PDF. Every tutorial builds a RAG pipeline that retrieves three documents and calls it done. Meanwhile the actual problems tend to get figured out the hard way: how do you stop an agent from hallucinating a price, how do you coordinate five agents that share state, how do you generate product descriptions that actually convert.
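For concreteness, here is roughly what that tutorial-grade pipeline looks like. Everything in this sketch is made up for illustration: the policy strings are invented, and a naive keyword-overlap retriever stands in for embeddings.

```python
# A minimal sketch of the "retrieve three documents and call it done"
# tutorial pipeline. Illustrative only: a toy keyword retriever stands
# in for an embedding index, and the "grounding" is just string pasting.
import re

def tokenize(text: str) -> set[str]:
    # Lowercase and split on non-alphanumerics.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Rank docs by how many words they share with the query; keep top k."""
    terms = tokenize(query)
    return sorted(docs, key=lambda d: len(terms & tokenize(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the model by pasting retrieved text above the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

# Hypothetical store policies, standing in for a real knowledge base.
policies = [
    "You can return items within 30 days of purchase with a receipt.",
    "Shipping is free on orders over 50 dollars.",
    "Gift cards are non-refundable.",
]

print(build_prompt("How many days do I have to return an item?", policies))
```

A real system swaps the keyword overlap for an embedding index, but even this toy version shows why grounding matters for the opening anecdote: with the 30-day policy sitting in the context block, the model answers from retrieved text instead of averaging over every return policy it saw in training.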