Evals for a RAG bot: a hands-on loop for conversation designers
A field-tested workflow for catching hallucinations, citation drift, and unsafe outputs on a Langfuse-traced bot, written for designers who have never run an eval before.
Ideas waiting to germinate.
Not the idiom. A recursive engineering discipline: A-activity does the work, B-activity improves how A is done, C-activity improves how B is done. The leverage is at C, and almost nobody runs it.
Engelbart's central concept: the capacity of a group, organization, or society to address complex urgent problems. The thing he kept naming, and the thing the industry kept losing.
Engelbart's 1962 framework. Augmentation isn't the tool alone; it is Human + Language + Artifacts + Methodology + Training, co-evolving. UX has spent forty years polishing the Artifact and ignoring the other three.
Engelbart said scaling is a science. Dimensional scaling applies to organisations, to human-plus-agent collectives, and to the problems we are trying to solve. Each regime needs the form redesigned.
LLMs solved the language-translation half of knowledge interop. The verification half is largely unbuilt: provenance, claim attribution, signed assertions, grounding evals, shared verification standards.
Engelbart designed for groups, not individuals. H-LAM/T and the Networked Improvement Community suggest the unit of design isn't user plus machine but user plus user plus machine. A reminder for the Guildford talk.
Running scratchpad for the Guildford conference talk. Ideas as they arrive.
The next step beyond conversation-driven interaction: using Claude Code to turn method documents into working applications and dashboards.
Reading notes on cognitive assemblages, anthropocentrism, and the case for extending cognition beyond human consciousness, from bacteria to AI.
Some kind of reading log for Bacteria to AI, and perhaps even a reminder to actually read.
Schegloff mapped conversational patterns for human-human interaction. We need new ones for human-machine interaction, and conflating the two harms both.
Being PO, scrum master, prompt designer, tester, eval writer and UX designer in a team of 3 means rethinking roles and processes on a daily, almost hourly basis.
Claude Code is reintroducing structured, constrained input: buttons, numbered options, limited choices. Which is actually IVR. And that might be exactly right.
What if we modeled human-AI interaction on thematic analysis rather than conversation? And what if we used that same method to discover themes in a knowledge garden?
Can a chatbot-style UI work without generative AI, using decision trees, hyperlinking, and serendipity instead?
Standard quantization damages non-English languages disproportionately. A Dutch-first approach to model compression shows there's a better way.
Academic research on embeddings hasn't caught up with how PKM tools actually use them. There might be something worth writing here.
Should a digital garden be searchable and useful, or is there beauty in letting people explore?
Language rules for how I talk about AI in my work.
When did we stop caring about who actually wrote the words?
A mechanism that catches your train of thought. Something that responds in another space.
What's the right metaphor for AI's role in my writing process?
How should I indicate when content was made with AI, and share the prompts?
What if I wrote a comprehensive guide on how I built this site?
What design principles shaped this site, and why do they matter?