Watch notes: Doug Engelbart at Accelerating Change 2004

Watching Engelbart’s keynote “Large-Scale Collective IQ: Facilitating its Evolution” at the Accelerating Change 2004 conference, Stanford, 5 November 2004. Thirty-six years after the demo, asking: what did we keep, what did we lose, and what’s next.

Running notes below.

Bootstrap Institute

Is Engelbart still alive?

Co-created with AI

No. Doug Engelbart died on 2 July 2013, age 88, in Atherton, California (kidney failure). Born 30 January 1925.

So this 2004 keynote is from roughly the last decade of his active work. The Bootstrap Institute (later Doug Engelbart Institute, still running today under his daughter Christina Engelbart) was his vehicle in those years for keeping the augmentation-of-collective-IQ thread alive after NLS and Augment had been commercially overrun.

So he never witnessed LLMs

Co-created with AI

Correct. He died nine years before ChatGPT (November 2022) and four years before the “Attention Is All You Need” transformer paper (2017). He never saw a single LLM.

Which makes the H-LAM/T frame all the sharper: he was already arguing in 1962 that Language had to be co-designed with the Artifact. He just meant formal command languages and structured representations. LLMs make Language the actual substrate of the artifact (the model is largely linguistic). His framework holds with almost no edits, which is unusual for a 64-year-old conceptual structure.

Most of the conversation-design field is still catching up to a frame Engelbart wrote before any of us were born.

And how do you mean that last sentence?

Co-created with AI

Conversation design as a field has mostly treated:

The bot as the artifact, with language as decoration on top

The user as one individual with one task

The dialogue as the whole interface (the artifact is the dialogue)

“Training” as onboarding flows, not capability development

What Engelbart was already saying in 1962:

Language is a load-bearing system component, equal to the Artifact, the Methodology, and the Training

The unit of design is the group (NIC), not the individual

The dialogue is one quarter of an augmentation system, not the system itself

The goal is collective intellect compounding over years, not a one-off task completion

So when I say “still catching up”: almost all conversation-design practice still optimises the chat surface as if it were the whole product. The H-LAM/T frame demands you also design the language the community uses around the bot, the methodologies that combine the bot with human work, and the training that grows mastery. Three quarters of the design surface is missing from the standard playbook.

The provocation for the Guildford talk is right there: name the missing three quarters.

What was his epiphany in the keynote again?

Co-created with AI

He typically recounts the same one across talks: the 1951 spring epiphany, almost certainly retold during the 2004 keynote.

Spring 1951. He was 26, working at NACA (predecessor to NASA) at Ames in Mountain View, recently engaged.

Driving to work, he asked himself what his career should be. Concluded: “if I could find a way to significantly contribute to how humans handle complexity and urgency, that would be universally helpful.” That was the framing.

A few months later, the image: a person sitting at a large screen full of symbols, manipulating them, with computers as an interactive surface for thinking, not a number-cruncher. The screen image came directly from the radar consoles he had operated in the Navy during WWII.

From that came a 20-year research programme, leading to ARC, NLS, and the 1968 demo.

The 2004 keynote uses this story as the origin and then asks the harder question: why, fifty-three years on, has the collective-IQ part of the vision not landed even though the individual-productivity part has? That is where the keynote title “Large-Scale Collective IQ: Facilitating its Evolution” comes from.

Source: Smithsonian’s Engelbart profile and the canonical retelling in Landau & Clegg, The Engelbart Hypothesis.

Collective capability

Co-created with AI

“Collective capability” is an upgrade on “collective IQ”:

IQ is measurement-laden, with psychometric baggage and a contested history. Capability is action-oriented: what the group can do, not how smart it is.

Serious tradition to lean on: Amartya Sen’s capability approach in development economics, picked up by Martha Nussbaum. Built around exactly this distinction: capabilities (real freedoms to act) over outcomes. Forty years of philosophical scaffolding to borrow.

Maps cleanly to H-LAM/T: each of L, A, M, T is a capability lever; the system as a whole is the collective capability.

capabilities become a key, central consideration

Co-created with AI

Capability as central design consideration reframes the whole question:

Conversation design asks: what dialogue happens this turn?

Feature design asks: what does the artifact let you do now?

Capability design asks: what can the team newly do six months from now because they use this?

Capability is the long-time-horizon view of design. Closer to pedagogy than to product. The Engelbart move is to put that horizon at the centre and let everything else fall out from it: which artifacts, which language, which methodologies, which training.

Worth naming: this is also why most “AI productivity” framing misses the point. Productivity is a measure of speed at existing capability. Augmentation is the expansion of capability. Different axis.

Big issue: paradigms. Computers are seen as just a surface activity; we don’t expect to do anything about their design.

Capability is an infrastructure, dependent on other capabilities

OMG Engelbart mentions tool systems and human systems

70s mandate: computers had to be easy to run. No, not true: knowledge workers in the future are going to work differently.

What do we need to do to really get capable? Capability is the target.

Accurate perceptual machinery makes processing really easy.

We communicate in flows of concepts.

Computers can provide different kinds of sensory stimuli to stimulate knowledge.

Computers can give very different symbologies. Interesting: Engelbart also sees that metaphors are outdated: we see knowledge as words on a page.

Could conversations ever become an artefact?

Co-created with AI

Partially, yes. Already happening. Fully, not yet.

Already: transcripts, Slack threads, Stack Overflow Q&A, chat exports, prompt libraries. All treat past conversation as referenceable material. Engelbart’s NLS Journal did this in 1970: every conversation captured, structured, addressable, linkable. Conversation-as-artifact is not new.

Not yet: the structural vocabulary is missing. We have “document” parts (sentence, paragraph, section, citation) but not really “conversation” parts (turn, exchange, repair, branch, frame-shift, closing). Without that grammar, conversations can be stored but not composed with. You can’t refactor a conversation the way you refactor a document.

The shift if they ever fully become artifacts: conversations become editable, citable, refactorable, and combinable. That changes their epistemic role from trace of an interaction to unit of knowledge work. Different category. Probably valuable. Probably also strange (does editing a past conversation change what was said?).

This is one of the underbuilt design surfaces. Conversation-as-artifact is where conversation design might actually have a future: not as authoring chat replies, but as designing the structures that make conversations into composable material.
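To make the "structural vocabulary" point concrete, here is a minimal sketch of what conversation parts could look like as data. Everything in it is an assumption made for illustration: the names Turn, Conversation, branch and cite are hypothetical, and the kind labels simply reuse the vocabulary suggested above (turn, repair, frame-shift, closing). It is one possible shape, not an existing standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Turn:
    """One addressable unit of a conversation (hypothetical schema)."""
    turn_id: str
    speaker: str
    text: str
    kind: str = "turn"            # e.g. "turn", "repair", "frame-shift", "closing"
    parent: Optional[str] = None  # None marks the opening turn


@dataclass
class Conversation:
    """A conversation stored as a tree, so that branches are first-class."""
    turns: Dict[str, Turn] = field(default_factory=dict)

    def add(self, turn: Turn) -> None:
        # Refuse turns that reply to something we have never seen.
        if turn.parent is not None and turn.parent not in self.turns:
            raise KeyError(f"unknown parent {turn.parent!r}")
        self.turns[turn.turn_id] = turn

    def branch(self, leaf_id: str) -> List[Turn]:
        """Return the path from the opening turn down to leaf_id."""
        path = []
        current = self.turns[leaf_id]
        while True:
            path.append(current)
            if current.parent is None:
                break
            current = self.turns[current.parent]
        return list(reversed(path))

    def cite(self, turn_id: str) -> str:
        """Render a single turn as a citable reference."""
        t = self.turns[turn_id]
        return f"[{t.turn_id}] {t.speaker}: {t.text}"


convo = Conversation()
convo.add(Turn("t1", "user", "Could conversations become artefacts?"))
convo.add(Turn("t2", "model", "Partially, yes.", parent="t1"))
convo.add(Turn("t3", "user", "I meant editable ones.", kind="repair", parent="t2"))
convo.add(Turn("t4", "user", "What about voice?", kind="frame-shift", parent="t2"))

# Two branches share the same opening exchange but diverge after t2.
print([t.turn_id for t in convo.branch("t3")])
print(convo.cite("t4"))
```

Once turns are addressable and typed like this, "refactoring" a conversation becomes an operation on the tree (extract a branch, cite a turn, splice an exchange into another thread) rather than copy-pasting a transcript.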

Interesting: Kane asked me whether writing is a primary skill for working with LLMs successfully. He does all his prompting through talking/voice. This shows the difference, I think: conversation is not an artefact, in that it’s not a unit of knowledge work.

Unit of knowledge work, I like that term.

The business of paradigms: paradigms are extremely important in how you can see possible futures. How do we set about developing future paradigms? How do we adapt our picture of the future?

Engelbart is a designer at heart, isn’t he

Co-created with AI

Deeply. Not in the modern UX-designer sense, but in the framework-thinker / systems-designer sense, closer to Christopher Alexander or Herbert Simon. He spent his career naming (H-LAM/T, NIC, bootstrapping, ABC levels) and prototyping, which are the two designer’s habits. The mouse was the by-product; the framework was the work.

Worth noting: he trained as an electrical engineer at Berkeley and never carried the title “designer”. The field of design hadn’t yet claimed the conceptual-systems territory in 1962. He was doing it anyway.

What are the ABC levels?

Co-created with AI

Engelbart’s three levels of organisational activity:

A-activity: the primary work of the organisation. Making the product, serving the customer, delivering the service. The thing the org exists to do.

B-activity: improving how A is done. Better tools, methods, processes, training. Continuous improvement work.

C-activity: improving how B is done. Making the improvement process itself faster, smarter, more compounding. Improving the improver.

Most organisations only run A. Some run A + B (lean, kaizen, ops teams). Almost none run C, which is where compounding really happens. NIC and bootstrapping are C-level activities. Engelbart thought C was the actual leverage point for raising collective capability.
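A toy calculation can show why C is described as the place where compounding happens. This is only a sketch under invented assumptions: capability starts at 1.0, B-activity raises it by a fixed percentage each period, and C-activity raises that percentage itself. The function name simulate and the rates 0.05 and 0.10 are made up for illustration and come from nowhere in Engelbart's work.

```python
def simulate(periods, b_rate=0.0, c_rate=0.0):
    """Track capability over time under a toy A/B/C model.

    b_rate: per-period improvement to A-level capability (B-activity).
    c_rate: per-period improvement to b_rate itself (C-activity).
    """
    capability = 1.0
    rate = b_rate
    history = []
    for _ in range(periods):
        capability *= 1.0 + rate   # B improves how A is done
        rate *= 1.0 + c_rate       # C improves how B is done
        history.append(capability)
    return history


a_only = simulate(10)                             # no improvement work at all
a_b = simulate(10, b_rate=0.05)                   # steady improvement
a_b_c = simulate(10, b_rate=0.05, c_rate=0.10)    # improving the improver

for label, run in [("A", a_only), ("A+B", a_b), ("A+B+C", a_b_c)]:
    print(f"{label:6s} final capability: {run[-1]:.2f}")
```

Under these assumptions A alone stays flat, A + B grows at a constant rate, and A + B + C pulls ahead because its growth rate keeps rising. The numbers mean nothing in themselves; only the shape of the three curves is the point.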

Scaling is a science. Dimensional scaling.

Knowledge base for one company can’t differ from a knowledge base for another company.

You still have to communicate and prove/proof things.

Is this solvable with LLMs now? I think it could be, but how?

Co-created with AI

Solvable in principle, not in practice yet. Split it in two:

Communication half (translation + synthesis). LLMs are already good here. A model can read company A’s knowledge base and answer questions in company B’s vocabulary without a shared schema. The interop problem that ate ontology research for thirty years is partly dissolved by statistical translation. Knowledge bases don’t have to differ in format anymore. The model bridges.

Proof half (verification + provenance). Largely unsolved. LLMs blend sources into output without lineage, and the confabulation problem means a fluent answer is not a verified answer. To “prove/proof” you need claim-level attribution, source authority, audit trails, and reproducibility. Current RAG + citation stacks gesture at this but don’t deliver it reliably. Engelbart’s NLS Journal already had much of this in 1970: every entry was attributed to its author, dated, and structurally linked to its provenance. That infrastructure is missing from the LLM stack today.

So the how: rebuild the proof infrastructure under the model. Provenance graphs, claim attribution, signed assertions, evals that test grounding, and shared verification standards across organisations. The LLMs handle the language; the provenance fabric handles the trust. Both are needed, and the second one is largely unbuilt.
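As a rough illustration of "claim attribution" and "signed assertions", the sketch below attaches an author, source and date to a claim and signs the bundle so that later tampering is detectable. The assumptions are loud ones: the Claim fields, the sample values and the shared secret key are all invented here, and HMAC with a shared key stands in for what would more plausibly be public-key signatures in a cross-organisation setting.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Claim:
    """A single attributable assertion (hypothetical schema)."""
    text: str
    author: str
    source_id: str
    date: str


def _canonical(claim: Claim) -> bytes:
    # Stable serialisation so the same claim always signs the same way.
    return json.dumps(asdict(claim), sort_keys=True).encode("utf-8")


def sign(claim: Claim, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the whole claim."""
    return hmac.new(key, _canonical(claim), hashlib.sha256).hexdigest()


def verify(claim: Claim, signature: str, key: bytes) -> bool:
    """Check the signature in constant time."""
    return hmac.compare_digest(sign(claim, key), signature)


key = b"demo-key-not-for-production"  # assumption: a shared secret
claim = Claim(
    text="The Journal was part of NLS.",
    author="example-author",
    source_id="kb-a/doc-17",          # made-up identifier
    date="2004-11-05",
)
signature = sign(claim, key)

# Any edit to the claim, however small, invalidates the signature.
tampered = replace(claim, text="The Journal was not part of NLS.")
print(verify(claim, signature, key), verify(tampered, signature, key))
```

The model can still do the language work on top; the point of the sketch is only that trust has to live in a separate layer that binds each claim to who said it, where, and when.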

We need a paradigm shift, even though there’s no business case for this. Because it’s critical. We need to think about getting more effective collectively, really soon.

Different user interfaces should be able to plug in.

Different interfaces don’t have to be easy to learn. Progression from tricycles to bicycles. Nobody understands how you really ride a bicycle.

You steer a bicycle; you don’t know how you keep your wheels under you, you just steer and trust your reflexes.

So did he say anything about this?

Co-created with AI

Honestly: he supported the pluggable idea (the “different user interfaces should be able to plug in” note above), and NLS itself supported multiple input modes composed together: keyboard, mouse, chord keyset, view specifications that reshaped the same document. The augmentation system was always deeper than any one interface.

But the specific claim that “different metaphors should fit different kinds of work” is an extension past his text. He did not say that explicitly. His point is the technical/architectural one (the system underneath, not the interface on top). The metaphor-mapping is mine, not his.
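The architectural point (one system underneath, several interfaces on top) can be sketched as a small plug-in registry in which different views reshape the same underlying document. This is an illustration only, not a reconstruction of NLS: the names VIEWS, register and render, the two view functions and the sample outline are all made up here.

```python
from typing import Callable, Dict, List, Tuple

Statement = Tuple[int, str]  # (outline level, text)

# Sample data only: one shared document that every view reads from.
DOCUMENT: List[Statement] = [
    (1, "Augmentation system"),
    (2, "Tool system"),
    (3, "Artifacts"),
    (2, "Human system"),
    (3, "Language, methodology, training"),
]

View = Callable[[List[Statement]], List[str]]
VIEWS: Dict[str, View] = {}


def register(name: str) -> Callable[[View], View]:
    """Decorator that plugs a view into the registry under a name."""
    def decorator(view: View) -> View:
        VIEWS[name] = view
        return view
    return decorator


@register("outline")
def outline(doc: List[Statement]) -> List[str]:
    # Full document, indented two spaces per level below the first.
    return ["  " * (level - 1) + text for level, text in doc]


@register("top-two-levels")
def top_two_levels(doc: List[Statement]) -> List[str]:
    # Same document, clipped to the first two outline levels.
    return [text for level, text in doc if level <= 2]


def render(view_name: str, doc: List[Statement] = DOCUMENT) -> List[str]:
    """Look up a plugged-in view by name and apply it to the document."""
    return VIEWS[view_name](doc)


print(render("outline"))
print(render("top-two-levels"))
```

Adding a new interface means registering one more function; the document and the core never change. That is the sense in which the system is deeper than any single interface.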

From conversation to instrument: language as substrate

Implications for conversation as a metaphor

And if we look at language as the substrate, what are the implications for conversation as a metaphor?

Co-created with AI

Conversation is one use-case of language, not its essence. Speech-based, turn-taking, relational. When language becomes the substrate of computation, the surface gets much bigger than “two participants talking”. Most actual LLM work is closer to composing, transforming, querying, modelling, drafting, retrieving. Conversation is a thin user-facing skin on a richer set of operations.

The conversation metaphor imports relational expectations the medium doesn’t support. Politeness, persona, mind-reading, social bonding, the implication of a partner-with-stakes. None of these match what is actually happening in a transformer’s forward pass. The metaphor is therefore an anthropomorphism-by-design move: it makes the system feel like another human because the metaphor demands a counterpart.

Engelbart’s NLS users were not having conversations with the system. They were composing structured language using a deliberate grammar. The chord keyset and command metalanguage forced precision. If the metaphor had been “conversation”, NLS would have been ruined: imprecision and ambiguity, not augmentation.

The deep shift. The question stops being “are we talking?” and becomes “what are we making with this language?” Then conversation slots back into its proper, narrow place: one mode among many, suited to exploratory thinking and to sharing intent, badly suited to high-precision composition or to long-form material work.

Implication for the Guildford talk. “Conversation design” as a field name has been load-bearing for fifteen years. Once Language is substrate, that name describes one corner of the design surface. The talk could explicitly retire “conversation design” as the umbrella and propose something like language-substrate design or augmentation design, with conversation as one practice within it.

An alternative metaphor: instrument

And what would be an alternative metaphor?

Co-created with AI

Singular and primary: instrument.

The LLM is an instrument you play. Language is the material it shapes; the prompt is the score; the output is the performance. Mastery is real, visible, and acquired over years. The instrument has affordances and constraints you learn by use.

Why “instrument” beats “conversation”:

It places skill at the centre. Conversation implies any competent speaker can do it; instrument implies practice is the point.

It puts language back into a maker’s hands, not a relationship.

It scales solo to ensemble. Solo work, chamber playing (two humans + LLM), full orchestra (multi-agent + multi-human) all fit.

It maps cleanly to H-LAM/T:

Artifact = the instrument.

Language = the material the instrument shapes.

Methodology = the technique, the schools of playing.

Training = mastery. Years, not minutes.

It refuses anthropomorphism by design. You don’t ask a violin its opinion.

Why it travels well in conversation-design practice:

“Prompt design” becomes scoring (planning how the instrument is to be played for a given piece).

“Conversation design” becomes ensemble work (designing how multiple players, human and machine, coordinate).

“Content design” becomes composition (writing the score and arranging the performance).

“Voice design” still works directly: every instrument has a voice.

The broader workspace metaphor that contains it: studio. A studio has instruments, materials, methods, apprentices, and a maker at work. Solo studio or shared studio. Quiet practice and finished performance both happen there. Studio sits at the H-LAM/T scale; instrument sits at the artifact scale; the two fit together cleanly.

One-slide framing for Guildford: “From conversation to instrument. Stop talking to the model. Learn to play it.”

Was conversation a better metaphor pre-LLM?

Would you say that conversation was a better metaphor before LLMs were introduced? At least, then it was humans writing the bot replies. Argue both in favor and against.

Co-created with AI

Yes, much more honest pre-LLM. But still not fully honest. Both sides:

In favour: conversation was a better fit pre-LLM

There really was a writer behind every line. A conversation designer authored each reply with intent, voice, and stakes. The “other participant” wasn’t fiction, it was the design team, just async. Talking to the bot was talking to its writer through a script. A limited conversation, but a real one.

Turn-taking was constrained and intentional. A finite reply tree meant every move was a designed move. The metaphor matched the controllable surface: plannable, rehearsable, shippable, repairable.

Personality was load-bearing and authored. Tone-of-voice docs, persona guides, naming workshops: all of these made sense because someone actually chose how the bot would speak. Not statistically averaging. Choosing.

Repair was design-able. Fallbacks, clarifications, escalations were deliberate next turns. You could conversation-design your way out of a dead end because the dead ends were known.

The discipline cohered. Intents, utterances, NLU training, dialog flows: a full methodology had grown around the metaphor and worked for the technology of the time.

Against: it was always a bad fit, even then

It was theatre, not conversation. A branching script with multiple-choice user input is closer to a multiple-choice exam than a dialogue. The user brought open-ended intent; the bot offered closed pre-baked moves.

The anthropomorphism harm started pre-LLM. Conversation primes the user to expect a thinking partner, then hands them a state machine. The ELIZA effect was demonstrated in 1966 and ignored for fifty years.

It locked the field into a worse default UI. Forms, faceted search, command surfaces, direct manipulation: often more efficient than a scripted chat. Picking conversation-as-interface was an ideological move more than an empirical one.

The “writer behind the bot” was invisible to the user. Hidden authorship is bad for trust, bad for repair, bad for expectation management.

The methodology calcified just before the ground shifted. Intents-and-utterances thinking became conservative weight in the field, exactly when LLMs arrived to upend it.

The honest summary

The metaphor had a real human referent pre-LLM (the writer) and that made it more honest than it is now. But the referent was hidden from users, and the surface still mismatched the metaphor: a branching script is not a conversation. So the field has been operating with a partially-true metaphor for two decades, which is now fully untrue, and the muscle memory has not caught up.

For the Guildford talk: the conversation metaphor was tolerable when there was a human author, costly when there wasn’t, and is now actively misleading.

What I want to work on

How do we collectively get better? This is the essence of what I want to address. Need to move away from personal computing. I’d love to work on this.
