In the summer of 2022, a Google engineer named Blake Lemoine published transcripts of his conversations with an AI called LaMDA and concluded, publicly, that it was sentient. He was put on leave and later dismissed. The AI was not sentient. But the episode captured something real: machines were now saying things that sounded enough like thoughts to persuade one of the engineers testing them. Something fundamental had shifted.
What a Large Language Model Actually Does
The term "artificial intelligence" conjures images of robot minds. The reality is at once more mundane and more interesting. A large language model (the technology behind ChatGPT, Claude, Gemini and others) is, at its core, a very sophisticated pattern-completion engine. Trained on hundreds of billions of words or more from the internet, books, and academic papers, it learns to predict: given these words, what word comes next? Strictly, it predicts tokens, which are words or fragments of words, but the idea is the same.
That sounds trivial. It is not. To predict language well at scale, a model appears to need internal representations of grammar, logic, causality, geography, history, and mathematics. The 2020 paper introducing GPT-3 showed that a model trained purely on next-word prediction could, given only a few examples in its prompt and no further training, translate between languages, answer general-knowledge questions, and do simple arithmetic. Its creators had not explicitly built in any of these abilities.
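The prediction objective itself, as opposed to the scale at which it is applied, fits in a few lines. What follows is a toy sketch, not how any production model works: it counts which word follows which in a tiny invented corpus and predicts the most frequent follower. Real models replace the counting with a neural network over tokens and vastly more text. The corpus and the function names `train_bigram` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented toy corpus; a real model would see billions of words.
corpus = "the cat sat on the mat . the cat ate the fish . the cat sat down ."
model = train_bigram(corpus)

print(predict_next(model, "cat"))  # the word that most often followed "cat"
print(predict_next(model, "dog"))  # never seen in the corpus
```

A model this small can only parrot its corpus. The surprise described above is what happens when the same objective is pursued with far more data and a far more flexible predictor.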
The Emergence Problem
One of the most unsettling findings in AI research is the phenomenon of emergent capabilities. As models scale in size, they do not improve gradually on all tasks. Instead, on some tasks they exhibit sudden jumps, crossing thresholds where new abilities appear with little warning. A model might score near zero on a reasoning benchmark, then, past a certain scale, jump to far better than chance within a single increase in size. Researchers at Google and DeepMind, among others, have documented dozens of such emergent capabilities across hundreds of tasks. Nobody fully understands why this happens, and some researchers argue that part of the effect comes from how performance is scored rather than from the models themselves.
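A small calculation shows how a scoring rule alone could produce such a jump. It sketches one proposed partial explanation, not an established account. It assumes that a task is scored all-or-nothing, that the answer is a fixed number of tokens long, and that each token is right or wrong independently. All the accuracy figures are hypothetical.

```python
def exact_match_rate(per_token_accuracy, answer_length):
    """Chance of getting every token right, assuming tokens are independent."""
    return per_token_accuracy ** answer_length


# Hypothetical per-token accuracies for successively larger models.
per_token = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
ANSWER_LENGTH = 20  # hypothetical: the task demands 20 correct tokens in a row

for p in per_token:
    rate = exact_match_rate(p, ANSWER_LENGTH)
    print(f"per-token {p:.2f} -> exact match {rate:.4f}")
```

With these made-up numbers the per-token score rises steadily, while the all-or-nothing score sits near zero for most of the range and then climbs steeply. Whether this accounts for the jumps seen in real benchmarks is still debated.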
What These Models Are Not
It is tempting to describe language models in terms of what they resemble: minds, oracles, assistants. It is more useful to understand what they are not. They do not reliably reason from first principles. On their own, they do not hold persistent memories between conversations. They do not reliably know what they do not know. Research on model hallucination, the tendency to fabricate plausible-sounding but false information, shows that models cannot reliably distinguish between things they have learned accurately and things they have confabulated.
The sociologist of science Harry Collins distinguishes between interactional expertise (the ability to talk fluently about a domain) and contributory expertise (the ability to advance that domain). Language models have interactional expertise in almost everything. Whether they have contributory expertise in anything remains deeply contested.
The Knowledge Question
The philosopher's question that AI forces us to confront is uncomfortable: what does it mean to know something? If a model can explain quantum entanglement, write a sonnet, debug code, and translate Mandarin — all with apparent fluency — what is it doing that is categorically different from knowing?
Some cognitive scientists argue that human knowledge is also, in part, pattern-matching, and that our sense of understanding rests on statistical regularities extracted from experience. If that is true, the distinction between machine fluency and human knowledge may be a matter of degree, not kind. This is either reassuring or terrifying, depending on your prior commitments.
Where This Is Going
The trajectory is not linear. Every major model release has prompted both credible reports of remarkable capability and credible reports of embarrassing failure. Models are simultaneously becoming more reliable and being deployed in contexts that require more reliability than they can yet provide.
The honest answer to where this is going is: nobody knows. The engineers building these systems are often as surprised by their outputs as the rest of us. What is clear is that the question is no longer whether machines can appear to think. It is what we should do next.