
When Models Stop Thinking Left to Right

Text diffusion models may weaken one of AI safety's familiar assumptions: that reasoning can be monitored as a sequential chain of thought.

AI Safety · Reasoning · Diffusion Models

One concept in AI safety debates is almost always treated as a given: the chain of thought (CoT). We are used to models writing text from left to right, token by token, while we look over their shoulder. The model thought, it wrote, we saw. Transparent.

But what if the next generation of models stops thinking sequentially?

A new family of architectures, text diffusion models, generates text differently. They do not write from left to right. They begin with noise, with a chaotic set of tokens, and then gradually clarify the whole page at once, step by step. Like a photograph developing from overexposed film. In that process there is no obvious chronology: the model may write the ending before the beginning, then change its mind and rewrite everything.
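To make the contrast with left-to-right writing concrete, here is a deliberately toy sketch in Python. It is not a real diffusion model: the function `toy_denoise`, the `MASK` placeholder, and the random "confidence" scores are all assumptions made for illustration, and the target sentence is fixed in advance rather than predicted. The sketch also leaves out the revision behavior described above, where a model rewrites tokens it has already committed to. What it does show is positions across the whole sequence being filled in over a few steps, in an order unrelated to reading order.

```python
import random

MASK = "_"

def toy_denoise(target, steps, seed=0):
    """Reveal a fixed target sentence in confidence order, not left to right.

    Stand-in for a real denoiser: the per-position "confidence" is random
    here, whereas a trained model would compute it from the whole sequence.
    """
    rng = random.Random(seed)
    confidence = [rng.random() for _ in target]
    # Positions sorted from most to least confident.
    order = sorted(range(len(target)), key=lambda i: -confidence[i])
    per_step = -(-len(target) // steps)  # ceiling division
    state = [MASK] * len(target)
    history = [list(state)]  # snapshot of every intermediate state
    for step in range(steps):
        for i in order[step * per_step:(step + 1) * per_step]:
            state[i] = target[i]
        history.append(list(state))
    return history

words = "the model may write the ending before the beginning".split()
for snapshot in toy_denoise(words, steps=3):
    print(" ".join(snapshot))
```

Each printed line is one intermediate state of the whole sequence. A monitor can read those snapshots, but the order in which positions settle carries no narrative the way a left-to-right transcript does.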

An autoregressive model is like a chess player who says every move out loud. A diffusion model is like the same chess player staring silently at the board for three minutes and then rearranging all the pieces at once. The result may be the same. But which one is easier to supervise?

This is where a subtle distinction appears. We can inspect the intermediate states of a diffusion model and see which tokens it is considering at each step. That is transparency at the level of variables, and it is useful. But understanding the algorithm it is using, how exactly it moves from noise to meaningful text, is much harder. In autoregression, every next word is built on the previous ones. In diffusion, tokens influence each other in both directions.
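The difference in who can influence whom can be written down as a dependency mask. The sketch below assumes an attention-style model, which this article does not specify, and the helper names `causal_mask`, `bidirectional_mask`, and `influences` are hypothetical. It only illustrates the one-directional versus two-directional dependency structure, not how any particular model is implemented.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may depend on position j.

    In the autoregressive case a position sees only itself and earlier ones.
    """
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """Every position may depend on every other position."""
    return [[True] * n for _ in range(n)]

def influences(mask, src, dst):
    """Can the token at position `src` affect the token at position `dst`?"""
    return mask[dst][src]

n = 4
# Can the last token affect the first one?
print(influences(causal_mask(n), src=3, dst=0))
print(influences(bidirectional_mask(n), src=3, dst=0))
```

Under the causal mask, the answer for "can a later token change an earlier one" is always no, which is what makes a left-to-right transcript readable as a sequence of steps. Under the bidirectional mask, that guarantee is gone.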

For AI safety, this is not an academic detail. One of the key safety practices, monitoring the chain of thought, assumes that we can read what the model is thinking and catch deception or dangerous intent. But what if the model stops thinking in text? What if its reasoning happens in a multidimensional space of hidden states to which we have no linguistic access?

We already see hints of this in today's reasoning models. OpenAI’s o1 and DeepSeek R1 generate chains of thought, but those chains are not necessarily a faithful transcript of cognition. The model learns to produce text that looks like reasoning, not necessarily to reason in text. The difference is subtle, but critical.

Diffusion models take the next step. They do not have to organize logic chronologically at all. They can correct themselves retroactively: first commit to a wrong answer, fill in the supporting numbers, and then at a later step recognize the mistake and rewrite the beginning. Humans do this too: we draft, then revise. But AI safety has grown used to trusting models that think aloud sequentially.

Another phenomenon typical of these models is token smearing. When the model is confident that a word belongs somewhere in a sentence but does not yet know exactly where, it keeps that word distributed across neighboring positions. Humans do not think this way.
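A small numeric illustration may help. The probabilities below are made up, and the helpers `position_mass` and `is_smeared`, along with the 0.2 threshold, are assumptions chosen for this sketch rather than an established metric. The idea is simply that, in an intermediate state, the same word can hold substantial probability at more than one position.

```python
def position_mass(probs, word):
    """Return how much probability each position assigns to `word`."""
    return [dist.get(word, 0.0) for dist in probs]

def is_smeared(probs, word, threshold=0.2):
    """True if at least two positions each give `word` mass >= threshold."""
    mass = position_mass(probs, word)
    return sum(1 for m in mass if m >= threshold) >= 2

# Hypothetical intermediate state: one token distribution per position.
probs = [
    {"the": 0.9, "cat": 0.1},
    {"cat": 0.45, "black": 0.55},
    {"cat": 0.5, "black": 0.4, "sat": 0.1},
]

print(position_mass(probs, "cat"))
print(is_smeared(probs, "cat"))
print(is_smeared(probs, "the"))
```

In this made-up state the model seems sure that "cat" appears but has not settled whether it goes in the second or third slot. Reading only the most likely token at each position would hide that uncertainty.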

The problem is not that diffusion models are dangerous by themselves. Much of their computation remains interpretable. The problem is that we are building safety systems for one type of intelligence, while the next type may arrive in a different architecture.
