Researchers have found that the internal workings of Transformer models bear a surprising resemblance to the way humans process information. A study by experts from Harvard, Brown, and the University of Tübingen reveals that these systems—used in advanced applications like GPT and Vision Transformers—don’t just mimic human outputs; they also seem to follow similar processing strategies.
The team looked in detail at how predictions evolve during a forward pass, where input data moves layer by layer until a final prediction is reached. They compared these changing predictions to human behaviours such as accuracy, reaction times, typing rhythms, and mouse movements. By measuring factors like uncertainty, confidence, relative confidence, and the degree to which later layers boost the correct answer over an intuitive error, they built a bridge between machine computations and human cognition.
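The layer-by-layer metrics described above can be sketched in code. The snippet below is a minimal illustration, not the study's actual implementation: it assumes you have already extracted one logit vector per layer (for example, by applying the model's unembedding to each layer's hidden state, in the style of a "logit lens"), and the function and variable names are invented for this example.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def process_metrics(layer_logits, correct_idx, intuitive_idx):
    """Compute per-layer process metrics from intermediate predictions.

    layer_logits: array of shape (n_layers, vocab_size), one logit
    vector per layer. The indices mark the correct answer and an
    intuitive-but-wrong competitor. Illustrative only.
    """
    probs = softmax(layer_logits)
    # Uncertainty: entropy of each layer's predictive distribution.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    # Confidence: probability assigned to the correct answer.
    confidence = probs[:, correct_idx]
    # Relative confidence: correct answer vs. the intuitive competitor;
    # negative early and positive late would mirror the "intuitive first,
    # correct later" pattern the study reports.
    relative = probs[:, correct_idx] - probs[:, intuitive_idx]
    return entropy, confidence, relative
```

With toy logits where an intuitive answer dominates early layers and the correct one takes over later, `relative` flips from negative to positive across layers, the pattern the researchers compared against human accuracy and reaction times.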
The research covered a range of tasks. In a fact recall exercise, for instance, the models initially leaned toward an intuitive (yet sometimes wrong) answer before later layers solidified the correct one. In a fact recognition task, metrics that compared relative confidence proved especially useful in predicting human accuracy and reaction speeds. Similarly, experiments involving mouse tracking and syllogistic reasoning—where personal beliefs can override strict logic—demonstrated that these process metrics offer deeper insights into human responses.
The study even extended into the realm of vision. When evaluating Vision Transformers on challenging object recognition tasks, the same process metrics, particularly uncertainty, helped forecast human performance. This suggests that when models find an input difficult, humans do too, leading to slower response times or mistakes under similar conditions.
Overall, this research suggests that large AI models might do more than simply map inputs to outputs—they could also serve as valuable models for understanding human thought. While the study focused on specific tasks and pre-trained models, it opens up exciting possibilities for applying these insights to other architectures and individual processing patterns.