In recent years, large language models (LLMs), AI systems designed to understand and generate human language, have become part of daily life, from everyday chats to professional settings. Recently, Tobias Wolfram of Bielefeld University showed that these models can predict educational and psychological outcomes from 250-word childhood essays about as accurately as teacher evaluations, and more accurately than predictions based on genetic data.
During his undergraduate days, Tobias was drawn to data that went beyond typical survey questions. That interest led him to a unique dataset from the 1950s, which combines decades of educational and psychological information with essays the participants wrote at age 11. He converted each essay into a numerical profile, or ‘text embedding’, with more than 1,500 dimensions, and also measured features such as lexical diversity and sentence complexity. Together, these representations captured information that would otherwise have gone unnoticed.
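To give a feel for what features like lexical diversity and sentence complexity can look like in practice, here is a minimal sketch in Python. It is not the study's own code: the function names, the use of a type-token ratio for diversity, the use of mean sentence length as a stand-in for complexity, and the sample essay are all assumptions made for illustration.

```python
import re

def lexical_diversity(text: str) -> float:
    """Type-token ratio: distinct words divided by total words (assumed measure)."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

def mean_sentence_length(text: str) -> float:
    """Average words per sentence, used here as a rough proxy for complexity."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)

# Made-up sample text, not taken from the dataset described in the article.
essay = "I want to be a pilot. I will fly big planes and I will see the world."
print(lexical_diversity(essay), mean_sentence_length(essay))
```

Simple counts like these are easy to interpret, which is one reason they are often used alongside embeddings, whose 1,500-plus dimensions have no individual meaning.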
To make sense of this wealth of information, Tobias used an ensemble machine learning approach known as a SuperLearner, which combines the predictions of several algorithms to optimise accuracy. He measured success with a metric called ‘predictive holdout R²’, which shows how much of the variation in an outcome such as cognition the model explains on data held back from training. The findings were impressive: these short essays predicted educational and cognitive outcomes as reliably as detailed assessments from long-time educators.
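The two ideas in this paragraph, combining predictions and scoring them on held-out data, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the study's pipeline: a real SuperLearner weights many algorithms using cross-validated predictions, whereas this sketch blends just two sets of predictions with a grid search. The function names and the toy numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; call it on held-out data to get a holdout R^2."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def best_blend_weight(y_true, preds_a, preds_b, steps=100):
    """Grid-search the weight w in [0, 1] minimising squared error of w*a + (1-w)*b."""
    def sse(w):
        return sum((y - (w * a + (1 - w) * b)) ** 2
                   for y, a, b in zip(y_true, preds_a, preds_b))
    return min((i / steps for i in range(steps + 1)), key=sse)

# Toy numbers for illustration only.
y = [2.0, 4.0, 6.0, 8.0]
model_a = [2.0, 4.0, 6.0, 8.0]   # a learner that happens to be exactly right
model_b = [5.0, 5.0, 5.0, 5.0]   # a learner that always predicts the mean

w = best_blend_weight(y, model_a, model_b)
blended = [w * a + (1 - w) * b for a, b in zip(model_a, model_b)]
print(w, r_squared(y, blended), r_squared(y, model_b))
```

An R² of 1 means the predictions account for all of the variation in the outcome, while a model that only ever predicts the average scores 0. In practice the blend weight would be chosen on one portion of the data and the R² reported on a separate portion the models never saw.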
This study demonstrates how blending established data science techniques with modern AI can yield fresh insights from text. The project took nearly five years, and its results highlight the lasting value of deep text analysis. Tobias, who has since moved on from academia, believes that integrating other forms of data, such as handwriting, could further improve predictive accuracy. With LLMs evolving rapidly, the precision of such methods is likely to keep improving.
If you’ve ever wrestled with the challenge of interpreting complex educational data, Tobias’s work offers a thoughtful glimpse into how everyday writing can reveal a child’s potential. It suggests that even short texts can hold clues to broader educational trends.