At the recent Nvidia GTC conference in San Jose, Noam Brown, who leads AI reasoning research at OpenAI, shared an intriguing thought: AI 'reasoning' models like OpenAI's o1 could have been developed two decades ago, had researchers known the right approach and algorithms to pursue. Brown said, "There were various reasons why this research direction was neglected." He also highlighted the importance of integrating human-like deliberation into AI, saying, "Humans spend a lot of time thinking before they act in a tough situation. Maybe this would be very useful in AI."
Brown, a key mind behind the o1 model, discussed how test-time inference allows AI to 'think' before answering questions. This approach boosts accuracy and reliability, especially in fields like math and science. Even with this innovation, Brown made it clear that pre-training, where models are trained on massive datasets, still plays a crucial role and works hand in hand with test-time inference.
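To make the idea concrete, here is a minimal sketch of one common form of test-time compute: sampling several candidate answers and keeping the most frequent one (often called self-consistency or majority voting). This is an illustration of the general technique Brown describes, not OpenAI's actual o1 method, and the `sample_answer` function below is a hypothetical stand-in for a real model call.

```python
import random
from collections import Counter

def sample_answer(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one stochastic model call.

    A real implementation would query a language model with
    temperature > 0, so repeated calls can disagree.
    """
    # Toy distribution: the correct answer "4" is most likely,
    # but any individual sample is noisy.
    return rng.choices(["4", "4", "4", "5", "3"], k=1)[0]

def answer_with_test_time_compute(question: str, n_samples: int = 16,
                                  seed: int = 0) -> str:
    """Spend extra compute at inference time: draw many candidate
    answers and return the most common one. More samples mean more
    compute and, typically, a more reliable final answer."""
    rng = random.Random(seed)
    candidates = [sample_answer(question, rng) for _ in range(n_samples)]
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer

print(answer_with_test_time_compute("What is 2 + 2?"))  # expected: "4"
```

The key trade-off this sketch shows is the one Brown points to: accuracy here is bought with additional computation at answer time rather than with a larger pre-trained model, which is why the two approaches complement each other.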
Addressing the resource gap between academia and leading AI labs like OpenAI, Brown acknowledged the challenge but encouraged academic institutions to focus on areas, such as model architecture, that demand less computational power. He saw potential for collaboration, noting, "Certainly, the frontier labs are looking at academic publications and thinking carefully about, OK, does this make a compelling argument."
Brown's comments come at a time when scientific funding has been significantly reduced under the Trump administration. Experts, including Nobel laureate Geoffrey Hinton, have criticized these cuts as potentially harmful to AI research. Brown pointed to AI benchmarking as one area where academia can make real strides, saying, "The state of benchmarks in AI is really bad, and that doesn't require a lot of compute to do."
Right now, benchmarks often test for niche, esoteric knowledge, so their scores give a muddled picture of what models can actually do and how they are improving. Building better benchmarks could clear up those misunderstandings and push AI research forward.