On December 20, 2024, OpenAI unveiled its newest model, “o3,” in a video infomercial. OpenAI, and even ChatGPT 4o, say it is not the Holy Grail of AGI (Artificial General Intelligence), meaning an AI system that can perform any intellectual task a human can. But it did do something no other AI model had done before. Whether this represents a giant leap or just an incremental step is unclear, because OpenAI has not disclosed o3’s full capabilities and has so far opened the model only for early safety testing. Sam Altman, OpenAI’s CEO, stated that “o3 is an incredibly smart model.” Well, what does that actually mean? ChatGPT 4o already blows me away every day.
WHAT o3 DID THAT NO OTHER AI MODEL HAS ACCOMPLISHED
François Chollet, creator of the ARC-AGI benchmark, co-founder of the ARC Prize, and previously a deep learning software engineer at Google, published the results of the ARC Prize team’s testing of OpenAI’s o3 on the ARC Prize blog (https://lnkd.in/eT_ahQPC):
“OpenAI’s new o3 system – trained on the ARC-AGI-1 Public Training set – has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit. A high-compute (172x) o3 configuration scored 87.5%.”
The test was ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence), a collection of “challenges for intelligent systems” that Chollet first introduced in 2019. ARC-AGI is billed as “the only benchmark specifically designed to measure adaptability to novelty.” In other words, it is designed to test whether a system can acquire new skills on the fly, not just recall memorized knowledge.
o3 also posted strong results on other benchmarks, including:
American Invitational Mathematics Exam (AIME) 2024: Achieved a 96.7% score, missing only one question.
GPQA Diamond: Scored 87.7% on graduate-level biology, physics, and chemistry questions.
In plain terms, o3 matched or exceeded human performance on several of these benchmarks, even though each one targets a fairly narrow type of task.
According to Chollet, “This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs.” He also states “o3’s improvement over the GPT series proves that architecture is everything. You couldn’t throw more compute at GPT-4 and get these results.”
AGI NIRVANA OR NOT
Well, I asked ChatGPT 4o whether o3 had achieved AGI. Its response was: “Most experts, including OpenAI, suggest we are still far from true AGI, though progress is accelerating. o3 represents an advanced tool within narrow AI and a critical step forward, but it does not meet the criteria for AGI.” Chollet also noted that “there were some simple problems humans could solve that o3 couldn’t.”
Currently, o3 and o3-mini are still in the testing phase. OpenAI plans to release o3-mini by the end of January 2025, with the full o3 model to follow at a date yet to be announced.
What do you think about o3’s potential to reshape the AI landscape? Comments and corrections welcome…