- Top AI model’s accuracy on the world’s hardest benchmark improves by 183% in just two weeks
- ChatGPT o3-mini achieves up to 13% accuracy, depending on its reasoning-effort setting
- OpenAI Deep Research sets new high with 26.6% accuracy on Humanity’s Last Exam
The world’s most challenging AI test, Humanity’s Last Exam, was launched less than two weeks ago. Despite the benchmark’s short lifespan, AI performance on it has already improved significantly. Models such as ChatGPT o3-mini and OpenAI’s Deep Research have made notable strides, with the latter emerging as the frontrunner.
Deep Research Leads with Impressive Accuracy
OpenAI Deep Research has broken new ground, scoring 26.6% accuracy on the benchmark. That is a 183% relative improvement in the top score in less than two weeks. While the score is modest in absolute terms, it shows how quickly AI systems are advancing.
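The 183% figure is a relative improvement, not a gain in percentage points. The article does not state the earlier top score, so the short sketch below works backward from the two published numbers (26.6% and 183%) to the baseline they imply, and also compares Deep Research with o3-mini’s best reported score. The function names are illustrative, and the implied baseline is a derived estimate rather than a reported result.

```python
def relative_improvement(old: float, new: float) -> float:
    """Percent change from `old` to `new`, measured relative to `old`."""
    if old <= 0:
        raise ValueError("old score must be positive")
    return (new - old) / old * 100.0


def implied_baseline(new: float, improvement_pct: float) -> float:
    """Score that `new` must have improved from to give `improvement_pct`."""
    return new / (1.0 + improvement_pct / 100.0)


deep_research = 26.6  # accuracy (%) reported in the article
o3_mini_best = 13.0   # "up to 13%" reported in the article

# Assumption: the 183% claim is measured against the previous top score.
baseline = implied_baseline(deep_research, 183.0)
gain_over_o3_mini = relative_improvement(o3_mini_best, deep_research)

print(f"Implied earlier top score: {baseline:.1f}%")              # 9.4%
print(f"Deep Research vs. o3-mini: {gain_over_o3_mini:.0f}% higher")  # 105% higher
```

In other words, the headline number implies an earlier top score of roughly 9.4%, and Deep Research’s score is about double o3-mini’s best result.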
A crucial factor in Deep Research’s success is its ability to access the internet for research, which gives it an edge over other models that rely solely on pre-existing knowledge. This feature is particularly valuable for exams like Humanity’s Last Exam, which include a variety of general knowledge questions.
ChatGPT o3-mini’s Performance
ChatGPT o3-mini, another notable AI model, scores up to 13% depending on its reasoning-effort setting. The model’s results scale with the effort it is allowed to spend, but it still lags well behind Deep Research on this specific benchmark.
AI Advancements on Humanity’s Last Exam
The steady improvement in AI accuracy on Humanity’s Last Exam is a testament to the rapid pace of technological progress. Although the exam remains challenging for AI systems, the gap between human performance and AI capabilities is narrowing.
While achieving human-level performance on this exam remains a distant goal, the current progress is encouraging. The continued development of AI models like Deep Research could lead to breakthroughs that redefine our understanding of artificial intelligence.
It looks like the latest OpenAI model is doing well across many topics. My guess is that Deep Research particularly helps with subjects including medicine, classics, and law.
The Significance of Deep Research
OpenAI Deep Research is a powerful AI agent designed to perform complex analyses and generate comprehensive reports with minimal human intervention. Its capabilities highlight the potential of advanced AI systems to revolutionize various industries, from healthcare to legal services.
Despite its impressive performance, Deep Research’s score of 26.6% on Humanity’s Last Exam underscores the ongoing challenges in AI research. While this score represents a significant milestone, it also highlights the need for further development to achieve human-like proficiency on such complex benchmarks.
The Future of AI
Humanity’s Last Exam serves as a crucial benchmark in evaluating AI progress. As researchers continue to refine and enhance AI models, we can expect to see further improvements in accuracy and functionality across various domains.
The question remains: how long will it take for an AI model to surpass the 50% mark on Humanity’s Last Exam? While this milestone is not yet in sight, the current pace of progress suggests that AI models are steadily moving toward it.
