A new math benchmark just dropped and leading AI models can solve ‘less than 2%’ of its problems… oh dear
(Image credit: PhonlamaiPhoto)

Sometimes I forget there’s a whole other world out there where AI models aren’t just used for basic tasks such as simple research and quick content summaries. Out in the land of bigwigs, they’re instead being used to help with everything from financial analysis to scientific research. That’s why their mathematical capabilities are so important; mathematical ability is also a general marker of reasoning capability.
Which is why mathematical benchmarks exist. Benchmarks such as FrontierMath, which its maker, Epoch AI, has just dropped and which is putting LLMs through their paces with “hundreds of original, expert-crafted mathematics problems designed to evaluate advanced reasoning capabilities in AI systems” (via Ars Technica).
While today’s AI models don’t tend to struggle with other mathematical benchmarks such as GSM-8k and MATH, according to Epoch AI, “they solve less than 2% of FrontierMath problems, revealing a substantial gap between current AI capabilities and the collective prowess of the mathematics community”.
To be clear, these are hard problems. As in, so hard that they “typically require hours or days for expert mathematicians to solve”, ranging “from computationally intensive problems in number theory and real analysis to abstract questions in algebraic geometry and category theory”.
What’s so different about this benchmark is that solving these mathematical problems requires “extended chains of precise reasoning, with each step building exactly on what came before”.
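To get a feel for why long chains are so unforgiving, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every step in a chain is right with the same fixed probability and that steps succeed or fail independently of one another. Neither assumption comes from Epoch AI, and the 95% figure and the function name are made up for the example.

```python
def chain_success_probability(step_accuracy: float, num_steps: int) -> float:
    """Chance that every step in a chain is correct.

    Assumes (hypothetically) that each step is correct with the same
    probability and that steps are independent of each other.
    """
    return step_accuracy ** num_steps


# Illustrative only: a model that gets each individual step right 95% of the time.
for steps in (1, 10, 50, 100):
    probability = chain_success_probability(0.95, steps)
    print(f"{steps:>3} steps: {probability:.4f}")
```

Under those toy assumptions, even a high per-step accuracy shrinks quickly as the chain gets longer, which is one way to picture why problems needing many exact steps are much harder than short ones.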
AI models have traditionally not been great at extended reasoning in general, let alone for super-advanced math. This makes sense when you consider what AI models, at bottom, are doing. Using LLMs as an example, these are trained on tons of data to figure out what each next word would most likely be based on this data. Although of course there’s plenty of room for directing the model more towards different words, the process is essentially probabilistic.
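The next-word idea can be sketched in a few lines of Python. This is a toy, not how any particular model is implemented: the scores below are invented rather than learned from data, the function names are hypothetical, and "temperature" is used here as one common example of a knob that nudges the choice towards or away from the likeliest word.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw word scores into probabilities that sum to 1."""
    scaled = [score / temperature for score in scores.values()]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return {word: value / total for word, value in zip(scores, exps)}


def sample_next_word(scores, temperature=1.0, rng=random):
    """Pick one next word at random, weighted by its probability."""
    probs = softmax(scores, temperature)
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Made-up scores for the word following "two plus two equals".
toy_scores = {"four": 3.0, "five": 1.0, "fish": 0.1}

print(softmax(toy_scores))
print(sample_next_word(toy_scores, temperature=1.0))
```

In this sketch a lower temperature pushes almost all of the probability onto the top-scoring word, while a higher one spreads it out, so the same scores can give different answers from run to run.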
Of late, however, we’ve seen AI models apply their probabilistic “thinking” in more of a directed fashion towards intermediary steps of this “thinking”. In other words, we’ve seen a move towards AI models that attempt to reason through their thinking, rather than just jumping to a probabilistic conclusion.
There’s now a version of ChatGPT-4o, for instance, that uses reasoning (and you’d better make sure you don’t question it). It’s also telling that you can now potentially be rewarded for submitting a question that AI can’t answer for “humanity’s last exam”.
Of course, these individual steps of reasoning might themselves be arrived at probabilistically—and could we expect any more from a non-sentient algorithm?—but they do seem to be engaging in what we flesh-and-bloodies after the fact consider to be “reasoning”.
We’re clearly a way off from having these AI models achieve the reasoning capabilities of our best and brightest, though. We can see that now that we have a mathematical benchmark capable of really putting them to the test—2% isn’t great, is it? (And take that, robots.)
While AI models might not be able to crack these difficult problems just yet, the FrontierMath benchmark looks to serve as a good litmus test for future improvements, ensuring the models aren’t just spewing out mathematical nonsense that only experts could verify as such.
We must, in the end, remember that AI is not truth-aiming, however closely we humans aim its probabilistic reasoning at results that tend towards the truth. The philosopher in me must ask: Without it having an inner life aiming towards truth, can truth actually exist for the AI, even if it spews it out? Truth for us, yes, but for the AI? I suspect not, and that’s why benchmarks like these will be crucial moving forwards into this new industrial revolution, or whatever they’re calling it these days.