Maybe AI agents can be lawyers after all
Last month, I wrote about Mercor’s new benchmark measuring AI agents’ capabilities on professional tasks like law and corporate analysis.
But AI capabilities can change a lot in a couple of weeks.
This week’s release of Anthropic’s Opus 4.6 shook up the leaderboards, with Anthropic’s new model scoring just shy of 30% in one-shot trials, and an average of 45% when given a few more cracks at the problem. Notably, the release included a bunch of new agentic features, including “agent swarms,” which may have helped with this kind of multistep problem-solving.
Mercor CEO Brendan Foody, who was particularly impressed, said, “jumping from 18. 8% in a few months is insane.”
The APEX-Agents Leaderboard. Image Credits: Mercor (screenshot)
Thirty percent is still a long way from 100%, so it’s not like lawyers need to be worried about getting replaced by machines next week.
But they should be a lot less confident than they were last month!