AI Agents Are Terrible Freelance Workers

"Even the best artificial intelligence agents are fairly hopeless at online freelance work, according to an experiment that challenges the idea of AI replacing office workers en masse. The Remote Labor Index, a new benchmark developed by researchers at data annotation company Scale AI and the Center for AI Safety (CAIS), a nonprofit, measures the ability of frontier AI models to automate economically valuable work."

"The researchers gave several leading AI agents a range of simulated freelance work and found that even the best could perform less than 3 percent of the work, earning $1,810 out of a possible $143,991. The researchers looked at several tools and found the most capable to be Manus from a Chinese startup of the same name, followed by Grok from xAI, Claude from Anthropic, ChatGPT from OpenAI, and Gemini from Google."

"'I should hope this gives much more accurate impressions as to what's going on with AI capabilities,' says Dan Hendrycks, director of CAIS. He adds that while some agents have improved significantly over the past year or so, that does not mean that this will continue at the same rate. Spectacular AI advances have led to speculation about AI soon surpassing human intelligence and replacing vast numbers of workers."

A new benchmark called the Remote Labor Index measures the ability of frontier AI models to automate economically valuable work. Leading AI agents completed under 3% of simulated Upwork freelance tasks, earning $1,810 of a possible $143,991. Ranked from most to least capable, the agents tested were Manus, Grok, Claude, ChatGPT, and Gemini. The evaluated tasks included graphic design, video editing, game development, and administrative chores, each accompanied by a job description, the required files, and a human-produced example. Improvements in coding and math performance have not translated into broad automation of diverse real-world freelance tasks. Past waves of AI hype also produced misplaced predictions about rapid job displacement.

#ai-capabilities #automation #freelance-work #benchmarking

Read at WIRED
