A New Paper Tested AI's Ability to Do Actual Online Freelance Work, and the Results Are Damning

"New research highlighted by Wired shows how these AI models designed to automate tasks - if not entire jobs - turn out to be incredibly unproductive compared to the humans they're replacing. Conducted by researchers at the nonprofit Center for AI Safety (CAIS) and the massive data annotation firm Scale AI, whose army of freelancers performs much of the grunt work underpinning the AI industry, the tests involved giving six leading AI agents various simulated freelance tasks."
"The outcome of those tests, detailed in a new paper, was damning. Not a single AI agent was able to perform more than 3 percent of the work, making just $1,810 out of a possible $143,991. For the tests, the researchers developed their own benchmark called the Remote Labor Index, which uses a wide range of real-world remote projects to evaluate the bots' ability to perform economically valuable work in industries ranging from game development to data analysis."
"The top performer, they found, was an AI agent from the Chinese startup Manus with an automation rate of just 2.5 percent, meaning it was only able to complete 2.5 percent of the projects it was assigned at a level that would be acceptable as commissioned work in a real-world freelancing job, the researchers said."
The Center for AI Safety and Scale AI evaluated six leading AI agents on simulated freelance work using a new benchmark, the Remote Labor Index. The benchmark draws on a wide range of real-world remote projects in industries including game development and data analysis. No AI agent completed more than 3 percent of its assigned projects to an acceptable freelance standard, and together they earned only $1,810 of a possible $143,991. The best-performing agent, from the Chinese startup Manus, achieved an automation rate of just 2.5 percent, while the others scored between 1.7 and 2.1 percent, indicating very limited practical automation capability.
Read at Futurism