A new AI coding challenge just published its first results - and they aren't pretty

"We're glad we built a benchmark that is actually hard. Benchmarks should be hard if they're going to matter."

"As we get more runs of the thing, we'll have a better sense, because we expect people to adapt to the dynamics of competing on this every few months."

Eduardo Rocha de Andrade won the K Prize, an AI coding challenge, with a score of 7.5%. The Laude Institute and Andy Konwinski organized the contest to create a difficult benchmark for AI coding models. Konwinski has offered $1 million to any open-source model that scores above 90% on the test. The K Prize uses real-time, contamination-free issues from GitHub. Its low top score contrasts sharply with scores on SWE-Bench, a different benchmark system, suggesting that the K Prize is more challenging. Future rounds are expected to provide further insight into AI coding performance.

#ai-coding-challenge #k-prize #benchmark-testing #coding-competition #ai-performance

Read at TechCrunch

