Eduardo Rocha de Andrade won the K Prize, an AI coding challenge, with a score of 7.5%. The Laude Institute and Andy Konwinski organized the contest to create a difficult benchmark for AI coding models. Konwinski has offered $1 million to any open-source model that scores above 90% on the test. The K Prize draws on real-time GitHub issues chosen to be free of training-data contamination. Its winning score contrasts sharply with the scores reported on SWE-Bench, a different benchmark, which suggests that the K Prize is the more challenging test. Future rounds are expected to provide further insight into AI coding performance.
"We're glad we built a benchmark that is actually hard. Benchmarks should be hard if they're going to matter."
"As we get more runs of the thing, we'll have a better sense, because we expect people to adapt to the dynamics of competing on this every few months."