
"Some lawyers have learnt that the hard way, and have been fined for filing AI-generated court briefs that misrepresented principles of law and cited non-existent cases. The same is true in other fields. For example, AI models can pass the gold-standard test in finance - the Chartered Financial Analyst exam - yet score poorly on simple tasks required of entry-level financial analysts (see go.nature.com/42tbrgb)."
"Whenever assessments measure the intended skill inaccurately, it is considered a proxy failure. For example, a lawyer who scored A+ on an exam would be expected to avoid the kinds of error that an AI tool with a similar score might make in a real-world scenario. Better tests are urgently required to help guide the use of AI in complex, high-stakes situations."
"Imagine an AI model attempting to 'pass' an interview with an acclaimed legal scholar such as Cass Sunstein at Harvard University in Cambridge, Massachusetts. Sunstein's expert probing would be a better measure of the model's legal knowledge than a standardized test or automatically scored benchmark. Passing the 'Sunstein test' would require an AI tool to display true legal mastery, being able to wade through ambiguity and contradiction, and not just answer multiple-choice questions or write an essay."
AI models can match human performance on multiple-choice, short-answer and essay law exams but fail at real-world legal tasks. Lawyers have been fined for submitting AI-generated briefs that misrepresented law and cited non-existent cases. Similar gaps appear in finance, where models pass the Chartered Financial Analyst exam yet struggle with entry-level analyst tasks. Such mismatches are proxy failures caused by assessments that do not measure intended practical skills. Better tests are urgently required for high-stakes uses of AI. Extensive, interactive evaluation by specialists can reveal whether an AI genuinely understands a domain rather than merely imitating understanding.
 Read at Nature