Fixing Hallucinations Would Destroy ChatGPT, Expert Finds
Evaluation incentives push AI models to guess confidently rather than express uncertainty, which improves benchmark scores but increases harmful hallucinations and erodes safety and user trust.
Why AI chatbots hallucinate, according to OpenAI researchers
Large language models hallucinate because training and evaluation reward guessing over admitting uncertainty; redesigning evaluation metrics can reduce hallucinations.
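The incentive argument can be made concrete with a small expected-value comparison. The sketch below is illustrative only (it is not OpenAI's actual scoring code); the penalty value and the 30% confidence figure are assumptions chosen to show how accuracy-only grading rewards guessing while a rule that penalizes confident errors rewards abstaining.

```python
# Minimal sketch: under plain accuracy an "I don't know" answer scores the same
# as a wrong one, so guessing always has higher expected value. A scoring rule
# that penalizes wrong answers flips that incentive at low confidence.
# The wrong_penalty value and p = 0.3 below are illustrative assumptions.

def expected_score(p_correct: float, abstain: bool,
                   wrong_penalty: float = 0.0, abstain_credit: float = 0.0) -> float:
    """Expected score for one question, given the model's chance of being right."""
    if abstain:
        return abstain_credit
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

p = 0.3  # model is only 30% sure of the answer

# Accuracy-only grading: guessing (0.3) beats abstaining (0.0), so models guess.
print(expected_score(p, abstain=False))                     # 0.3
print(expected_score(p, abstain=True))                      # 0.0

# Penalizing confident errors: abstaining (0.0) now beats guessing (-0.4).
print(expected_score(p, abstain=False, wrong_penalty=1.0))  # -0.4
print(expected_score(p, abstain=True))                      # 0.0
```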
Experiment Design and Metrics for Mutation Testing with LLMs | HackerNoon
To evaluate LLM-generated mutations, we designed metrics covering cost, usability, and behavior, recognizing that a higher mutation score does not guarantee higher-quality mutants.
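As a rough illustration of why mutation score alone is not enough, the sketch below pairs the standard kill-rate calculation with a simple cost metric. The field names, the compilability counter, and the cost-per-killed-mutant figure are assumptions for illustration, not the article's exact metric definitions.

```python
# Hedged sketch: classic mutation score (killed / non-equivalent mutants)
# alongside an illustrative cost metric. A high score says nothing by itself
# about generation cost or how realistic the mutants are.
from dataclasses import dataclass

@dataclass
class MutationRun:
    generated: int    # mutants the LLM produced
    compilable: int   # mutants that actually compile (a usability signal)
    equivalent: int   # mutants semantically identical to the original code
    killed: int       # mutants detected (killed) by the test suite
    cost_usd: float   # API spend for generating the mutants

    def mutation_score(self) -> float:
        # Killed over non-equivalent, compilable mutants.
        return self.killed / max(self.compilable - self.equivalent, 1)

    def cost_per_killed(self) -> float:
        return self.cost_usd / max(self.killed, 1)

run = MutationRun(generated=100, compilable=82, equivalent=7, killed=60, cost_usd=4.20)
print(f"mutation score: {run.mutation_score():.2f}")        # 0.80
print(f"cost per killed mutant: ${run.cost_per_killed():.3f}")
```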