#evaluation-metrics

from Hackernoon
1 year ago

Experiment Design and Metrics for Mutation Testing with LLMs | HackerNoon

In evaluating LLM-generated mutations, we designed metrics that encompass cost, usability, and behavior, recognizing that higher mutation scores don't guarantee higher quality.
Scala
Artificial intelligence
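The caveat in this entry — that a higher mutation score doesn't guarantee higher quality — follows from how the score is conventionally defined: the fraction of generated mutants the test suite kills, which trivial or duplicate mutants can inflate. A minimal sketch (illustrative only, not the article's implementation; the function name and signature are assumptions):

```python
def mutation_score(killed: int, total: int) -> float:
    """Fraction of mutants detected ("killed") by the test suite.

    A higher value is not automatically better: easy-to-kill or
    duplicate mutants raise the score without improving the tests,
    which is why the article pairs it with cost, usability, and
    behavior metrics.
    """
    if total == 0:
        raise ValueError("no mutants generated")
    return killed / total


# e.g. 42 of 60 mutants killed
print(mutation_score(42, 60))  # 0.7
```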
from Medium
1 month ago

The problems with running human evals

Running evaluations is essential for building valuable, safe, and user-aligned AI products.
Human evaluations help capture nuances that automated tests often miss.
Artificial intelligence
from Hackernoon
6 months ago

Evaluating TnT-LLM Text Classification: Human Agreement and Scalable LLM Metrics | HackerNoon

Reliability in text classification is crucial; it can be assessed with multiple human annotators and with LLM-based metrics that are checked for alignment with the human consensus.
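Assessing agreement among multiple annotators is commonly done with a chance-corrected statistic; a minimal sketch of one such statistic, Cohen's kappa for two annotators, follows (illustrative only — the article may use other agreement measures, and the function name is an assumption):

```python
def cohens_kappa(a: list, b: list) -> float:
    """Chance-corrected agreement between two annotators' labels.

    p_o is the observed agreement rate; p_e is the agreement expected
    by chance given each annotator's label frequencies.
    """
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label
    return (p_o - p_e) / (1 - p_e)


# Two annotators agree on 3 of 4 items
print(cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1]))  # 0.5
```

The same observed-vs-expected idea extends to comparing an LLM "annotator" against the pooled human labels, which is the scalable check the entry describes.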