LLM Output Evaluation
This cluster discusses challenges in verifying and evaluating the correctness of Large Language Model (LLM) outputs, including tools and frameworks such as Giskard and DeepEval, and methods such as LLM-as-judge (a minimal sketch of that pattern is included at the end of the sample comments below).
Activity Over Time
Top Contributors
Keywords
Sample Comments
Hey HN! We've built this platform that allows you to evaluate how well your LLM implementation is performing, whether that be using open-source tools such as LangChain, LlamaIndex, or even your own internal framework. The idea is that you would use our open-source package (https://github.com/confident-ai/deepeval) to evaluate LLM outputs using criteria such as factual consistency, relevancy, bias, etc.
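For context on what a criteria-based check like this looks like, here is a minimal, self-contained sketch (not part of the quoted comment, and not the deepeval API itself, which is documented in the linked repo). It scores an output for factual consistency against a reference context using a crude word-overlap heuristic; real tools typically use NLI models or an LLM judge instead.

```python
import re

# Minimal sketch of a criteria-based LLM output check.
# NOT the deepeval API -- just an illustration of the idea. Real
# implementations use NLI models or LLM judges rather than word overlap.

def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def factual_consistency_score(output: str, context: str) -> float:
    """Crude proxy: fraction of output words that also appear in the context."""
    out_words, ctx_words = _words(output), _words(context)
    if not out_words:
        return 0.0
    return len(out_words & ctx_words) / len(out_words)

def evaluate(output: str, context: str, threshold: float = 0.6) -> bool:
    score = factual_consistency_score(output, context)
    print(f"factual consistency: {score:.2f} (threshold {threshold})")
    return score >= threshold

if __name__ == "__main__":
    context = "The Eiffel Tower is 330 metres tall and located in Paris."
    output = "The Eiffel Tower, located in Paris, is about 330 metres tall."
    assert evaluate(output, context)
```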
How do you verify your LLM's output?
LLMs can't evaluate their own output. LLMs suggest possibilities, but can't evaluate them. Imagine an insane man who is rambling something smart but doesn't self-reflect. The evaluation is done against some framework of values that are considered true: the rules of a board game, the language syntax, or something else. LLMs also can't fabricate evaluation, because the latter is a rather rigid and precise model, unlike natural language. Otherwise you could set up two LLMs ques…
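The comment's point about evaluating against a rigid external framework (game rules, language syntax) can be made concrete: instead of asking the model to grade itself, you check its output with a validator that encodes the rules. A small sketch, assuming the task is generating Python code, so the "framework of values" is the language grammar:

```python
import ast

def is_valid_python(generated_code: str) -> bool:
    """Evaluate LLM output against a rigid external model: Python's grammar.
    The check is done by the parser, not by the LLM itself."""
    try:
        ast.parse(generated_code)
        return True
    except SyntaxError:
        return False

# Hypothetical outputs an LLM might produce for "write a function that doubles x"
candidates = [
    "def double(x):\n    return x * 2",
    "def double(x) return x * 2",   # missing colon: rejected by the parser
]
for code in candidates:
    print(is_valid_python(code), repr(code[:30]))
```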
How do you verify that the output is correct in those areas where the LLM dwarfs your knowledge?
and each LLM can invent some ridiculous surprise. Who is going to check if it did the right thing?
Curious how you can even tell the LLM result is correct when you are apparently unable to validate it with other methods.
Evals do help to account for correctness when it comes to LLMs
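As a concrete illustration of what "evals" means here (not part of the quoted comment): a tiny harness that runs a model over labelled cases and reports accuracy. `call_llm` is a hypothetical stand-in for whatever model client you actually use.

```python
# Tiny eval harness sketch. `call_llm` is a hypothetical placeholder for
# a real model client (API call, local model, etc.).

def call_llm(prompt: str) -> str:
    # Canned answers so the sketch runs end to end; replace with a real call.
    canned = {
        "What is 2 + 2? Answer with a number only.": "4",
        "Capital of France? One word.": "Paris",
    }
    return canned.get(prompt, "")

EVAL_CASES = [
    {"prompt": "What is 2 + 2? Answer with a number only.", "expected": "4"},
    {"prompt": "Capital of France? One word.", "expected": "Paris"},
]

def run_evals(cases=EVAL_CASES) -> float:
    correct = 0
    for case in cases:
        answer = call_llm(case["prompt"]).strip()
        if answer.lower() == case["expected"].lower():
            correct += 1
    accuracy = correct / len(cases)
    print(f"accuracy: {accuracy:.0%} ({correct}/{len(cases)})")
    return accuracy

if __name__ == "__main__":
    run_evals()
```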
You evaluate whether they critically review the LLM's answer or just take it as truth.
Yes, the LLM will give you an answer. Are you verifying that what the LLM tells you is correct? How would you even know?
Thanks! LLM testing is a specific challenge, and we're interested in your feedback on our alpha version. Here's a notebook to try it: https://docs.giskard.ai/en/latest/reference/notebooks/llm_co...
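Finally, since the cluster description mentions LLM-as-judge: the pattern is simply to prompt a second (usually stronger) model to grade the first model's answer against a rubric. A hedged sketch, with `call_judge_model` as a hypothetical placeholder rather than any particular vendor's API:

```python
# LLM-as-judge sketch. `call_judge_model` is a hypothetical placeholder
# for whatever chat-completion client you use for the judge model.

JUDGE_PROMPT = """You are grading an answer.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Reply with a single integer from 1 (wrong) to 5 (fully correct)."""

def call_judge_model(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion call.
    # Returns a canned grade so the sketch runs end to end.
    return "4"

def judge(question: str, reference: str, candidate: str) -> int:
    reply = call_judge_model(JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate))
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else 1  # lowest score if the reply can't be parsed

if __name__ == "__main__":
    score = judge("What is the boiling point of water at sea level?",
                  reference="100 degrees Celsius",
                  candidate="Roughly 100 °C at standard pressure.")
    print("judge score:", score)  # 4 with the canned placeholder
```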