LLM Output Evaluation

This cluster discusses challenges in verifying and evaluating the correctness of Large Language Model (LLM) outputs, including tools, frameworks like Giskard and DeepEval, and methods such as LLM-as-judge.
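
As a rough illustration of the LLM-as-judge method mentioned above, here is a minimal sketch that asks one model to grade another model's answer against an explicit rubric. It assumes the OpenAI Python SDK; the model name, rubric wording, and 1-5 scale are placeholders, not something prescribed by the comments in this cluster.

```python
# Minimal LLM-as-judge sketch: a second model grades an answer against a rubric.
# The model name and rubric here are illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "Score the ANSWER to the QUESTION from 1 (wrong or unsupported) to 5 "
    "(factually correct, relevant, and complete). Reply with the number only."
)

def judge(question: str, answer: str, model: str = "gpt-4o-mini") -> int:
    """Ask a judge model to grade an answer; returns a 1-5 score."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"QUESTION: {question}\nANSWER: {answer}"},
        ],
        temperature=0,
    )
    # Assumes the judge follows the "number only" instruction.
    return int(response.choices[0].message.content.strip())

if __name__ == "__main__":
    print(judge("What is the capital of Australia?", "Canberra"))
```

Several of the comments below point out the obvious weakness of this pattern: the judge is itself an LLM, so its scores are only as trustworthy as the rubric and the judge model.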

🚀 Rising 2.4x · AI & Machine Learning
Comments: 2,821
Years Active: 16
Top Authors: 5
Topic ID: #8792

Activity Over Time

2011: 1
2012: 1
2013: 1
2014: 1
2015: 4
2016: 1
2017: 7
2018: 4
2019: 5
2020: 11
2021: 13
2022: 45
2023: 559
2024: 702
2025: 1,314
2026: 154

Keywords

IMHO LLM K2 openai.com LangChain UI AI HN www.gov AISI llm llms output evaluation evaluate judge verify o1 criteria testing

Sample Comments

jipster Aug 25, 2023 View on HN

Hey HN! We've built this platform that allows you to evaluate how well your LLM implementation is performing, whether that be using open source tools such as LangChain, LlamaIndex, or even your own internal framework. The idea is you would use our open source package (https://github.com/confident-ai/deepeval) to evaluate LLM outputs using criteria such as factual consistency, relevancy, bias, et

How do you verify your LLM's output?
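
For context on the package this comment links to, a test along the lines of DeepEval's README might look roughly like the sketch below. The class and metric names follow the project's later documentation and may differ from the API at the time of this 2023 comment; treat them as an approximation rather than the exact interface.

```python
# Sketch of a DeepEval-style test: wrap an input/output pair in a test case
# and score it against a relevancy metric. The metric itself calls an LLM
# judge under the hood, so an API key is required to actually run it.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_chatbot_answer():
    test_case = LLMTestCase(
        input="What are your shipping times?",              # user prompt
        actual_output="We ship within 3-5 business days.",  # your LLM's answer
    )
    relevancy = AnswerRelevancyMetric(threshold=0.7)  # fail below a 0.7 score
    assert_test(test_case, [relevancy])               # raises if the metric fails
```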

akomtu Jun 2, 2025 View on HN

LLMs can't evaluate their own output. LLMs suggest possibilities, but can't evaluate them. Imagine an insane man who is rambling something smart, but doesn't self-reflect. The evaluation is done against some framework of values that are considered true: the rules of a board game, the language syntax, or something else. LLMs also can't fabricate evaluation because the latter is a rather rigid and precise model, unlike natural language. Otherwise you could set up two LLMs ques
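
The "framework of values" point is easiest to see with a rigid, mechanical check: if the LLM's task is to emit Python or JSON, the grammar itself can act as the evaluator, accepting or rejecting the output with no judgment involved. A small hypothetical sketch:

```python
# Hypothetical sketch of evaluating LLM output against a rigid framework:
# here the "framework" is simply Python's grammar and JSON's syntax.
import ast
import json

def is_valid_python(source: str) -> bool:
    """True if the text parses as Python source."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def is_valid_json(text: str) -> bool:
    """True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_python("def f(x): return x + 1"))  # True
print(is_valid_python("def f(x) return x + 1"))   # False: missing colon
print(is_valid_json('{"score": 5}'))              # True
```

Syntactic validity is of course a much weaker property than correctness, but it shows what a precise evaluation framework looks like compared with free-form natural language.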

cudgy Jan 18, 2026 View on HN

How do you verify that the output is correct in areas where the LLM dwarfs your knowledge?

chrz Aug 15, 2025 View on HN

And each LLM can invent some ridiculous surprise. Who is going to check if it did the right thing?

kjkjadksj May 30, 2024 View on HN

Curious how you can even tell the LLM result is correct when you are apparently unable to validate it with other methods.

firtoz Oct 5, 2023 View on HN

Evals do help to account for correctness when it comes to LLMs

n_plus_1_acc Feb 1, 2024 View on HN

You evaluate whether they critically review the LLM answer or just take it as truth.

SoftTalker May 17, 2023 View on HN

Yes, the LLM will give you an answer. Are you verifying that what the LLM tells you is correct? How would you even know?

alexcombessie Jun 16, 2023 View on HN

Thanks! LLM testing is a specific challenge; we’re interested in your feedback on our alpha version. Here’s a notebook to try it: https://docs.giskard.ai/en/latest/reference/notebooks/llm_co...