AI Model Evaluations

This cluster collects discussions about evaluating, comparing, and using AI/ML models, including debates over their quality, accuracy, and usability for specific tasks, and calls for more transparency or improvement.

➡️ Stable 1.2x AI & Machine Learning

Comments: 5,076

Topic ID: #2579

Activity Over Time

2007 to 2015: counts not shown

2016: 143

2017: 200

2018: 218

2019: 238

2020: 310

2021: 321

2022: 349

2023: 798

2024: 712

2025: 1,307

2026: count not shown

Top Contributors

simonw (18), visarga (15), amelius (15), esafak (13), disgruntledphd2 (13)

Keywords

IMO OP ML O1 USD RAB models model frontier run eyes output bug feature o1 booster stuff want

Sample Comments

spookthesunset • Jul 6, 2023 • View on HN

If it’s based on models, it’s only as good as the model

compumike • Mar 16, 2023 • View on HN

Curious, what would have been a better model?

eurekin • Jul 26, 2023 • View on HN

Hi, so what are you using the models for?

vidarh • Apr 30, 2023 • View on HN

I have nothing to gain from spending time testing models for you because whatever I pick will just seem like cherry picking to you, and it doesn't matter to me whether or not you agree on the usability of these models. They work for me, and that's all that matters to me. Try a few completions instead of a question. Or don't

skp1995 • Nov 6, 2024 • View on HN

Honestly we can; I haven't prompted it enough. What do you want to use the model for?

stainablesteel • Jan 17, 2024 • View on HN

so the takeaway is basically, don't run a model if you don't know where it came from

osigurdson • Mar 9, 2024 • View on HN

What is going to happen? Please publish your models so we can run them ourselves.

DoofusOfDeath • Sep 11, 2018 • View on HN

Not sure it's possible. The more accurate the models get, the buggier they are.

aaronblohowiak • Oct 25, 2025 • View on HN

what's the most effective model you've seen?

politelemon • Sep 27, 2025 • View on HN

I'm not seeing the equivalence. Isn't the announcement here to let you run any model?