LLM Benchmark Comparisons

This cluster covers debates about how new language models perform against state-of-the-art ones such as GPT-4, Llama, and Mistral, questioning missing benchmarks, benchmark overfitting, and real-world validity.

Trend: ➡️ Stable (1.0x)
Category: AI & Machine Learning
Comments: 4,463
Years Active: 12
Top Authors: 5
Topic ID: #9732

Activity Over Time

2015: 3
2016: 1
2017: 4
2018: 3
2019: 31
2020: 58
2021: 55
2022: 95
2023: 1,400
2024: 1,319
2025: 1,422
2026: 74
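The "Stable (1.0x)" trend label presumably derives from the yearly counts above, though the page does not say how. As a minimal sketch, assuming the multiplier simply compares the latest full year to the year before it (the function name and method are this sketch's assumptions, not the site's):

```python
# Yearly comment counts, copied from the activity table above.
counts = {
    2015: 3, 2016: 1, 2017: 4, 2018: 3, 2019: 31,
    2020: 58, 2021: 55, 2022: 95, 2023: 1400,
    2024: 1319, 2025: 1422, 2026: 74,
}

def trend_multiplier(counts, latest_full_year=2025):
    """Hypothetical trend metric: latest full year vs. the year before.

    2026 is excluded as a partial year.
    """
    return counts[latest_full_year] / counts[latest_full_year - 1]

m = trend_multiplier(counts)
print(f"{m:.1f}x")
```

This yields roughly 1.08 (which rounds to 1.1x, not the page's 1.0x), so the site likely uses a different window or rounding; the calculation here is only an illustration of the idea.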

Keywords

CPU, CBRN, MythoMax, LLM, BLOOMZ, AI, GP, stanford.edu, GPT4, AIME, models, benchmarks, gpt, openai, llama, model, open, chatgpt, models like grok

Sample Comments

yousif_123123 (Dec 11, 2025)

Why doesn't OpenAI include comparisons to other models anymore?

aubanel (Mar 17, 2025)

No, it's not: their model is only at GPT-4.5 level on a few saturated, cherry-picked benchmarks.

prime312 (Aug 12, 2025)

Any reason why open-source models (like Llama) weren't considered here?

robrenaud (Apr 8, 2024)

Model quality matters a ton too. They aren't serving OpenAI or Anthropic models, which are state of the art.

transformi (Sep 27, 2023)

But why didn't they compare it to SOTA fine-tuned models (like Vicuna, Platypus)? ... smells a bit strange.

syntaxing (Aug 5, 2025)

Interesting, these models are better than the new Qwen releases?

dcreater (Sep 5, 2025)

Thank you! Why are the comparisons to llama3.1 era models?

saberience (Mar 12, 2025)

Seems like it's tuned for benchmarks to me; in the real world it seems worse than Mistral and Llama.

Tepix (Mar 31, 2023)

Have you tried bigger models? Llama-65B can indeed compete with GPT-3 according to various benchmarks. The next thing would be to get the fine-tuning as good as OpenAI's.

wanderingmind (May 5, 2024)

How does it compare to GGML? That is what they must be comparing, and yet I don't see any comparison made.