RLHF in LLMs

This cluster covers discussions that attribute specific behaviors, personalities, and response styles of large language models to Reinforcement Learning from Human Feedback (RLHF), as distinct from the base model produced by pre-training alone.
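
As a rough illustration of the distinction the cluster keeps drawing (and not any particular lab's pipeline), the toy Python sketch below treats the "base model" as a near-uniform distribution over a few canned response styles and applies a REINFORCE-style update against a hand-picked reward model, so probability mass shifts toward the style the "raters" prefer. The candidate responses, reward values, and learning rate are all made up for the example.

    import math
    import random

    candidates = [
        "Sure, here's a careful, hedged answer...",   # style a reward model might favour
        "lol idk",                                    # style a raw base model might emit
        "As an AI language model, I cannot...",
    ]

    # "Base model" preferences from pre-training alone: roughly uniform logits.
    logits = [0.0, 0.0, 0.0]

    def probs(ls):
        zs = [math.exp(l) for l in ls]
        total = sum(zs)
        return [z / total for z in zs]

    # Stand-in "reward model": pretend human raters prefer the first style.
    reward = [1.0, -1.0, 0.2]

    def rlhf_step(ls, lr=0.5):
        p = probs(ls)
        i = random.choices(range(len(ls)), weights=p)[0]      # sample a response
        baseline = sum(pj * rj for pj, rj in zip(p, reward))   # variance-reduction baseline
        advantage = reward[i] - baseline
        # REINFORCE: d log p_i / d logit_j = (1 if j == i else 0) - p_j
        return [l + lr * advantage * ((1.0 if j == i else 0.0) - p[j])
                for j, l in enumerate(ls)]

    random.seed(0)
    for _ in range(200):
        logits = rlhf_step(logits)

    print("base model  :", [round(x, 3) for x in probs([0.0, 0.0, 0.0])])
    print("after 'RLHF':", [round(x, 3) for x in probs(logits)])
    print("now favours :", candidates[max(range(3), key=lambda j: probs(logits)[j])])

The point the sampled comments keep making is visible even in this toy: the final response style reflects the reward signal layered on top, not just the pre-training distribution underneath.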

➡️ Stable 0.9x · AI & Machine Learning

Comments: 3,102
Years Active: 10
Top Authors: 5
Topic ID: #8076

Activity Over Time

2017: 1
2018: 2
2019: 7
2020: 7
2021: 7
2022: 115
2023: 1,195
2024: 642
2025: 1,065
2026: 61

Keywords

CTGT LLM openai.com RLFH C.A P.I AI HN RL RLMF model tuning feedback reinforcement instruction training llms tuned fine tuning models

Sample Comments

drdeca May 21, 2023

Are you taking the RLHF into account when you say so?

RLHF is probably the reason for this.

VivaLaPanda Feb 26, 2024

It's almost certainly the RLHF, not the base model.

whimsicalism Apr 12, 2023

Read about RLHF, i think you are misunderstanding what this will be used for.

piperswe Jun 5, 2025

I think it's part of the RLHF tuning as well

zarzavat Mar 14, 2023

He’s talking about RLHF - reinforcement learning with human feedback (the process that trained ChatGPT), the training data for which is not publicly available. And the point is that you don’t need to RLHF as long as you have access to another model that has been trained with RLHF that you can blackbox.

7moritz7 Nov 10, 2025

Hasn't RLHF and with LLM feedback been around for years now

visarga May 12, 2023

RLHF your AI until you like the output?

sweezyjeezy Sep 5, 2023

That's probably more because of RLHF though, they've optimised for certain kind of responses rather than simple model loss on internet text.

drsim Feb 26, 2025

It is RLHF if I understand correctly.