Behavior

Reinforcement Learning from Human Feedback

라이언의 꿀팁백과 (Ryan's Tip Encyclopedia)

A short introduction to RLHF and post-training, focused on language models, by Nathan Lambert.


https://rlhfbook.com/