rl1

How does RLHF differ from the standard RL setup?

  • No state transitions happen: the generation for one state does not affect another.
  • The reward function is replaced by a reward model, which could be any classification model.
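The two points above can be sketched as a small loop. This is only an illustration under stated assumptions: `toy_policy`, `toy_reward_model`, and `collect_rewards` are hypothetical names, the "policy" just picks from canned strings, and the "reward model" is a hand-written logistic score standing in for a trained classifier.

```python
import math
import random


def toy_policy(prompt, rng):
    """Hypothetical stand-in for a language model: returns one completion."""
    return rng.choice(["thanks, happy to help", "no", "sure, here is how"])


def toy_reward_model(prompt, completion):
    """Hypothetical stand-in for a learned classifier.

    Returns a score in (0, 1). Here the score is an arbitrary logistic
    function of completion length (an assumption, purely for illustration).
    """
    logit = 0.5 * len(completion.split()) - 1.0
    return 1.0 / (1.0 + math.exp(-logit))


def collect_rewards(prompts, seed=0):
    """Score one completion per prompt.

    Each prompt is handled on its own: no next state is produced, and the
    reward comes from the model above rather than from an environment's
    reward function.
    """
    rng = random.Random(seed)
    results = []
    for prompt in prompts:
        completion = toy_policy(prompt, rng)
        reward = toy_reward_model(prompt, completion)
        results.append((prompt, completion, reward))
    return results


if __name__ == "__main__":
    for prompt, completion, reward in collect_rewards(["Hi!", "Explain RLHF."]):
        print(f"{prompt!r} -> {completion!r} (reward {reward:.3f})")
```

In a real setup the reward model would be trained on human preference data and the rewards would drive a policy update; both steps are left out of this sketch.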

rl2