How is RLHF different from the standard RL setup?
- No state transitions happen: the completion generated for one prompt does not affect the next prompt the model sees.
- The reward function is replaced by a learned reward model, which could be any classification model.
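The two differences above can be made concrete with a small sketch. This is a toy illustration, not an actual RLHF implementation: `rl_episode`, `rlhf_batch`, and `toy_reward_model` are hypothetical names, and the "reward model" is a hand-written stand-in for a trained classifier that outputs the probability a completion is preferred.

```python
import math
from typing import Callable, List, Tuple


def rl_episode(
    policy: Callable[[int], int],
    transition: Callable[[int, int], int],
    reward_fn: Callable[[int, int], float],
    state: int,
    steps: int,
) -> Tuple[float, int]:
    """Standard RL: each action changes the next state, and the
    reward comes from a fixed function of the environment."""
    total = 0.0
    for _ in range(steps):
        action = policy(state)
        total += reward_fn(state, action)
        state = transition(state, action)  # the action shapes the next state
    return total, state


def rlhf_batch(
    policy: Callable[[str], str],
    reward_model: Callable[[str, str], float],
    prompts: List[str],
) -> List[float]:
    """RLHF-style loop: every prompt is an independent one-step episode,
    and the reward is the output of a learned model."""
    rewards = []
    for prompt in prompts:
        completion = policy(prompt)
        # No transition step: this completion never influences the next prompt.
        rewards.append(reward_model(prompt, completion))
    return rewards


def toy_reward_model(prompt: str, completion: str) -> float:
    """Hypothetical stand-in for a trained classifier: returns a
    probability in (0, 1) that the completion is preferred."""
    logit = 1.0 if completion.endswith(".") else -1.0
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Standard RL: the state evolves from 0 to 3 over three steps.
    print(rl_episode(lambda s: 1, lambda s, a: s + a, lambda s, a: float(s), 0, 3))
    # RLHF: each prompt is scored on its own by the reward model.
    print(rlhf_batch(lambda p: p + " ok.", toy_reward_model, ["a", "b"]))
```

Because the prompts are independent, reordering them only reorders the rewards, whereas in `rl_episode` the order of actions determines which states are visited.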