InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling

Item #: 079017-4270

