Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Item #: 075280-3114
