Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Item #: 075280-3114

Details

DOI: https://doi.org/10.52202/075280-3114
Author(s): Alexandre Rame, Guillaume Couairon, Corentin Dancette, Jean-Baptiste Gaya, Mustafa Shukor, Laure Soulier, Matthieu Cord
Pages: 71095-71134 (40 pages)
Format: PDF Paper Download
Conference: Advances in Neural Information Processing Systems 36
Date/Location: Held 10-16 December 2023, New Orleans, Louisiana, USA.
Series: Advances in Neural Information Processing Systems 36
Publisher: Neural Information Processing Systems Foundation, Inc. (NeurIPS)
