Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

Item #: 079017-0288
