How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
