Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

Item #: 079017-2965

