How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers

Item #: 079017-2774
