Why do small language models underperform?

Name: Why do small language models underperform?
Start: 2024-05-02T11:00:00-04:00
End: 2024-05-02T12:00:00-04:00
Location: Bldg: Nguyen Engineering Bldg., Conference Room 4201, 4400 University Drive, Fairfax, Virginia, United States, 22030, Virtual: https://events.vtools.ieee.org/m/419402

May 2 @ 11:00 am - 12:00 pm

The IEEE ComSoc Northern Virginia chapter and the GMU Department of Computer Science invite you to attend the following Distinguished Lecture:
Title: Why do small language models underperform?
Speaker: Benoît Sagot, Director of Research at INRIA
Date: May 2, 2024
Time: 11:00am – 12:00pm
In person Location: GMU Fairfax campus, Nguyen Engineering Bldg., Conference Room 4201
Virtual: Microsoft Teams: (https://nam11.safelinks.protection.outlook.com/ap/t-59584e83/?url=https%3A%2F%2Fteams.microsoft.com%2Fl%2Fmeetup-join%2F19%253ameeting_OGUwZGI0OTktNTdhZS00NmNlLWEzZGEtZTJhNGI2Yjg5YmJi%2540thread.v2%2F0%3Fcontext%3D%257b%2522Tid%2522%253a%25229e857255-df57-4c47-a0c0-0546460380cb%2522%252c%2522Oid%2522%253a%2522f9586db0-74ee-4635-a01b-2383b74f8a0c%2522%257d&data=05%7C02%7Ckhassan1%40gmu.edu%7Cc082474ca0814691061508dc69ddedfd%7C9e857255df574c47a0c00546460380cb%7C0%7C0%7C638501649141618849%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=MUjIg2xib2wJ22N21llS60nE%2FSTEPpqg%2FSVOdLXJFHA%3D&reserved=0) Meeting ID: 292 789 339 112 Passcode: jM8w7c
—————————————————————
Dial-in by phone
(tel:+15713972084,,218888141#) United States, Arlington
(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdialin.teams.microsoft.com%2F9424c9fe-3b57-41d6-9131-1d3b9b7cf4a9%3Fid%3D218888141&data=05%7C02%7Ckhassan1%40gmu.edu%7Cc082474ca0814691061508dc69ddedfd%7C9e857255df574c47a0c00546460380cb%7C0%7C0%7C638501649141626316%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=r6YY83JURqPLJf9J2QcdQ5H6w3QBZTlOs4HAUUBvvSQ%3D&reserved=0)
Phone conference ID: 218 888 141#
Abstract:
Language models, and in particular generative and conversational language models, are at the heart of recent advances in natural language processing (NLP). Understanding how these models represent textual content and how they learn these representations still raises multiple research questions. In this talk, I will start from an observation that small models are less efficient than expected. I will show that language models relying on the Transformer architecture tend to produce vector representations that are not isotropically distributed in space. This anisotropy is linked to the way in which these models are learned, which leads to the frequency of the tokens taking a preponderant place in their representation. I will show that this effect has negative consequences on the ability of small models to train satisfactorily (“performance saturation”) but does not seem to affect larger models. I will then describe a new approach for training language models intended to avoid the undesirable effects of this prevalence of frequency information. The resulting “headless” models display a number of advantages over standard models, including on downstream performance.
Bio:
Benoît Sagot is a computer scientist specializing in natural language processing (NLP). He is a Senior Researcher (Directeur de Recherches) at INRIA, where he heads the INRIA research project ALMAnaCH in Paris, France. He also holds a chair in the PRAIRIE institute dedicated to artificial intelligence, and currently holds the annual chair for computer science at the Collège de France. His research focuses on language modelling, machine translation, language resource development and computational linguistics, with a focus on French in all its forms and on less-resourced languages.
________________________________________________________________________________
Co-sponsored by: GMU Department of Computer Science

Details

Date: May 2
Time: 11:00 am - 12:00 pm
Event Category: Northern Virginia
Website: https://events.vtools.ieee.org/m/419402

IEEE Northern VA Section
