Ongoing

IEEE Cincinnati December Social Event (Special date & time)

Bldg: Meritage-Cincinnati, 40 Village Square, Cincinnati, Ohio, United States, 45246

The IEEE Social Event includes food and beverages in the private meeting room of Meritage-Cincinnati.

Agenda: Entrees
Scallops with Lemon Butter: Pan seared, served over a bed of rice with roasted carrots
Thai Glazed Salmon: Grilled, served over a bed of rice with the daily vegetable
Daily Fish with Crab Butter: Pan seared, over a bed of rice with oven-roasted carrots
Garlic Shrimp Pesto Pasta: With sun-dried tomatoes, spinach pesto and roasted garlic, served over linguini
Chicken Diavolo Pasta: Grilled chicken, mushrooms, red peppers, tomatoes, and spinach tossed in a creamy diavolo sauce
Mt. Carmel Glazed Lamb Chops: Served over sautéed spinach with macaroni and cheese
Filet Mignon: Served with a twice baked potato
New York Strip Topped with Demi-Glaze: Topped with onion straws and served with a loaded baked potato
Crab Cakes: Topped with chipotle sour cream, served with Brussels sprouts tossed in Caesar dressing, topped with pancetta

Beer and wine are included. Cost is $30 for members. Members may bring their significant other for a cost of $30 ($60 per couple).

Cleveland IEEE December Holiday Gathering

7380 State Road Cleveland

IEEE members are encouraged to bring a guest.

Location: Stancato's, 7380 State Road, Parma, Ohio 44134

Agenda: Eat, drink, and talk.

UMD Center for Machine Learning Visiting Talk

Room: 5105, Bldg: Brendan Iribe Center for Computer Science and Engineering, 8125 Paint Branch Dr, College Park, Maryland, United States, 20740

TALK: "From Filtering to Fingerprints: Constructing Pretraining Datasets for LLMs and Measuring Biases in the Data"

VISITING SPEAKER: Reinhard Heckel, an associate professor of machine learning at the Technical University of Munich

WHEN: Friday, December 6, 2024 at 11 a.m.

LOCATION: 5105 Iribe Center, University of Maryland

ABSTRACT: In this talk, we first discuss how pretraining datasets for LLMs are sourced from the web through heuristic and machine-learning-based filtering techniques. We then investigate biases in pretraining datasets for large language models (LLMs) through dataset classification experiments. Building on prior work demonstrating the existence of biases in popular computer vision datasets, we analyze popular open-source pretraining text datasets derived from CommonCrawl, including C4, RefinedWeb, DolmaCC, RedPajama-V2, FineWeb and others. Although those datasets are obtained with similar filtering and deduplication steps, LLMs can classify surprisingly well which dataset a single text sequence belongs to, significantly better than a human can. This indicates that popular pretraining datasets have their own unique biases or fingerprints.

BIO: Reinhard Heckel is an associate professor of machine learning in the Department of Computer Engineering at the Technical University of Munich, and an adjunct faculty member at Rice University. From 2017 to 2019, he was an assistant professor of electrical and computer engineering at Rice University. Before that, Heckel was a postdoctoral researcher in the Berkeley Artificial Intelligence Research Lab at UC Berkeley and a researcher at IBM Research Zurich. He completed his Ph.D. in 2014 at ETH Zurich and was a visiting Ph.D. student at Stanford University's Statistics Department. Heckel's work focuses on machine learning, artificial intelligence, and information processing. He specializes in developing algorithms and foundations for deep learning, particularly for medical imaging, establishing mathematical and empirical underpinnings for machine learning, and utilizing DNA as a digital information technology.

Co-sponsored by: UMD Center for Machine Learning

Speaker(s): Reinhard Heckel