Presented by

Dr. Emma Beauxis-Aussalet, Sarah-Jane van Els & Triveni Gandhi

About this talk

As we saw in episode 1 of this series, the bias inherent in historical data often cannot be corrected simply by collecting more data, or more representative data. If nobody from a certain group has ever applied for this kind of loan or that type of job, there may simply be no data to collect. If we accept defeat on this, there is a real risk that AI models will refuse to make predictions for the groups with missing data, reinforcing the problem that got us here in the first place. One promising solution is synthetic data: anonymised cases, generated by combining the data of real cases, whose properties match the underlying population and so "fill in the gaps" in historical data. In this session, we discuss a concrete use case developed by the ICAI lab in collaboration with Randstad and explore the promise and limits of this approach.

Speaker bios:

Dr. Emma Beauxis-Aussalet is an assistant professor of ethical computing at the Vrije Universiteit Amsterdam (VU). She is also lab manager of the Civic AI Lab. In 2019 Emma obtained her doctorate at Utrecht University with a dissertation on AI bias, for her work at the Centrum Wiskunde & Informatica (CWI). Drawing on her multidisciplinary experience, she researches computational methods, statistics, user interfaces and data visualizations that enable transparent and controllable AI systems. Modelling and visualizing AI errors is one of her main research topics. For her achievements in this field, she was named one of the 100 Brilliant Women in AI Ethics in 2021. She also received the 3rd WomENcourage Prize for her contributions to the development of AI literacy and bias awareness in lectures and workshops.

Sarah-Jane van Els is a recent MSc Information Sciences graduate with a BSc in Business Administration from the Vrije Universiteit Amsterdam. She conducted her master's thesis at Randstad Groep Nederland, researching synthetic data to identify bias in recommender systems for recruitment.

Deep Dive into Synthetic Data Generation for Bias Mitigation

Dataiku