Deep Dive into Synthetic Data Generation for Bias Mitigation

Presented by

Dr. Emma Beauxis-Aussalet, Sarah-Jane van Els & Triveni Gandhi

About this talk

As we saw in episode 1 of this series, the bias inherent in historical data often cannot be corrected simply by collecting more data, or more representative data. If nobody from a certain group has ever applied for this kind of loan or that type of job, there may simply be no data to collect. If we accept defeat on this, there is a real risk that AI models will refuse to make predictions for the groups whose data is missing, reinforcing the problem that got us here in the first place.

One promising solution is synthetic data, generated by combining the data of real cases to produce anonymised cases with properties that match the underlying population, “filling in the gaps” in historical data. In this session, we discuss a concrete use case developed by the ICAI lab in collaboration with Randstad and explore the promise and limits of this approach.

Speaker bios:

Dr. Emma Beauxis-Aussalet is an assistant professor of ethical computing at the Vrije Universiteit Amsterdam (VU). She is also lab manager of the Civic AI Lab. In 2019 Emma obtained her doctorate at Utrecht University with a dissertation on AI bias, for her work at the Centrum Wiskunde & Informatica (CWI). Drawing on her multidisciplinary experience, she researches computational methods, statistics, user interfaces and data visualizations that enable transparent and controllable AI systems. Modelling and visualizing AI errors is one of her main research topics. For her achievements in this field, she was named one of the 100 Brilliant Women in AI Ethics in 2021. She also received the 3rd WomENcourage Prize for her contributions to the development of AI literacy and bias awareness in lectures and workshops.

Sarah-Jane van Els is a recent MSc Information Sciences graduate with a BSc in Business Administration from the Vrije Universiteit Amsterdam. She conducted her master's thesis at Randstad Groep Nederland, researching synthetic data to identify bias in recommender systems for recruitment.