Presented by

Amanda Milberg, Data Scientist @ Dataiku

About this talk

Many firms hold a large document corpus made up of both digitized documents and raw images. Now more than ever, financial institutions are turning to unstructured data sources to capture additional attributes that help them adjust or confirm their analyses and discover new trends and insights. Yet many organizations still rely on individuals to read sections of these documents or search for relevant material in an ad hoc manner, with no systematic way of categorizing and understanding the information and trends they contain. Join us for this Dataiku session on interactive document intelligence, where we will showcase a modular, reusable pipeline that rapidly and automatically digitizes documents, extracts text, and consolidates the data into a unified, searchable database. We will focus on the NLP techniques used to prepare, categorize, and analyze textual data by theme of interest (in this project, ESG), with additional theme modules available. Finally, we will demo a purpose-built dashboard that gives business users a simple, interactive tool to analyze high-level trends and drill down into aggregated insights.
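To make the categorization step more concrete, here is a minimal sketch of theme tagging on text that has already been extracted. It is not the pipeline shown in the session: the keyword lists and the function names `tag_themes` and `top_theme` are hypothetical, and simple keyword matching stands in for whatever NLP techniques the talk actually covers.

```python
import re

# Hypothetical keyword lists for the three ESG themes (illustrative only).
THEME_KEYWORDS = {
    "environmental": {"emissions", "climate", "carbon", "renewable"},
    "social": {"diversity", "community", "safety", "workforce"},
    "governance": {"board", "audit", "compliance", "shareholder"},
}


def tag_themes(text):
    """Count keyword hits per theme in one document's extracted text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        theme: sum(1 for token in tokens if token in keywords)
        for theme, keywords in THEME_KEYWORDS.items()
    }


def top_theme(text):
    """Return the theme with the most hits, or None if nothing matched."""
    scores = tag_themes(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    sample = "The board approved a plan to cut carbon emissions."
    print(tag_themes(sample))
    print(top_theme(sample))
```

Keeping each theme as its own keyword set mirrors the idea of swappable theme modules: supporting another theme would mean adding one more entry to the dictionary, under the assumptions above.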