Module 2

Data governance is the foundation for context engineering.

Context engineering is not just about giving an AI model more information. It is about deciding which information can be used, where it came from, how much it can be trusted, who is allowed to see it, and what should happen to it after the interaction.

Module brief

Teach participants to evaluate context before they trust it.

Learning goal

Evaluate context before using it

Participants should leave able to explain why provenance, permissions, quality, and stewardship determine whether context is safe enough to use in an AI workflow.

In-room move

Open with a dashboard analogy

Start from the governance work they already do for dashboards, then show that AI context uses the same institutional muscles under higher stakes.

Key distinction

AI both consumes data and generates new data

The system may read governed inputs such as policy documents, spreadsheets, and dashboard tables, but it can also produce new structured outputs such as extracted fields, classifications, summaries, and routing data that enter downstream workflows.

Participant artifact

A context readiness checklist

This module should leave people with a short set of questions they can use to judge whether a document set, policy source, or dataset is ready for AI-assisted work.

Derived assets

Governance slide deck and checklist handout

The hosted governance deck and the context readiness checklist should summarize the framing, example, and exercise on this page rather than introduce new content that only exists in slides.

Lecture framing

A simple way to open the topic

You can introduce this section with a blunt contrast: most people think context engineering is about giving the model more useful information, but institutions actually succeed or fail here based on whether they govern that information well. Retrieval quality is downstream from governance quality.

Core teaching arc

Better context is only valuable if it is governed.

Module explanation

More context can increase risk, not just value

Teams often talk about adding more documents, more system access, or richer retrieval pipelines to improve AI output. But if the underlying data is outdated, poorly permissioned, ambiguous, or untraceable, the added context can make the answer more dangerous, not more useful.

Generated data

Governance applies to outputs as well as inputs

AI systems do not only consume institutional data. They can also generate new data products: extracted fields from PDFs, tagged records, structured summaries, classifications, and draft workflow metadata. Once those outputs feed a dashboard, a queue, a report, or a routing decision, they become governed data too.

Research analytics lens

Why this matters if you already govern dashboard data

Research analytics teams already decide which source is authoritative for a metric, how to version a KPI definition, who can see which dashboard, and what happens when a source system changes mid-reporting cycle. AI governance is the same discipline applied to a new class of inputs and outputs.

Teaching takeaway

Governance should shape what enters the system

A strong workshop message is that context engineering is not only about retrieval quality. It is also about stewardship: permissions, quality standards, curation, traceability, and clearly defined review boundaries.

Decision aid

What every institution should ask before adding context

  • Where did this information come from, and can we prove its provenance?
  • Who is allowed to access it, and under what conditions?
  • How current, complete, and reliable is it?
  • Does it contain private, regulated, or institutionally sensitive material?
  • What human review is needed before the output is used operationally?
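The decision aid above can also be encoded as a lightweight audit. The following Python sketch is illustrative only; the field names and the one-year staleness threshold are assumptions made for the example, not an institutional standard or an existing tool:

```python
from dataclasses import dataclass

@dataclass
class ContextSource:
    """One candidate source of AI context. All field names are illustrative."""
    name: str
    provenance_documented: bool    # can we prove where it came from?
    access_controlled: bool        # are access conditions defined and enforced?
    last_reviewed_days_ago: int    # how current is it?
    contains_sensitive_data: bool  # private, regulated, or sensitive material?
    human_review_required: bool    # is there a review step before operational use?

def readiness_issues(src: ContextSource, max_age_days: int = 365) -> list[str]:
    """Return the governance questions this source still fails to answer."""
    issues = []
    if not src.provenance_documented:
        issues.append("provenance is not documented")
    if not src.access_controlled:
        issues.append("access conditions are undefined")
    if src.last_reviewed_days_ago > max_age_days:
        issues.append("content may be stale")
    if src.contains_sensitive_data and not src.human_review_required:
        issues.append("sensitive data lacks a human review step")
    return issues

# A well-governed policy document passes every check.
policy_pdf = ContextSource(
    name="proposal-routing-policy.pdf",
    provenance_documented=True,
    access_controlled=True,
    last_reviewed_days_ago=90,
    contains_sensitive_data=False,
    human_review_required=True,
)
print(readiness_issues(policy_pdf))  # -> []
```

The point of the sketch is that readiness is a short list of answerable questions, not a score: an empty list means "ready now," and each remaining item names the cleanup work still owed.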

Suggested teaching flow

A sequence for presenting the idea

  1. Define context engineering as the design of what the model can see, use, and rely on.
  2. Explain that every added source of context introduces governance questions, not just technical opportunity.
  3. Show that AI can also generate new structured data, not just read existing data.
  4. Show how bad governance creates confident but unsafe or institutionally invalid output.
  5. Connect governance to provenance, permissions, quality, stewardship, and review boundaries for both inputs and generated outputs.
  6. End by reframing governance as the operating system behind trustworthy AI workflows.

Example and activity

Use a realistic institutional scenario to make the governance work visible.

Worked example

A proposal routing assistant with mixed sources

Imagine an institution wants an AI assistant to help answer questions about proposal routing. The team connects policy documents, old email guidance, a shared drive of forms, and a few notes from experienced staff. The system now has more context, but not necessarily better context.

If the policy PDF is current, the shared drive is out of date, the email guidance reflects exceptions, and the notes are informal local practice, the model may blend all of that into a plausible answer that no one should trust without review.

If the system then extracts deadlines, approver names, sponsor requirements, or routing codes into a structured table, that generated table may look like clean operational data. It still needs provenance, validation, and ownership.

What to point out

Where governance enters the scenario

  • Which source is authoritative and who says so?
  • Which sources are only supplemental or historical?
  • Who is responsible for refreshing the materials?
  • What should the system cite back to the user?
  • When should the answer escalate to a human instead of completing the task?
  • If the AI extracts structured fields, who validates and owns that new dataset?

Participant activity

Run a fast context readiness audit

  1. Ask participants to choose one institutional source they would like to use in an AI workflow.
  2. Have them mark whether the source is authoritative, current, permissioned, and explainable to another staff member.
  3. Ask what could go wrong if the system used that source without review.
  4. End by deciding whether the source is ready now, needs cleanup, or should stay outside the workflow.

Discussion prompt

Questions to ask participants

  • Which data sources in your institution are reliable enough to become AI context?
  • Which ones are too messy, sensitive, or incomplete to trust yet?
  • Where could AI-generated structured data enter your analytics or operational workflows, and how would you validate it?
  • Who currently decides what is authoritative in your workflow, and is it the same person who would govern AI context?
  • What governance work has to happen before retrieval or automation should expand?

Facilitation support

Keep the discussion anchored in trust, stewardship, and review.

Speaker notes

Talking points for the presenter

  • Do not present governance as bureaucracy; present it as the condition that makes automation safe.
  • Emphasize that context can increase error surface area when institutions do not know which data is authoritative.
  • Keep returning to the phrase "trustworthy context" rather than simply "more context."
  • Make the consume-versus-generate distinction explicit: AI reads governed inputs, but it can also create new structured outputs that look authoritative if people do not stop to validate them.
  • Connect to what they already do: governing data for dashboards and reports is the same muscle as governing data for AI.

REACH sessions to highlight

Complementary sessions on Tuesday

  • G2 - Panel: Operationalizing Data Governance (Tue 11:15 AM, WEATHERLY).
  • G5 - Beyond Compliance: Ethical and Epistemic Foundations of Research Analytics and Evaluation (Tue 11:15 AM, NEWPORT).
  • H3 - Automating Your Data Dictionary (Tue 1:30 PM, COLUMBIA).
  • I4 - Building Data Literacy as a Shared Language for Research Analytics (Tue 2:30 PM, ENTERPRISE).

Bridge to future modules

How this sets up the rest of the workshop

This module should lead naturally into later sections on layers of context engineering, structured data, retrieval, and response evaluation. The through-line is simple: every technical layer becomes more useful when the institution knows what it is allowed to trust, retrieve, expose, and act on.

Derived assets

Slides and handouts for this module

Presentation asset

Slide version of this module

A Reveal.js slide outline based on this material is available for live delivery and iteration during workshop prep. It should continue to summarize the lecture framing, explanation, example, participant activity, and discussion prompts captured here.