Module 2

Data governance is the foundation for context engineering.

Context engineering is not just about giving an AI model more information. It is about deciding which information can be used, where it came from, how much it can be trusted, who is allowed to see it, and what should happen to it after the interaction.

Module brief

Teach participants to evaluate context before they trust it.

Learning goal

Evaluate context before using it

Participants should leave able to explain why provenance, permissions, quality, and stewardship determine whether context is safe enough to use in an AI workflow.

In-room move

Open with a dashboard analogy

Start from the governance work they already do for dashboards, then show that AI context uses the same institutional muscles under higher stakes.

Key distinction

AI both consumes data and generates new data

The system may read governed inputs such as policy documents, spreadsheets, and dashboard tables, but it can also produce new structured outputs such as extracted fields, classifications, summaries, and routing data that enter downstream workflows.

Participant artifact

A context readiness checklist

This module should leave people with a short set of questions they can use to judge whether a document set, policy source, or dataset is ready for AI-assisted work.

Derived assets

Governance slide deck and checklist handout

The hosted governance deck and the context readiness checklist should summarize the framing, example, and exercise on this page rather than introduce new content that only exists in slides.

Lecture framing

A simple way to open the topic

You can introduce this section with a blunt contrast: most people think context engineering is about giving the model more useful information, but institutions actually succeed or fail here based on whether they govern that information well. Retrieval quality is downstream from governance quality.

Core teaching arc

Better context is only valuable if it is governed.

Module explanation

More context can increase risk, not just value

Teams often talk about adding more documents, more system access, or richer retrieval pipelines to improve AI output. But if the underlying data is outdated, poorly permissioned, ambiguous, or untraceable, the added context can make the answer more dangerous, not more useful.

Generated data

Governance applies to outputs as well as inputs

AI systems do not only consume institutional data. They can also generate new data products: extracted fields from PDFs, tagged records, structured summaries, classifications, and draft workflow metadata. Once those outputs feed a dashboard, a queue, a report, or a routing decision, they become governed data too.

Research analytics lens

Why this matters if you already govern dashboard data

Research analytics teams already decide which source is authoritative for a metric, how to version a KPI definition, who can see which dashboard, and what happens when a source system changes mid-reporting cycle. AI governance is the same discipline applied to a new class of inputs and outputs.

Teaching takeaway

Governance should shape what enters the system

A strong workshop message is that context engineering is not only about retrieval quality. It is also about stewardship: permissions, quality standards, curation, traceability, and clearly defined review boundaries.

Decision aid

What every institution should ask before adding context

  • Where did this information come from, and can we prove its provenance?
  • Who is allowed to access it, and under what conditions?
  • How current, complete, and reliable is it?
  • Does it contain private, regulated, or institutionally sensitive material?
  • What human review is needed before the output is used operationally?
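The decision aid above can also be encoded as a lightweight audit. The following Python sketch is illustrative only; the field names and the one-year staleness threshold are assumptions made for the example, not an institutional standard or an existing tool:

```python
from dataclasses import dataclass

@dataclass
class ContextSource:
    """One candidate source of AI context. All field names are illustrative."""
    name: str
    provenance_documented: bool    # can we prove where it came from?
    access_controlled: bool        # are access conditions defined and enforced?
    last_reviewed_days_ago: int    # how current is it?
    contains_sensitive_data: bool  # private, regulated, or sensitive material?
    human_review_required: bool    # is there a review step before operational use?

def readiness_issues(src: ContextSource, max_age_days: int = 365) -> list[str]:
    """Return the governance questions this source still fails to answer."""
    issues = []
    if not src.provenance_documented:
        issues.append("provenance is not documented")
    if not src.access_controlled:
        issues.append("access conditions are undefined")
    if src.last_reviewed_days_ago > max_age_days:
        issues.append("content may be stale")
    if src.contains_sensitive_data and not src.human_review_required:
        issues.append("sensitive data lacks a human review step")
    return issues

# A well-governed policy document passes every check.
policy_pdf = ContextSource(
    name="proposal-routing-policy.pdf",
    provenance_documented=True,
    access_controlled=True,
    last_reviewed_days_ago=90,
    contains_sensitive_data=False,
    human_review_required=True,
)
print(readiness_issues(policy_pdf))  # -> []
```

The point of the sketch is that readiness is a short list of answerable questions, not a score: an empty list means "ready now," and each remaining item names the cleanup work still owed.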

Suggested teaching flow

A sequence for presenting the idea

  1. Define context engineering as the design of what the model can see, use, and rely on.
  2. Explain that every added source of context introduces governance questions, not just technical opportunity.
  3. Show that AI can also generate new structured data, not just read existing data.
  4. Show how bad governance creates confident but unsafe or institutionally invalid output.
  5. Connect governance to provenance, permissions, quality, stewardship, and review boundaries for both inputs and generated outputs.
  6. End by reframing governance as the operating system behind trustworthy AI workflows.

Example and activity

Use a realistic institutional scenario to make the governance work visible.

Worked example

A proposal routing assistant with mixed sources

Imagine an institution wants an AI assistant to help answer questions about proposal routing. The team connects policy documents, old email guidance, a shared drive of forms, and a few notes from experienced staff. The system now has more context, but not necessarily better context.

If the policy PDF is current, the shared drive is out of date, the email guidance reflects exceptions, and the notes are informal local practice, the model may blend all of that into a plausible answer that no one should trust without review.

If the system then extracts deadlines, approver names, sponsor requirements, or routing codes into a structured table, that generated table may look like clean operational data. It still needs provenance, validation, and ownership.

What to point out

Where governance enters the scenario

  • Which source is authoritative and who says so?
  • Which sources are only supplemental or historical?
  • Who is responsible for refreshing the materials?
  • What should the system cite back to the user?
  • When should the answer escalate to a human instead of completing the task?
  • If the AI extracts structured fields, who validates and owns that new dataset?

Participant activity

Run a fast context readiness audit

  1. Ask participants to choose one institutional source they would like to use in an AI workflow.
  2. Have them mark whether the source is authoritative, current, permissioned, and explainable to another staff member.
  3. Ask what could go wrong if the system used that source without review.
  4. End by deciding whether the source is ready now, needs cleanup, or should stay outside the workflow.

Discussion prompt

Questions to ask participants

  • Which data sources in your institution are reliable enough to become AI context?
  • Which ones are too messy, sensitive, or incomplete to trust yet?
  • Where could AI-generated structured data enter your analytics or operational workflows, and how would you validate it?
  • Who currently decides what is authoritative in your workflow, and is it the same person who would govern AI context?
  • What governance work has to happen before retrieval or automation should expand?

Facilitation support

Keep the discussion anchored in trust, stewardship, and review.

Speaker notes

Talking points for the presenter

  • Do not present governance as bureaucracy; present it as the condition that makes automation safe.
  • Emphasize that context can increase error surface area when institutions do not know which data is authoritative.
  • Keep returning to the phrase "trustworthy context" rather than simply "more context."
  • Make the consume-versus-generate distinction explicit: AI reads governed inputs, but it can also create new structured outputs that look authoritative if people do not stop to validate them.
  • Connect to what they already do: governing data for dashboards and reports is the same muscle as governing data for AI.

REACH sessions to highlight

Complementary sessions on Tuesday

  • G2 - Panel: Operationalizing Data Governance (Tue 11:15 AM, WEATHERLY).
  • G5 - Beyond Compliance: Ethical and Epistemic Foundations of Research Analytics and Evaluation (Tue 11:15 AM, NEWPORT).
  • H3 - Automating Your Data Dictionary (Tue 1:30 PM, COLUMBIA).
  • I4 - Building Data Literacy as a Shared Language for Research Analytics (Tue 2:30 PM, ENTERPRISE).

Bridge to future modules

How this sets up the rest of the workshop

This module should lead naturally into later sections on layers of context engineering, structured data, retrieval, and response evaluation. The through-line is simple: every technical layer becomes more useful when the institution knows what it is allowed to trust, retrieve, expose, and act on.

Derived assets

Slides and handouts for this module

Presentation asset

Slide version of this module

A Reveal.js slide outline based on this material is available for live delivery and iteration during workshop prep. It should continue to summarize the lecture framing, explanation, example, participant activity, and discussion prompts captured here.