
AI4RA Workshop | REACH 2026

Data Governance as the Foundation for Context Engineering

A facilitator-ready module for research analytics and administration teams deciding what information AI should be allowed to trust.

Module goal

Participants should leave with one durable idea

Context engineering is not mainly about giving the model more information. It is about deciding which information the institution can safely trust, use, expose, and act on.

  • Provenance
  • Permissions
  • Quality
  • Stewardship
  • Review boundaries

Opening move

Start with the governance work they already do

Already familiar

Dashboard governance

  • Which source is authoritative for a metric?
  • Who can access which dashboard?
  • How often is the data refreshed?
  • What happens when a source changes mid-cycle?

Bridge to AI

Same muscles, higher stakes

AI governance uses the same institutional judgment, but the system can now synthesize, classify, extract, and route work, not just display information on a screen.

A useful opening line: retrieval quality is downstream from governance quality.

Definition

What do we mean by context engineering?

Design what the model can see

Policies, files, tables, forms, notes, prompts, and tools.

Design what the model can use

Which sources are authoritative enough to influence the answer.

Design what the model should rely on

What must be cited, refreshed, validated, or constrained.

Design when the model must stop

Abstain, defer, or escalate when the workflow leaves safe bounds.

Core thesis

Better context is only useful if it is governed

What teams often do

Add more documents, more retrieval, more system access, and more files in hopes that the answer becomes better.

What can actually happen

If the data is outdated, poorly permissioned, ambiguous, or untraceable, the answer may sound more confident while becoming less safe and less institutionally valid.

Ungoverned context does not just increase capability. It increases the error surface area.

Key distinction

AI consumes governed data and generates new governed data

Inputs

What the workflow reads

  • Policy PDFs
  • Tables and spreadsheets
  • Shared-drive forms
  • Dashboard sources
  • Operational guidance documents

Outputs

What the workflow may create

  • Extracted fields from PDFs
  • Tags and classifications
  • Structured summaries
  • Routing metadata
  • Draft operational records

Once those outputs feed a queue, dashboard, report, or decision, governance applies to them too.

Governance lens

Every new context source should trigger these questions

Provenance

Where did this information come from, and can we prove it?

Permissions

Who is allowed to access it, transform it, or expose it?

Quality

How current, complete, reliable, and explainable is it?

Sensitivity

Does it contain private, regulated, or institutionally sensitive material?

Stewardship

Who owns refresh, correction, and validation over time?

Human review

When should the system answer, cite and defer, or escalate?

Research analytics lens

You already know this discipline from dashboards and reports

Dashboard governance asks:

Which number is authoritative? What is the refresh cadence? Who has access? What does this metric mean?

AI governance asks:

Which source should influence the answer? What should be cited back? Who can see it? When must a human validate the output?

The discipline is familiar. The difference is that AI can act on the data and create new data for downstream use.

Worked example

A proposal routing assistant with mixed sources

Scenario

A unit wants an AI assistant to answer questions about proposal routing deadlines and internal approvals.

The team connects a current policy PDF, old shared-drive forms, exception-heavy email guidance, and a few notes from experienced staff.

Teaching point

More context is not automatically better context

Those sources do not carry the same authority, freshness, or interpretive weight. The model may blend them into a plausible answer that no one should trust without review.

Source map

Show participants how source quality diverges inside one workflow

Current policy PDF

Likely authoritative

Good candidate for grounding if the version is current and owned.

Shared-drive forms

Possibly stale

Useful reference, but dangerous if staff do not know which copy is current.

Email guidance

Exception-heavy

May reflect edge cases that should not be generalized into a default answer.

Staff notes

Local practice

Helpful for context, but often hard to defend as institutional policy.

Failure mode

Where governance enters the scenario

  • Which source is authoritative, and who says so?
  • Which sources are supplemental or historical only?
  • Who is responsible for refreshing the material?
  • What should the system cite back to the user?
  • When should the answer escalate to a human?
  • What should never be completed without review?

A trustworthy system does not just answer well. It knows when not to answer alone.
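The escalation boundary in particular can be made explicit in code. This is a hedged sketch, assuming each retrieved source carries an authority tier; the tier labels ("authoritative", "supplemental", "historical", "unknown") and the high-stakes flag are illustrative, not a standard.

```python
# Sketch: route one response as answer, cite-and-defer, or escalate,
# based on the authority of the sources that grounded it.

def route_answer(source_tiers: list[str], touches_money_or_deadlines: bool) -> str:
    """Return 'answer', 'cite-and-defer', or 'escalate' for one response."""
    if not source_tiers or "unknown" in source_tiers:
        return "escalate"            # unowned context never answers alone
    if touches_money_or_deadlines and "authoritative" not in source_tiers:
        return "escalate"            # high-stakes facts need the policy source
    if all(tier == "authoritative" for tier in source_tiers):
        return "answer"              # fully grounded: answer with citations
    return "cite-and-defer"          # mixed grounding: show sources, flag review

print(route_answer(["authoritative"], touches_money_or_deadlines=True))   # → answer
print(route_answer(["supplemental", "historical"], True))                 # → escalate
print(route_answer(["authoritative", "supplemental"], False))             # → cite-and-defer
```

The point is not this exact logic; it is that "when should the answer escalate to a human?" becomes an inspectable rule rather than an emergent model behavior.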

Generated outputs

Extraction creates governed data too

If the assistant extracts deadlines, approver names, sponsor requirements, or routing codes into a structured table, that output starts to look like clean operational data.

New governance questions

  • Who validates the extracted fields?
  • Who owns the resulting table?
  • Can it be reused in dashboards or workflow routing?
  • What audit trail ties it back to the source?

Decision aid

Use a fast context readiness checklist

Ready now

Authoritative, current, permissioned, explainable, and clearly owned.

Needs cleanup

Potentially useful, but freshness, ownership, access, or provenance is still unclear.

Keep out

Too sensitive, too ambiguous, or too unreliable for operational use.

Ask before acting

Would you trust this source in a dashboard, audit trail, or official workflow decision?
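For teams that want the checklist in executable form, the three-tier call can be sketched as a small function. The boolean inputs mirror the tiers above; the thresholds are illustrative defaults, not policy.

```python
# Sketch: classify a candidate context source against the readiness tiers.

def readiness(authoritative: bool, current: bool, permissioned: bool,
              explainable: bool, owned: bool, too_sensitive: bool) -> str:
    """Return 'ready now', 'needs cleanup', or 'keep out' for one source."""
    if too_sensitive or not explainable:
        return "keep out"        # too sensitive or too ambiguous for operational use
    if all([authoritative, current, permissioned, owned]):
        return "ready now"       # authoritative, current, permissioned, owned
    return "needs cleanup"       # useful, but freshness/ownership/access unclear

# Hypothetical contrast: a current owned policy PDF vs. stale shared-drive forms.
print(readiness(True, True, True, True, True, False))     # → ready now
print(readiness(False, False, True, True, False, False))  # → needs cleanup
```

Run against the worked example, the current policy PDF lands in "ready now", the shared-drive forms in "needs cleanup", and anything failing the dashboard-trust question stays out.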

Activity

Run a 4-minute readiness audit in the room

  1. Choose one local source your team wants to use in an AI workflow.
  2. Mark whether it is authoritative, current, permissioned, and explainable.
  3. Name the biggest failure mode if the system uses it uncritically.
  4. Decide: ready now, needs cleanup, or keep it out of the workflow.

If time allows, ask two people to share why they made different decisions about source readiness.

Facilitator note

Do not frame governance as bureaucracy

Better framing

Governance is the condition that makes automation safe, inspectable, and institutionally legitimate.

Language worth repeating

  • Trustworthy context
  • Stewardship before scale
  • Human review as part of the design
  • Authority before automation

Discussion

Questions to use with participants

  • Which local data sources are reliable enough to become AI context?
  • Which ones are too messy, sensitive, or incomplete to trust yet?
  • Where could AI-generated structured data enter your analytics or operational workflows?
  • Who currently defines what is authoritative in your workflow?
  • What governance work has to happen before retrieval or automation expands?

Continue the conversation

REACH sessions that extend these ideas

G2

Operationalizing Data Governance
Tue 11:15 AM, WEATHERLY

G5

Ethical and Epistemic Foundations
Tue 11:15 AM, NEWPORT

I4

Building Data Literacy as a Shared Language
Tue 2:30 PM, ENTERPRISE

H3

Automating Your Data Dictionary
Tue 1:30 PM, COLUMBIA

Takeaway

Governance decides what context belongs in the system

Trustworthy AI workflows depend less on how much context a model can access and more on whether the institution knows what it is allowed to trust, retrieve, expose, and act on.

Module assets

Use the full module and checklist after the session

Bridge forward

Once governance is clear, the technical layers make sense

Module 3 builds from this foundation by showing how prompts, files, tools, retrieval, and human escalation work together once the institution knows what belongs in the system.