Clinical Development Data Foundation Layered Panorama

2026-03-01

A layered architecture from source facts to auditable apps and agents.

Extract -> Clean -> Conform -> Canonical -> Productize -> Operationalize.

Backbone Transformation Chain

Extract (Traceable)

Clean (Usable)

Conform (Composable)

Canonical (Shared Semantics)

Productize (Reusable)

Operationalize (Workflow)

Layer 0 — Source Systems

What this layer looks like

CTMS/eTMF/RTSM/PV/QMS, EDC/ePRO/Labs/Imaging, protocol/CSR documents, RWD/RWE and external registries.

Inputs from below

No upstream. This is where facts originate.

Transform to next layer

Connectors + extraction (API/SFTP/CDC/events/files) -> raw extract package + source lineage metadata.
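One way to picture the "raw extract package + source lineage metadata" output is a payload that always travels with a small manifest. The sketch below is illustrative only: the ExtractManifest fields and the package_extract / verify_extract helpers are hypothetical names, not part of any specific connector product.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExtractManifest:
    """Lineage metadata that travels with one raw extract (hypothetical schema)."""
    source_system: str   # e.g. "EDC" or "CTMS"
    source_object: str   # table, endpoint, or file name in the source
    method: str          # "API", "SFTP", "CDC", "events", or "files"
    extracted_at: str    # UTC timestamp in ISO 8601 form
    row_count: int       # rows reported by the extractor
    sha256: str          # hash of the payload bytes exactly as received


def package_extract(payload, source_system, source_object, method,
                    row_count, extracted_at=None):
    """Bundle the untouched payload with its manifest."""
    # Callers are assumed to pass a timezone-aware datetime when they pass one.
    ts = extracted_at or datetime.now(timezone.utc)
    manifest = ExtractManifest(
        source_system=source_system,
        source_object=source_object,
        method=method,
        extracted_at=ts.astimezone(timezone.utc).isoformat(),
        row_count=row_count,
        sha256=hashlib.sha256(payload).hexdigest(),
    )
    return {"payload": payload, "manifest": asdict(manifest)}


def verify_extract(package):
    """Recompute the hash to confirm the payload still matches its manifest."""
    actual = hashlib.sha256(package["payload"]).hexdigest()
    return actual == package["manifest"]["sha256"]
```

Any downstream layer can call verify_extract before trusting a package, which is the evidence chain the pitfall below refers to.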

Common pitfall

Collecting data without an extraction evidence chain makes every downstream layer unverifiable.

Layer 1 — Landing / Raw

What this layer looks like

Immutable raw zone partitioned by source and time, with hashes, row checks, PHI tags, and jurisdiction labels.
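A minimal sketch of "immutable, partitioned by source and time", assuming a plain file system stands in for the raw zone. The source=/extract_date= directory layout and both function names are assumptions made for illustration; an object store with write-once settings would serve the same purpose.

```python
from datetime import date
from pathlib import Path


def raw_partition_path(root, source, extract_date, filename):
    """Build a path partitioned by source system and extract date."""
    return (Path(root)
            / f"source={source}"
            / f"extract_date={extract_date.isoformat()}"
            / filename)


def write_once(path, payload):
    """Write the payload, refusing to overwrite an existing raw file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" is exclusive creation: it raises FileExistsError
    # if the file is already there, so raw facts are never rewritten.
    with open(path, "xb") as fh:
        fh.write(payload)
```

A corrected extract then lands as a new file in a new partition instead of replacing the old one, so both versions stay available for audit.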

Inputs from below

Raw extracts and extraction metadata from Layer 0.

Transform to next layer

Light normalization (decode, dedupe, type/timezone harmonization, key prep) -> cleaned raw / bronze.
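As a rough illustration of what "light" can mean here, the sketch below harmonizes timestamps to UTC and drops duplicate keys, and does nothing else. The field names (subject_id, collected_at) and the normalize_rows helper are hypothetical.

```python
from datetime import datetime, timezone


def normalize_rows(rows, key_fields, ts_field):
    """Convert timestamps to UTC and drop rows whose key was already seen.

    Input rows are copied, never modified in place, so the raw facts survive.
    """
    seen = set()
    out = []
    for row in rows:
        ts = datetime.fromisoformat(row[ts_field])
        if ts.tzinfo is None:
            # Guessing a timezone would rewrite a fact, so fail loudly instead.
            raise ValueError(f"timestamp without offset: {row[ts_field]!r}")
        clean = dict(row)
        clean[ts_field] = ts.astimezone(timezone.utc).isoformat()
        key = tuple(clean[f] for f in key_fields)
        if key in seen:
            continue  # duplicate after harmonization; keep the first occurrence
        seen.add(key)
        out.append(clean)
    return out


rows = [
    {"subject_id": "S001", "collected_at": "2026-03-01T10:00:00+02:00"},
    {"subject_id": "S001", "collected_at": "2026-03-01T08:00:00+00:00"},
    {"subject_id": "S002", "collected_at": "2026-03-01T09:30:00+00:00"},
]
cleaned = normalize_rows(rows, ("subject_id", "collected_at"), "collected_at")
```

The first two rows describe the same instant in different offsets, so they collapse into one row once both are expressed in UTC.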

Common pitfall

Over-cleaning in Raw rewrites facts and breaks audit explainability.

Layer 2 — Conformed

What this layer looks like

Quality-controlled subject/visit/site/case/document objects with rule outcomes and anomaly markers.

Inputs from below

Cleaned raw objects from Layer 1.

Transform to next layer

Terminology mapping + standards mapping + master ID alignment -> conformed datasets and entity links.

Common pitfall

Field-level harmonization without entity-level ID unification blocks cross-system analytics, because records for the same subject or site cannot be joined.
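Master ID alignment can be sketched as a crosswalk lookup keyed by source system and local ID. The crosswalk contents, the field names, and the align_master_ids helper below are all hypothetical; real identity resolution usually also involves matching rules and stewardship review.

```python
def align_master_ids(records, crosswalk):
    """Attach a master subject ID to each record using a crosswalk.

    crosswalk maps (source_system, local_subject_id) -> master_subject_id.
    Records with no mapping are returned separately rather than dropped.
    """
    linked, unresolved = [], []
    for rec in records:
        master = crosswalk.get((rec["source_system"], rec["local_subject_id"]))
        if master is None:
            unresolved.append(rec)
        else:
            linked.append({**rec, "master_subject_id": master})
    return linked, unresolved


crosswalk = {
    ("EDC", "1001-004"): "SUBJ-000042",
    ("RTSM", "R-77"): "SUBJ-000042",  # same person, different local ID
}
records = [
    {"source_system": "EDC", "local_subject_id": "1001-004", "visit": "V2"},
    {"source_system": "RTSM", "local_subject_id": "R-77", "kit": "K-9"},
    {"source_system": "Labs", "local_subject_id": "L-5", "test": "ALT"},
]
linked, unresolved = align_master_ids(records, crosswalk)
```

The EDC and RTSM rows now share one master ID and can be joined; the Labs row is surfaced as unresolved instead of silently disappearing.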

Layer 3 — Semantic / Canonical

What this layer looks like

Canonical model for clinical, operations, safety, and document cores plus governed metric definitions.

Inputs from below

Conformed datasets, mapped dictionaries, and entity links from Layer 2.

Transform to next layer

Dimensional modeling + feature engineering + document structuring -> semantic marts and feature artifacts.

Common pitfall

Without metric governance, each team computes the “same” KPI differently.
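One hedged way to read "governed metric definitions" is a single registry that owns each KPI's formula and version, so apps ask for a metric by name instead of re-deriving it. The registry design and the screen failure rate formula below are illustrative assumptions, not definitions taken from this document.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    version: str
    owner: str
    compute: Callable[[Mapping[str, float]], float]


REGISTRY = {}


def register(metric):
    """Add a metric once; a second definition under the same name is rejected."""
    if metric.name in REGISTRY:
        raise ValueError(f"metric already defined: {metric.name}")
    REGISTRY[metric.name] = metric


def evaluate(name, inputs):
    """Compute a metric and report which governed version produced the value."""
    metric = REGISTRY[name]
    return {"metric": name, "version": metric.version,
            "value": metric.compute(inputs)}


# Illustrative definition: share of screened subjects who failed screening.
register(MetricDefinition(
    name="screen_failure_rate",
    version="1.0",
    owner="clinical-operations",
    compute=lambda x: x["screen_failures"] / x["screened"],
))

result = evaluate("screen_failure_rate", {"screened": 200, "screen_failures": 50})
```

Because every result carries the metric version, two dashboards that disagree can be checked against the same definition rather than against each other.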

Layer 4 — Analytical / Data Products

What this layer looks like

Trial performance marts, patient journey marts, safety marts, authoring marts, feature/metric stores.

Inputs from below

Canonical semantic assets from Layer 3.

Transform to next layer

Serving adaptation (indexes/cache/vector/graph) + role-based access + service APIs -> reusable products.

Common pitfall

Data exists but is not productized, so every app rebuilds its own data layer.

Layer 5 — Apps / Agents

What this layer looks like

Dashboards, decision apps, and agent workflows for authoring, ops, and quality intervention loops.

Inputs from below

Canonical semantics from Layer 3 and reusable data products from Layer 4.

Transform to next layer

Evidence-backed decisions + explainable outputs + optional governed write-back into operational systems.

Common pitfall

Without evidence traceability, LLM/agent outcomes cannot earn GxP trust.

Three Cross-Cutting Capabilities

Identity Resolution

Unify Study/Site/Subject/Investigator/Vendor identities across systems.

Lineage & Audit

Every metric and conclusion should trace back to source versions and rules.
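Tracing a conclusion back to its sources can be sketched as a walk over a lineage graph. The artifact identifiers and the shape of the lineage dictionary below are hypothetical; a real deployment would more likely read this from a lineage or catalog service.

```python
def trace_to_sources(artifact_id, lineage):
    """Return the source-level artifacts that an artifact ultimately depends on.

    lineage maps an artifact ID to {"inputs": [...]}; an artifact with no
    recorded inputs is treated as a source.
    """
    sources, seen = [], set()
    stack = [artifact_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # shared upstream artifacts are visited only once
        seen.add(current)
        parents = lineage.get(current, {}).get("inputs", [])
        if parents:
            stack.extend(parents)
        else:
            sources.append(current)
    return sorted(sources)


lineage = {
    "kpi:enrollment_rate@v2": {"inputs": ["mart:trial_performance"]},
    "mart:trial_performance": {"inputs": ["conformed:subject", "conformed:site"]},
    "conformed:subject": {"inputs": ["raw:edc/2026-03-01"]},
    "conformed:site": {"inputs": ["raw:ctms/2026-03-01"]},
}
evidence = trace_to_sources("kpi:enrollment_rate@v2", lineage)
```

Here the KPI resolves to one EDC extract and one CTMS extract, which is the kind of evidence path the closing line of this document asks for.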

Governed Access

Role-based views, PHI/PII controls, and regional sovereignty constraints.
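The three controls named above can be combined in one small filter: the role decides which regions are visible and whether PHI fields are masked. The roles, regions, PHI field list, and governed_view helper are all made up for illustration.

```python
PHI_FIELDS = {"patient_name", "date_of_birth"}  # hypothetical PHI tags

ROLE_POLICY = {
    "safety_physician": {"regions": {"EU", "US"}, "can_see_phi": True},
    "ops_analyst": {"regions": {"US"}, "can_see_phi": False},
}


def governed_view(rows, role):
    """Return only the rows and field values that the given role may see."""
    policy = ROLE_POLICY[role]  # unknown roles raise KeyError: deny by default
    view = []
    for row in rows:
        if row["region"] not in policy["regions"]:
            continue  # regional constraint: the row is not visible at all
        if policy["can_see_phi"]:
            view.append(dict(row))
        else:
            # Mask PHI values but keep the keys so downstream schemas stay stable.
            view.append({k: ("***" if k in PHI_FIELDS else v)
                         for k, v in row.items()})
    return view


rows = [
    {"subject": "SUBJ-1", "region": "US", "patient_name": "A. Example"},
    {"subject": "SUBJ-2", "region": "EU", "patient_name": "B. Example"},
]
analyst_rows = governed_view(rows, "ops_analyst")
```

For the analyst role, the EU row is filtered out and the US row is returned with the name masked.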

“The foundation is done when every conclusion has an evidence path.”