Clinical Development Data Foundation Layered Panorama
2026-03-01
A layered architecture from source facts to auditable apps and agents.
Backbone Transformation Chain
Extract (Traceable)
Clean (Usable)
Conform (Composable)
Canonical (Shared Semantics)
Productize (Reusable)
Operationalize (Workflow)
L0
Layer 0 — Source Systems
What this layer looks like
CTMS/eTMF/RTSM/PV/QMS, EDC/ePRO/Labs/Imaging, protocol/CSR documents, RWD/RWE and external registries.
Inputs from below
No upstream layer. This is where facts originate.
Transform to next layer
Connectors + extraction (API/SFTP/CDC/events/files) -> raw extract package + source lineage metadata.
Common pitfall
Collecting data without an extraction evidence chain makes every downstream layer unverifiable.
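A minimal Python sketch of a "raw extract package + source lineage metadata", assuming a file-style extract of JSON rows. The names `ExtractManifest` and `package_extract` and the chosen manifest fields are hypothetical. A real connector would likely also record the source query, watermark, or CDC offset.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExtractManifest:
    """Hypothetical lineage record shipped alongside every raw extract."""
    source_system: str  # e.g. "EDC", "CTMS"
    method: str         # e.g. "API", "SFTP", "CDC"
    extracted_at: str   # ISO-8601 UTC timestamp of the extraction run
    row_count: int      # rows in the payload, for landing-zone row checks
    sha256: str         # hash of the exact payload bytes


def package_extract(source_system: str, method: str, rows: list[dict]) -> tuple[bytes, ExtractManifest]:
    """Serialize rows deterministically (one JSON object per line) and describe them."""
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in rows).encode("utf-8")
    manifest = ExtractManifest(
        source_system=source_system,
        method=method,
        extracted_at=datetime.now(timezone.utc).isoformat(),
        row_count=len(rows),
        sha256=hashlib.sha256(payload).hexdigest(),
    )
    return payload, manifest


if __name__ == "__main__":
    payload, manifest = package_extract("EDC", "API", [{"subject": "S-001", "visit": "V1"}])
    print(json.dumps(asdict(manifest), indent=2))
```

Because the hash covers the exact payload bytes, the landing zone can recompute it on arrival and quarantine any extract whose hash or row count does not match its manifest.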
L1
Layer 1 — Landing / Raw
What this layer looks like
Immutable raw zone partitioned by source and time, with hashes, row checks, PHI tags, and jurisdiction labels.
Inputs from below
Raw extracts and extraction metadata from Layer 0.
Transform to next layer
Light normalization (decode, dedupe, type/timezone harmonization, key prep) -> cleaned raw / bronze.
Common pitfall
Over-cleaning in Raw rewrites facts and breaks audit explainability.
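One way to keep normalization "light" is to never mutate the raw row: build a new bronze row and keep a hash pointer back to the raw fact. The sketch below assumes source timestamps carry explicit UTC offsets and that rows repeating the same business key are redelivered copies. The function names `row_hash` and `to_bronze` and the `_raw_row_hash` column are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def row_hash(row: dict) -> str:
    """Stable hash of a raw row, used to point each bronze row back to its source fact."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()


def to_bronze(raw_rows: list[dict], key_fields: tuple[str, ...], ts_field: str) -> list[dict]:
    """Dedupe on a prepared key, harmonize timestamps to UTC, and leave raw rows untouched."""
    seen: set[tuple] = set()
    bronze: list[dict] = []
    for raw in raw_rows:
        # Key prep: compare keys as trimmed strings so "S-001 " and "S-001" collide.
        key = tuple(str(raw[f]).strip() for f in key_fields)
        if key in seen:
            continue  # assumed redelivery of the same record: keep the first occurrence
        seen.add(key)
        row = dict(raw)  # copy, so the raw zone stays immutable
        # Assumes each source timestamp includes its own UTC offset.
        row[ts_field] = datetime.fromisoformat(raw[ts_field]).astimezone(timezone.utc).isoformat()
        row["_raw_row_hash"] = row_hash(raw)
        bronze.append(row)
    return bronze
```

Anything heavier than this, such as recoding values or resolving conflicts between differing duplicates, is left to the next layer, where rule outcomes can be recorded explicitly.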
L2
Layer 2 — Conformed
What this layer looks like
Quality-controlled subject/visit/site/case/document objects with rule outcomes and anomaly markers.
Inputs from below
Cleaned raw objects from Layer 1.
Transform to next layer
Terminology mapping + standards mapping + master ID alignment -> conformed datasets and entity links.
Common pitfall
Field-level harmonization without entity-level ID unification blocks cross-system analytics.
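To make the entity-level point concrete: the same site can appear as one ID in CTMS and a different one in EDC, and only a crosswalk to a master ID lets the two be joined. This Python sketch covers master ID alignment only, not terminology or standards mapping. The crosswalk is assumed to come from an identity-resolution process, and `align_ids`, `LinkResult`, and the `master_<entity>_id` column are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LinkResult:
    linked: list[dict]     # records enriched with a master ID
    unmatched: list[dict]  # records with no crosswalk entry, kept for stewardship review


def align_ids(records: list[dict], crosswalk: dict[tuple[str, str], str], entity: str) -> LinkResult:
    """Attach a master ID to each record using a (source_system, local_id) crosswalk."""
    linked: list[dict] = []
    unmatched: list[dict] = []
    for rec in records:
        master = crosswalk.get((rec["source_system"], rec["local_id"]))
        if master is None:
            unmatched.append(rec)  # never guess a match; route it to identity resolution
        else:
            linked.append({**rec, f"master_{entity}_id": master})
    return LinkResult(linked, unmatched)
```

Unmatched records are surfaced rather than dropped, so gaps in the crosswalk show up as work items instead of silently shrinking cross-system counts.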
L3
Layer 3 — Semantic / Canonical
What this layer looks like
Canonical model for clinical, operations, safety, and document cores plus governed metric definitions.
Inputs from below
Conformed datasets, mapped dictionaries, and entity links from Layer 2.
Transform to next layer
Dimensional modeling + feature engineering + document structuring -> semantic marts and feature artifacts.
Common pitfall
Without metric governance, each team computes the “same” KPI differently.
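A governed metric can be as simple as a single, versioned, owned definition that every consumer calls instead of re-implementing it. The Python sketch below assumes an in-process registry; `MetricDefinition`, `register`, `compute`, and the screen-failure-rate formula shown are illustrative assumptions, not a prescribed KPI definition.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    version: str
    owner: str
    description: str
    compute: Callable[[list[dict]], float]


REGISTRY: dict[str, MetricDefinition] = {}


def register(defn: MetricDefinition) -> None:
    """Definitions are immutable once published; changes require a new version."""
    key = f"{defn.name}@{defn.version}"
    if key in REGISTRY:
        raise ValueError(f"{key} already registered; publish a new version instead")
    REGISTRY[key] = defn


def compute(name: str, version: str, data: list[dict]) -> float:
    return REGISTRY[f"{name}@{version}"].compute(data)


def _screen_failure_rate(subjects: list[dict]) -> float:
    # Illustrative definition: screen failures / subjects who completed screening.
    # Subjects still in screening are excluded from the denominator.
    screened = [s for s in subjects if s["status"] in {"screen_failed", "enrolled"}]
    if not screened:
        return 0.0  # convention assumed here for studies with no completed screening
    return sum(1 for s in screened if s["status"] == "screen_failed") / len(screened)


register(MetricDefinition(
    name="screen_failure_rate",
    version="1.0",
    owner="clinical-operations",
    description="Screen-failed subjects divided by subjects who completed screening.",
    compute=_screen_failure_rate,
))
```

Pinning the version in every call (`compute("screen_failure_rate", "1.0", ...)`) also gives lineage a rule version to record next to each reported number.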
L4
Layer 4 — Analytical / Data Products
What this layer looks like
Trial performance marts, patient journey marts, safety marts, authoring marts, feature/metric stores.
Inputs from below
Canonical semantic assets from Layer 3.
Transform to next layer
Serving adaptation (indexes/cache/vector/graph) + role-based access + service APIs -> reusable products.
Common pitfall
Data exists but is not productized, so every app rebuilds its own data layer.
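"Productized" can be read as: a published contract (name, version, owner, schema), validation before publication, and one access API that apps call instead of rebuilding the data. The Python sketch below shows only that core. Serving adaptations (indexes, caches, vector or graph stores) and role-based access are omitted, and `ProductContract`, `DataProduct`, `publish`, and `read` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProductContract:
    name: str
    version: str
    owner: str
    columns: dict[str, type]  # column name -> expected Python type


@dataclass
class DataProduct:
    contract: ProductContract
    _rows: list[dict] = field(default_factory=list)

    def publish(self, rows: list[dict]) -> None:
        """Validate every row against the contract before consumers can see any of them."""
        for i, row in enumerate(rows):
            if set(row) != set(self.contract.columns):
                raise ValueError(f"row {i}: columns {sorted(row)} do not match contract")
            for col, typ in self.contract.columns.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"row {i}: {col!r} is not {typ.__name__}")
        self._rows = list(rows)  # swap in only after the whole batch passes

    def read(self, **filters) -> list[dict]:
        """Minimal service API: equality filters over the published rows."""
        return [r for r in self._rows if all(r.get(k) == v for k, v in filters.items())]
```

A failed publish leaves the previously published rows in place, so consumers never see a half-valid batch.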
L5
Layer 5 — Apps / Agents
What this layer looks like
Dashboards, decision apps, and agent workflows for authoring, ops, and quality intervention loops.
Inputs from below
Semantic consistency and reusable products from Layers 3 and 4.
Transform to next layer
Evidence-backed decisions + explainable outputs + optional governed write-back into operational systems.
Common pitfall
Without evidence traceability, LLM/agent outcomes cannot earn GxP trust.
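One way to enforce evidence traceability is to make it a release gate: an agent conclusion carries explicit references to source records, extract hashes, and rule versions, and is blocked if that path is missing. This Python sketch is an assumption about how such a gate could look; `EvidenceRef`, `Conclusion`, and `release` are hypothetical names, and it does not by itself constitute GxP validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRef:
    source_system: str    # e.g. "EDC"
    record_id: str        # identifier of the source record
    snapshot_sha256: str  # hash of the extract the record came from
    rule_version: str     # rule or metric definition applied, e.g. "query_aging@1.0"


@dataclass(frozen=True)
class Conclusion:
    statement: str
    evidence: tuple[EvidenceRef, ...] = ()


def release(conclusion: Conclusion) -> Conclusion:
    """Refuse to surface a conclusion that has no complete evidence path."""
    if not conclusion.evidence:
        raise ValueError(f"untraceable conclusion blocked: {conclusion.statement!r}")
    for ref in conclusion.evidence:
        if not (ref.record_id and ref.snapshot_sha256 and ref.rule_version):
            raise ValueError(f"incomplete evidence reference: {ref}")
    return conclusion
```

The same evidence tuple can be stored with any governed write-back, so an operational change can be traced to the records and rules that motivated it.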
Three Cross-Cutting Capabilities
Identity Resolution
Unify Study/Site/Subject/Investigator/Vendor identities across systems.
Lineage & Audit
Every metric and conclusion should trace back to source versions and rules.
Governed Access
Role-based views, PHI/PII controls, and regional sovereignty constraints.
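Governed access combines a regional filter with field-level PHI masking. The Python sketch below covers the Governed Access capability only. It assumes each row carries a `region` label (the jurisdiction labels applied at landing) and that PHI fields are known from landing-zone tags. `Role`, `governed_view`, and the `PHI_FIELDS` contents are illustrative assumptions.

```python
from dataclasses import dataclass

PHI_FIELDS = {"subject_initials", "date_of_birth"}  # illustrative PHI tags


@dataclass(frozen=True)
class Role:
    name: str
    can_see_phi: bool
    regions: frozenset[str]  # jurisdictions whose data this role may access


def governed_view(rows: list[dict], role: Role) -> list[dict]:
    """Apply regional sovereignty filtering, then mask PHI for roles without clearance."""
    view: list[dict] = []
    for row in rows:
        if row["region"] not in role.regions:
            continue  # out-of-jurisdiction rows are never returned to this role
        visible = dict(row)  # copy, so masking never alters the governed source
        if not role.can_see_phi:
            for f in PHI_FIELDS.intersection(visible):
                visible[f] = "***"
        view.append(visible)
    return view
```

Filtering by region before masking means a role never receives out-of-scope rows at all, even in masked form.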
“The foundation is done when every conclusion has an evidence path.”