REDCap System Structure
Below is a clear, structural mental model for how a REDCap study is organized. This framing is intentionally system-level, not UI-level, and is suitable for onboarding, SOPs, or comparison to custom EDC architectures.
1. The REDCap Project Is the Unit of a Study
A REDCap project typically corresponds to one research study (or one protocol).
A project defines:
- The data model (forms/instruments)
- The operational mode (classic vs longitudinal)
- The governance model (users, roles, permissions)
- The participant interaction model (surveys vs staff entry)
Everything in REDCap lives inside a project.
2. Records Represent Study Units (Usually Participants)
At the core of a project are records.
A record usually represents:
- A participant
- Less commonly: a family, device, specimen, or encounter
Each record:
- Has a unique record ID
- Accumulates data across all forms, events, and timepoints
- Is the anchor for permissions, locking, and audit logs
3. Instruments (Forms) Define the Data Schema
Instruments are structured data collection forms.
Characteristics:
- Each instrument contains fields (variables)
- Instruments can be:
  - Staff-entered forms
  - Participant-facing surveys
  - Both
- Instruments define:
  - Data types
  - Validation rules
  - Branching logic
  - Required fields
Conceptually:
Instruments = tables
Fields = columns
Record × Instrument = a form instance
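The analogy can be sketched in a few lines of Python; the instrument and field names here are hypothetical:

```python
# Hypothetical illustration of REDCap's flat relational analogy:
# instrument = table, field = column, (record, instrument) = one form instance.
demographics = {                       # one "table", keyed by record ID
    "1001": {"age": 54, "sex": "F"},   # field values = "columns"
    "1002": {"age": 61, "sex": "M"},
}
baseline_labs = {
    "1001": {"hgb": 13.2, "wbc": 6.1},
}

def form_instance(instrument: dict, record_id: str) -> dict:
    """One record x one instrument = a single form instance."""
    return instrument.get(record_id, {})

print(form_instance(demographics, "1001"))  # {'age': 54, 'sex': 'F'}
```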
4. Longitudinal Structure: Events and Arms (Optional)
For longitudinal or interventional studies, REDCap adds structure:
Events
- Represent timepoints or visits (e.g., Baseline, Month 3, Month 6)
Arms
- Represent study groups (e.g., Control vs Treatment)
Each record:
- Is assigned to an arm (usually exactly one; a record can exist in multiple arms, but this is uncommon)
- Can have multiple events within its arm
- Can have the same instrument repeated across events
This creates a matrix: within an arm, each record can hold one form instance for every (event, instrument) pairing designated for that event.
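In a longitudinal export, this matrix flattens to one row per record per event. `redcap_event_name` is REDCap's actual export column; the other field names below are hypothetical:

```python
# Hypothetical longitudinal export: one row per (record, event).
rows = [
    {"record_id": "1001", "redcap_event_name": "baseline_arm_1", "weight": 72.0},
    {"record_id": "1001", "redcap_event_name": "month_3_arm_1",  "weight": 70.5},
    {"record_id": "1002", "redcap_event_name": "baseline_arm_2", "weight": 85.1},
]

def events_for(record_id: str) -> list[str]:
    """All events at which this record has data."""
    return [r["redcap_event_name"] for r in rows if r["record_id"] == record_id]

print(events_for("1001"))  # ['baseline_arm_1', 'month_3_arm_1']
```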
5. Repeating Instruments and Repeating Events (Optional)
For collections like:
- Adverse events
- Medications
- Lab panels
- Hospitalizations
REDCap supports repetition:
- Repeat an instrument within an event
- Repeat an event itself
This approximates one-to-many relationships within an otherwise flat model.
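`redcap_repeat_instrument` and `redcap_repeat_instance` are the export columns REDCap uses for this; the rows below are hypothetical. A minimal sketch of reading repeats back as a one-to-many collection:

```python
# Hypothetical export rows for a repeating "adverse_events" instrument.
rows = [
    {"record_id": "1001", "redcap_repeat_instrument": "adverse_events",
     "redcap_repeat_instance": 1, "ae_term": "headache"},
    {"record_id": "1001", "redcap_repeat_instrument": "adverse_events",
     "redcap_repeat_instance": 2, "ae_term": "nausea"},
]

def repeats(record_id: str, instrument: str) -> list[dict]:
    """One-to-many: all instances of a repeating instrument for a record."""
    return [r for r in rows
            if r["record_id"] == record_id
            and r["redcap_repeat_instrument"] == instrument]

assert len(repeats("1001", "adverse_events")) == 2
```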
6. Users and Roles Operate on the Project
Users
- Human users (PI, coordinators, RAs)
- System users (API tokens)
Roles
- Permission bundles defining:
  - What instruments a user can access
  - Whether they can edit, export, or manage users
  - Whether they see identifiers
Users are assigned:
- One role per project
- Optionally restricted further by data access groups
7. Sites Are Modeled Using Data Access Groups (DAGs)
REDCap does not have “sites” as a native entity.
Instead:
- Data Access Groups (DAGs) partition records
- Each DAG usually represents:
  - A site
  - A clinic
  - A geographic region
Rules:
- Users in a DAG only see records in that DAG
- DAGs do not change the data model—only visibility
Conceptually:
DAGs = row-level security by site
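This row-level view can be sketched as a filter on the export's `redcap_data_access_group` column (a real REDCap export column; the record IDs are hypothetical):

```python
# Hypothetical export rows tagged with their data access group.
rows = [
    {"record_id": "1001", "redcap_data_access_group": "site_a"},
    {"record_id": "2001", "redcap_data_access_group": "site_b"},
]

def visible_to(user_dag: str) -> list[dict]:
    """DAG-scoped view: users in a DAG see only that DAG's records."""
    return [r for r in rows if r["redcap_data_access_group"] == user_dag]

print([r["record_id"] for r in visible_to("site_a")])  # ['1001']
```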
8. Surveys Are Alternate Entry Points, Not Separate Data
Surveys:
- Are instruments with a public or authenticated interface
- Write directly into the same record data
- Can be:
  - Standalone
  - Triggered by logic
  - Scheduled longitudinally
Important:
Surveys do not create a second dataset—they populate the same records.
9. Project Lifecycle States
Projects move through controlled states:
- Development
  - Forms editable
  - Data considered test or pilot
- Production
  - Structural changes restricted
  - Data considered study-valid
- Archived
  - Read-only historical record
This enforces data integrity expectations.
10. Cross-Cutting Concerns (Always Present)
Across all structures, REDCap maintains:
- Audit logs (who changed what, when)
- Field-level validation
- Record locking and e-signatures
- Export tracking
One-Sentence Mental Model
A REDCap study is a project containing records (usually participants), each accumulating data from multiple instruments across events and arms, entered by users with defined roles, optionally partitioned into sites via data access groups, with surveys as alternate entry interfaces—all governed by strict audit and lifecycle controls.
Why This Matters for System Design (Given Your Work)
This structure explains why:
- REDCap scales well for moderate-complexity trials
- It struggles with:
  - Deep relational data
  - Cross-study reuse
  - Modular schema evolution
- It maps cleanly to:
  - CDISC ODM (with constraints)
  - FHIR ResearchStudy + ResearchSubject (imperfectly)
Below is a practical, standards-aware mapping between a REDCap study and the OMOP Common Data Model. I’ll be explicit about **what maps cleanly**, **what requires transformation**, and **where semantics are often lost** unless you design for OMOP up front.
Executive Summary
REDCap is a **data-collection system**; OMOP is an **analytic data model**.
Mapping is therefore ETL-driven, not structural. The goal is to translate REDCap records and forms into person-centric, standardized clinical facts with controlled vocabularies.
Core Conceptual Alignment
| REDCap Concept | OMOP Concept | Mapping Notes |
|---|---|---|
| Project | Study context | OMOP does not store studies natively; study metadata lives outside the CDM (or in extensions) |
| Record (participant) | PERSON | One REDCap record → one OMOP person |
| Site (DAG) | CARE_SITE/LOCATION | DAG → care_site_id; sometimes organization hierarchy |
| Form / Instrument | Domain tables | Each form decomposes into multiple OMOP domains |
| Field / Variable | Concept + value | Requires vocabulary mapping (SNOMED, LOINC, RxNorm) |
| Event / Visit | VISIT_OCCURRENCE | REDCap events are often protocol-driven, not care-driven |
| Repeating forms | Multiple rows | Natural fit for OMOP’s row-based domains |
| Survey response | OBSERVATION/MEASUREMENT | Depends on whether it is qualitative or quantitative |
Canonical Table-Level Mapping
1. Participant Identity
| REDCap | OMOP |
|---|---|
| record_id | person_id |
| DOB, sex, race | PERSON fields |
| Enrollment date | Often OBSERVATION_PERIOD start |
Key design choice:
REDCap record IDs should be surrogate keys, not reused identifiers.
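A minimal sketch of surrogate-key assignment during ETL, assuming an in-memory map (a real pipeline would persist the map between runs so `person_id` values stay stable):

```python
import itertools

# Hypothetical surrogate-key assignment: record_ids are never reused as person_ids.
_person_seq = itertools.count(1)
_person_map: dict[str, int] = {}

def person_id_for(record_id: str) -> int:
    """Return a stable surrogate person_id for a REDCap record_id."""
    if record_id not in _person_map:
        _person_map[record_id] = next(_person_seq)
    return _person_map[record_id]

assert person_id_for("1001") == person_id_for("1001")  # stable within a run
```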
2. Visits, Time, and Longitudinality
| REDCap | OMOP |
|---|---|
| Event (Baseline, Month 6) | VISIT_OCCURRENCE |
| Arm | visit_source_value or custom extension |
| Event date | visit_start_date |
Caution:
REDCap events represent protocol milestones, while OMOP visits represent healthcare encounters. Many ETLs create synthetic visits purely to anchor timing.
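A sketch of such a synthetic visit, using illustrative VISIT_OCCURRENCE column names (`visit_concept_id = 0` stands in for "no matching standard concept", since a protocol milestone is not a real encounter):

```python
from datetime import date

# Hypothetical synthesis of an OMOP visit row from a REDCap protocol event,
# created purely to anchor timing.
def synthetic_visit(person_id: int, event_name: str, event_date: date,
                    next_visit_id: int) -> dict:
    return {
        "visit_occurrence_id": next_visit_id,
        "person_id": person_id,
        "visit_concept_id": 0,              # no standard concept: protocol milestone
        "visit_start_date": event_date,
        "visit_end_date": event_date,       # point-in-time visit
        "visit_source_value": event_name,   # preserve the REDCap event name
    }

v = synthetic_visit(1, "month_3_arm_1", date(2024, 6, 1), 10)
assert v["visit_source_value"] == "month_3_arm_1"
```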
3. Diagnoses and Conditions
| REDCap | OMOP |
|---|---|
| Diagnosis checkbox / dropdown | CONDITION_OCCURRENCE |
| Text diagnosis | Needs NLP → concept_id |
| Onset date | condition_start_date |
Best practice:
Force structured diagnosis capture using SNOMED CT concept IDs in REDCap.
4. Medications and Interventions
| REDCap | OMOP |
|---|---|
| Medication form | DRUG_EXPOSURE |
| Free-text meds | RxNorm normalization required |
| Start / stop dates | Exposure period |
5. Labs, Scores, and Quantitative Measures
| REDCap | OMOP |
|---|---|
| Numeric lab value | MEASUREMENT |
| Clinical score (EDSS, PHQ-9) | MEASUREMENT or OBSERVATION |
| Units | Standardized UCUM |
Rule of thumb:
- Measured with units → MEASUREMENT
- Assessed / rated → OBSERVATION
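The rule of thumb can be sketched as a small router; the field shape (a `dict` with an optional `unit` key) is a simplifying assumption, not REDCap's metadata format:

```python
# Hypothetical router implementing the rule of thumb above.
def omop_domain(field: dict) -> str:
    """Measured with units -> MEASUREMENT; assessed/rated -> OBSERVATION."""
    if field.get("unit"):   # e.g. {"name": "hgb", "value": 13.2, "unit": "g/dL"}
        return "MEASUREMENT"
    return "OBSERVATION"

assert omop_domain({"name": "hgb", "value": 13.2, "unit": "g/dL"}) == "MEASUREMENT"
assert omop_domain({"name": "phq9_total", "value": 12}) == "OBSERVATION"
```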
6. Surveys and PROs
| REDCap | OMOP |
|---|---|
| Individual question | OBSERVATION |
| Composite score | Derived MEASUREMENT |
| Survey metadata | Not native to OMOP |
This is a known weakness: OMOP preserves answers, not survey structure.
7. Adverse Events
| REDCap | OMOP |
|---|---|
| AE form | CONDITION_OCCURRENCE |
| Severity | observation_source_value |
| Relatedness | Separate OBSERVATION |
What Does **Not** Map Cleanly
| REDCap Feature | OMOP Gap |
|---|---|
| Form structure | Lost during ETL |
| Field-level provenance | Partially lost |
| CRF versioning | Not represented |
| Query resolution | Not represented |
| Audit trail | External system only |
This is why OMOP is **analysis-ready**, not **regulatory-submission-ready**.
Recommended ETL Architecture (Best Practice)
Key design principles:
- Preserve raw REDCap exports verbatim
- Maintain a mapping table: REDCap field → OMOP concept_id
- Version mappings as code
- Never hand-map in spreadsheets
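A minimal sketch of such a mapping table kept as code; the concept IDs below are placeholders for illustration, not verified OMOP concept IDs:

```python
# Hypothetical versioned mapping table: REDCap field -> OMOP (domain, concept_id),
# kept in code rather than spreadsheets so changes are reviewable and reproducible.
MAPPING_VERSION = "2024-06-01"

FIELD_TO_CONCEPT: dict[str, tuple[str, int]] = {
    "hgb":   ("MEASUREMENT", 3000000),           # placeholder concept_id
    "dx_ms": ("CONDITION_OCCURRENCE", 370000),   # placeholder concept_id
}

def map_field(redcap_field: str) -> tuple[str, int]:
    """Resolve a REDCap field name; fail loudly on unmapped fields."""
    try:
        return FIELD_TO_CONCEPT[redcap_field]
    except KeyError:
        raise KeyError(f"Unmapped REDCap field: {redcap_field} "
                       f"(mapping version {MAPPING_VERSION})")

assert map_field("hgb")[0] == "MEASUREMENT"
```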
Practical Guidance for Your Context
Given your interest in:
- Longitudinal disease registries
- AI-assisted extraction
- TRE-based analytics
I would strongly recommend:
- OMOP as the analytic layer, not the source of truth
- REDCap → OMOP as a one-way, reproducible ETL
- Explicit separation of:
  - Documented data (REDCap)
  - Normalized clinical facts (OMOP)
- Storing REDCap project and instrument metadata alongside OMOP in a metadata schema
This aligns with modern OHDSI + AI pipelines and avoids overloading OMOP with responsibilities it was not designed for.
One-Sentence Summary
A REDCap study maps to OMOP by transforming each participant record into a PERSON, decomposing forms into domain-specific clinical facts (conditions, drugs, measurements, observations), anchoring time via visits or observation periods, and standardizing all semantics through controlled vocabularies—accepting that form structure, auditability, and study metadata live outside the OMOP CDM.
If you want, I can next:
- Walk through a concrete REDCap instrument → OMOP ETL example
- Map this to FHIR → OMOP hybrid pipelines
- Compare SurveyJS-first vs REDCap-first OMOP strategies
- Propose an AI-augmented semantic mapping workflow
Just indicate the next step.