REDCap system structure

Below is a clear, structural mental model for how a REDCap study is organized. This framing is intentionally system-level, not UI-level, and is suitable for onboarding, SOPs, or comparison to custom EDC architectures.


1. The REDCap Project Is the Unit of a Study

A REDCap project typically corresponds to one research study (or one protocol).

A project defines:

  • The data model (forms/instruments)
  • The operational mode (classic vs longitudinal)
  • The governance model (users, roles, permissions)
  • The participant interaction model (surveys vs staff entry)

Everything in REDCap lives inside a project.


2. Records Represent Study Units (Usually Participants)

At the core of a project are records.

A record usually represents:

  • A participant
  • Less commonly: a family, device, specimen, or encounter

Each record:

  • Has a unique record ID
  • Accumulates data across all forms, events, and timepoints
  • Is the anchor for permissions, locking, and audit logs

3. Instruments (Forms) Define the Data Schema

Instruments are structured data collection forms.

Characteristics:

  • Each instrument contains fields (variables)
  • Instruments can be:
      • Staff-entered forms
      • Participant-facing surveys
      • Both
  • Instruments define:
      • Data types
      • Validation rules
      • Branching logic
      • Required fields

Conceptually:

Instruments = tables

Fields = columns

Record × Instrument = a form instance
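The "instruments = tables, fields = columns" analogy can be made concrete with a sketch of a classic-project flat export. All field names and values below are hypothetical, but the shape (one row per record, with each instrument contributing its fields plus a `<instrument>_complete` status column) matches REDCap's flat export format.

```python
# Hypothetical flat export from a classic (non-longitudinal) REDCap project.
# Each row is one record; each instrument contributes a group of columns
# (its fields) plus an "<instrument>_complete" status field.
rows = [
    {"record_id": "001", "age": 54, "sex": "F", "demographics_complete": "2",
     "sbp": 128, "vitals_complete": "2"},
    {"record_id": "002", "age": 61, "sex": "M", "demographics_complete": "2",
     "sbp": 141, "vitals_complete": "0"},
]

# "Instruments = tables": project out the columns belonging to one instrument.
demographics_fields = ["record_id", "age", "sex", "demographics_complete"]
demographics = [{f: r[f] for f in demographics_fields} for r in rows]

print(demographics[0])  # record 001's demographics "form instance"
```

Selecting one record's slice of one instrument's columns is exactly the "Record × Instrument = a form instance" cell in this model.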


4. Longitudinal Structure: Events and Arms (Optional)

For longitudinal or interventional studies, REDCap adds structure:

Events

  • Represent timepoints or visits (e.g., Baseline, Month 3, Month 6)

Arms

  • Represent study groups (e.g., Control vs Treatment)

Each record can have:

  • Multiple events
  • Multiple arms
  • The same instrument repeated across events

This creates a matrix:

Record × Arm × Event × Instrument
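The matrix above shows up directly in longitudinal exports: REDCap adds a `redcap_event_name` column, and by convention the arm is encoded in the event name suffix. The field names and values here are illustrative.

```python
# Hypothetical longitudinal export: REDCap adds a redcap_event_name key,
# so one instrument's fields can appear once per event for each record.
rows = [
    {"record_id": "001", "redcap_event_name": "baseline_arm_1", "sbp": 128},
    {"record_id": "001", "redcap_event_name": "month_3_arm_1",  "sbp": 131},
    {"record_id": "002", "redcap_event_name": "baseline_arm_2", "sbp": 141},
]

# The (record, event) pair addresses one cell of the matrix; the arm is
# conventionally carried in the event-name suffix (_arm_1, _arm_2).
cell = {(r["record_id"], r["redcap_event_name"]): r["sbp"] for r in rows}
print(cell[("001", "month_3_arm_1")])
```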

5. Repeating Instruments and Repeating Events (Optional)

For collections like:

  • Adverse events
  • Medications
  • Lab panels
  • Hospitalizations

REDCap supports repetition:

  • Repeat an instrument within an event
  • Repeat an event itself

This approximates one-to-many relationships within an otherwise flat model.
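In exports, this approximation is visible as two extra columns, `redcap_repeat_instrument` and `redcap_repeat_instance`, which give a one-to-many shape inside a flat file. The medication values below are made up for illustration.

```python
# Hypothetical export with a repeating "medications" instrument: each repeat
# becomes its own row, keyed by redcap_repeat_instrument and
# redcap_repeat_instance — a one-to-many relationship in a flat export.
rows = [
    {"record_id": "001", "redcap_repeat_instrument": "medications",
     "redcap_repeat_instance": 1, "med_name": "metformin"},
    {"record_id": "001", "redcap_repeat_instrument": "medications",
     "redcap_repeat_instance": 2, "med_name": "lisinopril"},
]

meds_for_001 = [r["med_name"] for r in rows
                if r["record_id"] == "001"
                and r["redcap_repeat_instrument"] == "medications"]
print(meds_for_001)  # one record, many medication instances
```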


6. Users and Roles Operate on the Project

Users

  • Human users (PI, coordinators, RAs)
  • System users (API tokens)

Roles

  • Permission bundles defining:
      • What instruments a user can access
      • Whether they can edit, export, or manage users
      • Whether they see identifiers

Users are assigned:

  • One role per project
  • Optionally restricted further by data access groups
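A "system user" in this model is simply an API token that carries the same role and DAG restrictions as a human account. As a sketch, a record export via the REDCap API is a POST of a parameter payload to the instance's `/api/` endpoint; the parameter names below follow REDCap's documented record-export call, while the URL and token are placeholders.

```python
# Sketch of a system-user (API token) interaction with REDCap.
# The token is project-specific and inherits the issuing user's permissions.
payload = {
    "token": "REPLACE_WITH_PROJECT_TOKEN",  # per-user, per-project token
    "content": "record",                    # export records
    "format": "json",
    "type": "flat",                         # one row per record/event
}
api_url = "https://redcap.example.org/api/"  # placeholder instance URL

# In practice: requests.post(api_url, data=payload).json() — not executed here.
print(sorted(payload))
```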

7. Sites Are Modeled Using Data Access Groups (DAGs)

REDCap does not have “sites” as a native entity.

Instead:

  • Data Access Groups (DAGs) partition records
  • Each DAG usually represents:
      • A site
      • A clinic
      • A geographic region

Rules:

  • Users in a DAG only see records in that DAG
  • DAGs do not change the data model—only visibility

Conceptually:

DAGs = row-level security by site
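This row-level-security framing can be shown in a few lines. REDCap enforces it server-side, not in exports; the filtering function here is purely illustrative, though `redcap_data_access_group` is the column name DAG assignments use in exports.

```python
# Minimal sketch of DAG semantics as row-level filtering by site.
records = [
    {"record_id": "001", "redcap_data_access_group": "site_a"},
    {"record_id": "002", "redcap_data_access_group": "site_b"},
    {"record_id": "003", "redcap_data_access_group": "site_a"},
]

def visible_to(user_dag, rows):
    """A user assigned to a DAG sees only records in that DAG."""
    return [r for r in rows if r["redcap_data_access_group"] == user_dag]

print([r["record_id"] for r in visible_to("site_a", records)])
```

Note that the data model is identical for every DAG; only the visible rows change.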


8. Surveys Are Alternate Entry Points, Not Separate Data

Surveys:

  • Are instruments with a public or authenticated interface
  • Write directly into the same record data
  • Can be:
      • Standalone
      • Triggered by logic
      • Scheduled longitudinally

Important:

Surveys do not create a second dataset—they populate the same records.


9. Project Lifecycle States

Projects move through controlled states:

  1. Development
      • Forms editable
      • Data considered test or pilot
  2. Production
      • Structural changes restricted
      • Data considered study-valid
  3. Archived
      • Read-only historical record

This enforces data integrity expectations.


10. Cross-Cutting Concerns (Always Present)

Across all structures, REDCap maintains:

  • Audit logs (who changed what, when)
  • Field-level validation
  • Record locking and e-signatures
  • Export tracking

One-Sentence Mental Model

A REDCap study is a project containing records (usually participants), each accumulating data from multiple instruments across events and arms, entered by users with defined roles, optionally partitioned into sites via data access groups, with surveys as alternate entry interfaces—all governed by strict audit and lifecycle controls.


Why This Matters for System Design (Given Your Work)

This structure explains why:

  • REDCap scales well for moderate complexity trials
  • It struggles with:
      • Deep relational data
      • Cross-study reuse
      • Modular schema evolution
  • It maps cleanly to:
      • CDISC ODM (with constraints)
      • FHIR ResearchStudy + ResearchSubject (imperfectly)

Below is a practical, standards-aware mapping between a REDCap study and the OMOP Common Data Model. I’ll be explicit about what maps cleanly, what requires transformation, and where semantics are often lost unless you design for OMOP up front.


Executive Summary

REDCap is a data-collection system; OMOP is an analytic data model.

Mapping is therefore ETL-driven, not structural. The goal is to translate REDCap records and forms into person-centric, standardized clinical facts with controlled vocabularies.


Core Conceptual Alignment

| REDCap Concept | OMOP Concept | Mapping Notes |
|---|---|---|
| Project | Study context | OMOP does not store studies natively; study metadata lives outside the CDM (or in extensions) |
| Record (participant) | PERSON | One REDCap record → one OMOP person |
| Site (DAG) | CARE_SITE / LOCATION | DAG → care_site_id; sometimes an organization hierarchy |
| Form / Instrument | Domain tables | Each form decomposes into multiple OMOP domains |
| Field / Variable | Concept + value | Requires vocabulary mapping (SNOMED, LOINC, RxNorm) |
| Event / Visit | VISIT_OCCURRENCE | REDCap events are often protocol-driven, not care-driven |
| Repeating forms | Multiple rows | Natural fit for OMOP’s row-based domains |
| Survey response | OBSERVATION / MEASUREMENT | Depends on whether the response is qualitative or quantitative |

Canonical Table-Level Mapping

1. Participant Identity

| REDCap | OMOP |
|---|---|
| record_id | person_id |
| DOB, sex, race | PERSON fields |
| Enrollment date | Often OBSERVATION_PERIOD start |

Key design choice:

REDCap record IDs should be surrogate keys, not reused identifiers.
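A minimal sketch of this surrogate-key choice: assign OMOP `person_id`s as opaque integers generated by the ETL, and keep the REDCap `record_id` only as provenance (in `person_source_value`, a real PERSON column) and in a crosswalk table. The record IDs below are placeholders.

```python
# Sketch: surrogate person_ids, with the REDCap record_id kept as provenance.
record_ids = ["001", "002", "017"]

# Crosswalk table: REDCap record_id -> ETL-assigned surrogate person_id.
crosswalk = {rec: person_id for person_id, rec in enumerate(record_ids, start=1)}

# PERSON rows carry the surrogate key; person_source_value retains provenance.
person_rows = [{"person_id": pid, "person_source_value": rec}
               for rec, pid in crosswalk.items()]
print(person_rows[0])
```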


2. Visits, Time, and Longitudinality

| REDCap | OMOP |
|---|---|
| Event (Baseline, Month 6) | VISIT_OCCURRENCE |
| Arm | visit_source_value or a custom extension |
| Event date | visit_start_date |

Caution:

REDCap events represent protocol milestones, while OMOP visits represent healthcare encounters. Many ETLs create synthetic visits purely to anchor timing.
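A sketch of that synthetic-visit pattern: each REDCap event becomes one VISIT_OCCURRENCE row whose only real job is to anchor timing. Event names and dates are invented; a real ETL would also assign standard visit concept ids, omitted here.

```python
# Sketch: turn protocol events into synthetic OMOP visits to anchor timing.
event_rows = [
    {"person_id": 1, "event": "baseline_arm_1", "date": "2024-01-10"},
    {"person_id": 1, "event": "month_3_arm_1", "date": "2024-04-12"},
]

visits = [{
    "person_id": e["person_id"],
    "visit_start_date": e["date"],
    "visit_end_date": e["date"],       # protocol visits are point-in-time
    "visit_source_value": e["event"],  # preserve the REDCap event name
} for e in event_rows]
print(visits[1]["visit_source_value"])
```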


3. Diagnoses and Conditions

| REDCap | OMOP |
|---|---|
| Diagnosis checkbox / dropdown | CONDITION_OCCURRENCE |
| Text diagnosis | Needs NLP → concept_id |
| Onset date | condition_start_date |

Best practice:

Force structured diagnosis capture using SNOMED CT concept IDs in REDCap.


4. Medications and Interventions

| REDCap | OMOP |
|---|---|
| Medication form | DRUG_EXPOSURE |
| Free-text meds | RxNorm normalization required |
| Start / stop dates | Exposure period |

5. Labs, Scores, and Quantitative Measures

| REDCap | OMOP |
|---|---|
| Numeric lab value | MEASUREMENT |
| Clinical score (EDSS, PHQ-9) | MEASUREMENT or OBSERVATION |
| Units | Standardized to UCUM |

Rule of thumb:

  • Measured with units → MEASUREMENT
  • Assessed / rated → OBSERVATION
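The rule of thumb above can be written as a toy routing function. The field metadata shape is invented for illustration; this is not a canonical OHDSI domain-assignment algorithm.

```python
# Toy router implementing the rule of thumb:
# measured-with-units -> MEASUREMENT, assessed/rated -> OBSERVATION.
def target_domain(field):
    """Route a REDCap field to an OMOP domain table (illustrative)."""
    if field.get("unit"):        # measured with units
        return "MEASUREMENT"
    return "OBSERVATION"         # assessed / rated

print(target_domain({"name": "sbp", "unit": "mmHg"}))
print(target_domain({"name": "phq9_q1", "unit": None}))
```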

6. Surveys and PROs

| REDCap | OMOP |
|---|---|
| Individual question | OBSERVATION |
| Composite score | Derived MEASUREMENT |
| Survey metadata | Not native to OMOP |

This is a known weakness: OMOP preserves answers, not survey structure.
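A sketch of the derived-score pattern: item answers land in OBSERVATION, while the composite total is computed at ETL time and emitted as a MEASUREMENT-domain row. The item field names and scores are placeholders (PHQ-9 items are scored 0–3, summing to a 0–27 total).

```python
# Sketch: derive a composite PRO score as a MEASUREMENT-domain row.
phq9_items = {"phq9_q1": 2, "phq9_q2": 1, "phq9_q3": 3}  # item scores 0-3

derived = {
    "domain": "MEASUREMENT",
    "measurement_source_value": "phq9_total",   # placeholder source value
    "value_as_number": sum(phq9_items.values()),
}
print(derived["value_as_number"])
```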


7. Adverse Events

| REDCap | OMOP |
|---|---|
| AE form | CONDITION_OCCURRENCE |
| Severity | observation_source_value |
| Relatedness | Separate OBSERVATION |

What Does Not Map Cleanly

| REDCap Feature | OMOP Gap |
|---|---|
| Form structure | Lost during ETL |
| Field-level provenance | Partially lost |
| CRF versioning | Not represented |
| Query resolution | Not represented |
| Audit trail | External system only |

This is why OMOP is analysis-ready, not regulatory-submission-ready.


Recommended ETL Architecture

REDCap
  ↓ (raw export)
Staging tables (retain form + field context)
  ↓ (semantic mapping)
OMOP CDM

Key design principles:

  • Preserve raw REDCap exports verbatim
  • Maintain a mapping table: REDCap field → OMOP concept_id
  • Version mappings as code
  • Never hand-map in spreadsheets
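A "mapping table versioned as code" can be as simple as a dictionary checked into the repository. The concept ids below are placeholders, not verified standard concepts; the useful property is that unmapped fields fail loudly instead of being silently dropped.

```python
# Sketch: a versioned, code-maintained mapping table,
# REDCap field -> OMOP domain + concept_id (placeholder ids).
FIELD_MAP_V1 = {
    "sbp":   {"domain": "MEASUREMENT", "concept_id": 3004249},
    "dx_ms": {"domain": "CONDITION_OCCURRENCE", "concept_id": 374919},
}

def map_field(field_name):
    """Fail loudly on unmapped fields instead of silently dropping them."""
    if field_name not in FIELD_MAP_V1:
        raise KeyError(f"unmapped REDCap field: {field_name}")
    return FIELD_MAP_V1[field_name]

print(map_field("sbp")["domain"])
```

Because the map is code, every change is a reviewable, diffable commit rather than an edit to a shared spreadsheet.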

Practical Guidance for Your Context

Given your interest in:

  • Longitudinal disease registries
  • AI-assisted extraction
  • TRE-based analytics

I would strongly recommend:

  1. OMOP as the analytic layer, not the source of truth
  2. REDCap → OMOP as a one-way, reproducible ETL
  3. Explicit separation of:
      • Documented data (REDCap)
      • Normalized clinical facts (OMOP)
  4. Storing REDCap project + instrument metadata alongside OMOP in a metadata schema

This aligns with modern OHDSI + AI pipelines and avoids overloading OMOP with responsibilities it was not designed for.


One-Sentence Summary

A REDCap study maps to OMOP by transforming each participant record into a PERSON, decomposing forms into domain-specific clinical facts (conditions, drugs, measurements, observations), anchoring time via visits or observation periods, and standardizing all semantics through controlled vocabularies—accepting that form structure, auditability, and study metadata live outside the OMOP CDM.

If you want, I can next:

  • Walk through a concrete REDCap instrument → OMOP ETL example
  • Map this to FHIR → OMOP hybrid pipelines
  • Compare SurveyJS-first vs REDCap-first OMOP strategies
  • Propose an AI-augmented semantic mapping workflow

Just indicate the next step.