REDCap System Structure
Below is a clear, structural mental model for how a REDCap study is organized. This framing is intentionally system-level, not UI-level, and is suitable for onboarding, SOPs, or comparison to custom EDC architectures.
1. The REDCap Project Is the Unit of a Study
A REDCap project typically corresponds to one research study (or one protocol).
A project defines:
- The data model (forms/instruments)
- The operational mode (classic vs longitudinal)
- The governance model (users, roles, permissions)
- The participant interaction model (surveys vs staff entry)
Everything in REDCap lives inside a project.
2. Records Represent Study Units (Usually Participants)
At the core of a project are records.
A record usually represents:
- A participant
- Less commonly: a family, device, specimen, or encounter
Each record:
- Has a unique record ID
- Accumulates data across all forms, events, and timepoints
- Is the anchor for permissions, locking, and audit logs
3. Instruments (Forms) Define the Data Schema
Instruments are structured data collection forms.
Characteristics:
- Each instrument contains fields (variables)
- Instruments can be:
  - Staff-entered forms
  - Participant-facing surveys
  - Both
- Instruments define:
  - Data types
  - Validation rules
  - Branching logic
  - Required fields
Conceptually:
Instruments = tables
Fields = columns
Record × Instrument = a form instance
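The analogy can be sketched in a few lines of Python; the instrument and field names here are hypothetical:

```python
# Hypothetical illustration of REDCap's flat relational analogy:
# instrument = table, field = column, (record, instrument) = one form instance.
demographics = {                       # one "table", keyed by record ID
    "1001": {"age": 54, "sex": "F"},   # field values = "columns"
    "1002": {"age": 61, "sex": "M"},
}
baseline_labs = {
    "1001": {"hgb": 13.2, "wbc": 6.1},
}

def form_instance(instrument: dict, record_id: str) -> dict:
    """One record x one instrument = a single form instance."""
    return instrument.get(record_id, {})

print(form_instance(demographics, "1001"))  # {'age': 54, 'sex': 'F'}
```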
4. Longitudinal Structure: Events and Arms (Optional)
For longitudinal or interventional studies, REDCap adds structure:
Events
- Represent timepoints or visits (e.g., Baseline, Month 3, Month 6)
Arms
- Represent study groups (e.g., Control vs Treatment)
Each record:
- Is assigned to an arm (usually exactly one; a record can exist in multiple arms, but this is uncommon)
- Can have multiple events within its arm
- Can have the same instrument repeated across events
This creates a matrix: within an arm, each record can hold one form instance for every (event, instrument) pairing designated for that event.
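In a longitudinal export, this matrix flattens to one row per record per event. `redcap_event_name` is REDCap's actual export column; the other field names below are hypothetical:

```python
# Hypothetical longitudinal export: one row per (record, event).
rows = [
    {"record_id": "1001", "redcap_event_name": "baseline_arm_1", "weight": 72.0},
    {"record_id": "1001", "redcap_event_name": "month_3_arm_1",  "weight": 70.5},
    {"record_id": "1002", "redcap_event_name": "baseline_arm_2", "weight": 85.1},
]

def events_for(record_id: str) -> list[str]:
    """All events at which this record has data."""
    return [r["redcap_event_name"] for r in rows if r["record_id"] == record_id]

print(events_for("1001"))  # ['baseline_arm_1', 'month_3_arm_1']
```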
5. Repeating Instruments and Repeating Events (Optional)
For collections like:
- Adverse events
- Medications
- Lab panels
- Hospitalizations
REDCap supports repetition:
- Repeat an instrument within an event
- Repeat an event itself
This approximates one-to-many relationships within an otherwise flat model.
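`redcap_repeat_instrument` and `redcap_repeat_instance` are the export columns REDCap uses for this; the rows below are hypothetical. A minimal sketch of reading repeats back as a one-to-many collection:

```python
# Hypothetical export rows for a repeating "adverse_events" instrument.
rows = [
    {"record_id": "1001", "redcap_repeat_instrument": "adverse_events",
     "redcap_repeat_instance": 1, "ae_term": "headache"},
    {"record_id": "1001", "redcap_repeat_instrument": "adverse_events",
     "redcap_repeat_instance": 2, "ae_term": "nausea"},
]

def repeats(record_id: str, instrument: str) -> list[dict]:
    """One-to-many: all instances of a repeating instrument for a record."""
    return [r for r in rows
            if r["record_id"] == record_id
            and r["redcap_repeat_instrument"] == instrument]

assert len(repeats("1001", "adverse_events")) == 2
```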
6. Users and Roles Operate on the Project
Users
- Human users (PI, coordinators, RAs)
- System users (API tokens)
Roles
- Permission bundles defining:
  - What instruments a user can access
  - Whether they can edit, export, or manage users
  - Whether they see identifiers
Users are assigned:
- One role per project
- Optionally restricted further by data access groups
7. Sites Are Modeled Using Data Access Groups (DAGs)
REDCap does not have “sites” as a native entity.
Instead:
- Data Access Groups (DAGs) partition records
- Each DAG usually represents:
  - A site
  - A clinic
  - A geographic region
Rules:
- Users in a DAG only see records in that DAG
- DAGs do not change the data model—only visibility
Conceptually:
DAGs = row-level security by site
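This row-level view can be sketched as a filter on the export's `redcap_data_access_group` column (a real REDCap export column; the record IDs are hypothetical):

```python
# Hypothetical export rows tagged with their data access group.
rows = [
    {"record_id": "1001", "redcap_data_access_group": "site_a"},
    {"record_id": "2001", "redcap_data_access_group": "site_b"},
]

def visible_to(user_dag: str) -> list[dict]:
    """DAG-scoped view: users in a DAG see only that DAG's records."""
    return [r for r in rows if r["redcap_data_access_group"] == user_dag]

print([r["record_id"] for r in visible_to("site_a")])  # ['1001']
```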
8. Surveys Are Alternate Entry Points, Not Separate Data
Surveys:
- Are instruments with a public or authenticated interface
- Write directly into the same record data
- Can be:
  - Standalone
  - Triggered by logic
  - Scheduled longitudinally
Important:
Surveys do not create a second dataset—they populate the same records.
9. Project Lifecycle States
Projects move through controlled states:
- Development
  - Forms editable
  - Data considered test or pilot
- Production
  - Structural changes restricted
  - Data considered study-valid
- Archived
  - Read-only historical record
This enforces data integrity expectations.
10. Cross-Cutting Concerns (Always Present)
Across all structures, REDCap maintains:
- Audit logs (who changed what, when)
- Field-level validation
- Record locking and e-signatures
- Export tracking
One-Sentence Mental Model
A REDCap study is a project containing records (usually participants), each accumulating data from multiple instruments across events and arms, entered by users with defined roles, optionally partitioned into sites via data access groups, with surveys as alternate entry interfaces—all governed by strict audit and lifecycle controls.
Why This Matters for System Design (Given Your Work)
This structure explains why:
- REDCap scales well for moderate-complexity trials
- It struggles with:
  - Deep relational data
  - Cross-study reuse
  - Modular schema evolution
- It maps cleanly to:
  - CDISC ODM (with constraints)
  - FHIR ResearchStudy + ResearchSubject (imperfectly)
Below is a practical, standards-aware mapping between a REDCap study and the OMOP Common Data Model. I’ll be explicit about **what maps cleanly**, **what requires transformation**, and **where semantics are often lost** unless you design for OMOP up front.
Executive Summary
REDCap is a **data-collection system**; OMOP is an **analytic data model**.
Mapping is therefore ETL-driven, not structural. The goal is to translate REDCap records and forms into person-centric, standardized clinical facts with controlled vocabularies.
Core Conceptual Alignment
| REDCap Concept | OMOP Concept | Mapping Notes |
|---|---|---|
| Project | Study context | OMOP does not store studies natively; study metadata lives outside the CDM (or in extensions) |
| Record (participant) | PERSON | One REDCap record → one OMOP person |
| Site (DAG) | CARE_SITE/LOCATION | DAG → care_site_id; sometimes organization hierarchy |
| Form / Instrument | Domain tables | Each form decomposes into multiple OMOP domains |
| Field / Variable | Concept + value | Requires vocabulary mapping (SNOMED, LOINC, RxNorm) |
| Event / Visit | VISIT_OCCURRENCE | REDCap events are often protocol-driven, not care-driven |
| Repeating forms | Multiple rows | Natural fit for OMOP’s row-based domains |
| Survey response | OBSERVATION/MEASUREMENT | Depends on whether it is qualitative or quantitative |
Canonical Table-Level Mapping
1. Participant Identity
| REDCap | OMOP |
|---|---|
| record_id | person_id |
| DOB, sex, race | PERSON fields |
| Enrollment date | Often OBSERVATION_PERIOD start |
Key design choice:
REDCap record IDs should be surrogate keys, not reused identifiers.
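A minimal sketch of surrogate-key assignment during ETL, assuming an in-memory map (a real pipeline would persist the map between runs so `person_id` values stay stable):

```python
import itertools

# Hypothetical surrogate-key assignment: record_ids are never reused as person_ids.
_person_seq = itertools.count(1)
_person_map: dict[str, int] = {}

def person_id_for(record_id: str) -> int:
    """Return a stable surrogate person_id for a REDCap record_id."""
    if record_id not in _person_map:
        _person_map[record_id] = next(_person_seq)
    return _person_map[record_id]

assert person_id_for("1001") == person_id_for("1001")  # stable within a run
```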
2. Visits, Time, and Longitudinality
| REDCap | OMOP |
|---|---|
| Event (Baseline, Month 6) | VISIT_OCCURRENCE |
| Arm | visit_source_value or custom extension |
| Event date | visit_start_date |
Caution:
REDCap events represent protocol milestones, while OMOP visits represent healthcare encounters. Many ETLs create synthetic visits purely to anchor timing.
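A sketch of such a synthetic visit, using illustrative VISIT_OCCURRENCE column names (`visit_concept_id = 0` stands in for "no matching standard concept", since a protocol milestone is not a real encounter):

```python
from datetime import date

# Hypothetical synthesis of an OMOP visit row from a REDCap protocol event,
# created purely to anchor timing.
def synthetic_visit(person_id: int, event_name: str, event_date: date,
                    next_visit_id: int) -> dict:
    return {
        "visit_occurrence_id": next_visit_id,
        "person_id": person_id,
        "visit_concept_id": 0,              # no standard concept: protocol milestone
        "visit_start_date": event_date,
        "visit_end_date": event_date,       # point-in-time visit
        "visit_source_value": event_name,   # preserve the REDCap event name
    }

v = synthetic_visit(1, "month_3_arm_1", date(2024, 6, 1), 10)
assert v["visit_source_value"] == "month_3_arm_1"
```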
3. Diagnoses and Conditions
| REDCap | OMOP |
|---|---|
| Diagnosis checkbox / dropdown | CONDITION_OCCURRENCE |
| Text diagnosis | Needs NLP → concept_id |
| Onset date | condition_start_date |
Best practice:
Force structured diagnosis capture using SNOMED CT concept IDs in REDCap.
4. Medications and Interventions
| REDCap | OMOP |
|---|---|
| Medication form | DRUG_EXPOSURE |
| Free-text meds | RxNorm normalization required |
| Start / stop dates | Exposure period |
5. Labs, Scores, and Quantitative Measures
| REDCap | OMOP |
|---|---|
| Numeric lab value | MEASUREMENT |
| Clinical score (EDSS, PHQ-9) | MEASUREMENT or OBSERVATION |
| Units | Standardized UCUM |
Rule of thumb:
- Measured with units → MEASUREMENT
- Assessed / rated → OBSERVATION
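The rule of thumb can be sketched as a small router; the field shape (a `dict` with an optional `unit` key) is a simplifying assumption, not REDCap's metadata format:

```python
# Hypothetical router implementing the rule of thumb above.
def omop_domain(field: dict) -> str:
    """Measured with units -> MEASUREMENT; assessed/rated -> OBSERVATION."""
    if field.get("unit"):   # e.g. {"name": "hgb", "value": 13.2, "unit": "g/dL"}
        return "MEASUREMENT"
    return "OBSERVATION"

assert omop_domain({"name": "hgb", "value": 13.2, "unit": "g/dL"}) == "MEASUREMENT"
assert omop_domain({"name": "phq9_total", "value": 12}) == "OBSERVATION"
```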
6. Surveys and PROs
| REDCap | OMOP |
|---|---|
| Individual question | OBSERVATION |
| Composite score | Derived MEASUREMENT |
| Survey metadata | Not native to OMOP |
This is a known weakness: OMOP preserves answers, not survey structure.
7. Adverse Events
| REDCap | OMOP |
|---|---|
| AE form | CONDITION_OCCURRENCE |
| Severity | observation_source_value |
| Relatedness | Separate OBSERVATION |
What Does **Not** Map Cleanly
| REDCap Feature | OMOP Gap |
|---|---|
| Form structure | Lost during ETL |
| Field-level provenance | Partially lost |
| CRF versioning | Not represented |
| Query resolution | Not represented |
| Audit trail | External system only |
This is why OMOP is **analysis-ready**, not **regulatory-submission-ready**.
Recommended ETL Architecture (Best Practice)
Key design principles:
- Preserve raw REDCap exports verbatim
- Maintain a mapping table: REDCap field → OMOP concept_id
- Version mappings as code
- Never hand-map in spreadsheets
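A minimal sketch of such a mapping table kept as code; the concept IDs below are placeholders for illustration, not verified OMOP concept IDs:

```python
# Hypothetical versioned mapping table: REDCap field -> OMOP (domain, concept_id),
# kept in code rather than spreadsheets so changes are reviewable and reproducible.
MAPPING_VERSION = "2024-06-01"

FIELD_TO_CONCEPT: dict[str, tuple[str, int]] = {
    "hgb":   ("MEASUREMENT", 3000000),           # placeholder concept_id
    "dx_ms": ("CONDITION_OCCURRENCE", 370000),   # placeholder concept_id
}

def map_field(redcap_field: str) -> tuple[str, int]:
    """Resolve a REDCap field name; fail loudly on unmapped fields."""
    try:
        return FIELD_TO_CONCEPT[redcap_field]
    except KeyError:
        raise KeyError(f"Unmapped REDCap field: {redcap_field} "
                       f"(mapping version {MAPPING_VERSION})")

assert map_field("hgb")[0] == "MEASUREMENT"
```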
Practical Guidance for Your Context
Given your interest in:
- Longitudinal disease registries
- AI-assisted extraction
- TRE-based analytics
I would strongly recommend:
- OMOP as the analytic layer, not the source of truth
- REDCap → OMOP as a one-way, reproducible ETL
- Explicit separation of:
  - Documented data (REDCap)
  - Normalized clinical facts (OMOP)
- Storing REDCap project and instrument metadata alongside OMOP in a metadata schema
This aligns with modern OHDSI + AI pipelines and avoids overloading OMOP with responsibilities it was not designed for.
One-Sentence Summary
A REDCap study maps to OMOP by transforming each participant record into a PERSON, decomposing forms into domain-specific clinical facts (conditions, drugs, measurements, observations), anchoring time via visits or observation periods, and standardizing all semantics through controlled vocabularies—accepting that form structure, auditability, and study metadata live outside the OMOP CDM.
If you want, I can next:
- Walk through a concrete REDCap instrument → OMOP ETL example
- Map this to FHIR → OMOP hybrid pipelines
- Compare SurveyJS-first vs REDCap-first OMOP strategies
- Propose an AI-augmented semantic mapping workflow
Just indicate the next step.