Implementation Plan: EDC System Prototype (FastAPI + PostgreSQL)
Objective: Develop a prototype Electronic Data Capture (EDC) system for clinical trials using a modern tech stack (FastAPI for backend, PostgreSQL for database). The system will emulate core REDCap functionality with improved architecture, and integrate SurveyJS for form design and survey delivery. Key features include form design (SurveyJS Creator), version-controlled forms with audit trails, role-based access control (RBAC) for various user roles, participant-facing surveys (SurveyJS Runner), eConsent and randomization modules (as placeholders), study management (arms, visits, schedules), participant visit workflows (with mobile-first design and reminders), video conferencing integration for remote visits, and secure data export APIs. Compliance considerations (audit logging, permissions, data retention, etc.) are woven into each phase to ensure the prototype aligns with regulatory best practices (e.g. FDA 21 CFR Part 11, ICH-GCP). The plan is structured in phases, each describing core components, data models, API endpoints, and integration steps. This phased approach will guide a development team through incremental implementation, yielding a maintainable and extensible architecture.
Phase 1: Core Architecture and User Management
Goals: Establish the fundamental backend structure with FastAPI and PostgreSQL, implement user authentication and RBAC, and set up compliance foundations (audit trail, data security). This phase creates the backbone on which all other features will build.
- Project Setup & Tech Stack: Initialize a FastAPI project with a modular structure (e.g., separate Python modules for routes, models, services). Use SQLAlchemy (with Alembic for migrations) as the ORM to interact with PostgreSQL. Configure FastAPI with Uvicorn for ASGI serving. Ensure the project is set up in a virtual environment with required dependencies (FastAPI, Uvicorn, SQLAlchemy, Pydantic, etc.). Consider structuring the code with clear separation of concerns: e.g., auth.py for auth routes, models.py for ORM models, schemas.py for Pydantic models (request/response validation), and so on. This will keep the architecture clean and maintainable.
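A minimal sketch of this wiring, assuming the module names above (the connection URL is a placeholder, and the commented route modules are hypothetical):

```python
# database.py – engine, session factory, and declarative base (URL is a placeholder)
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

engine = create_engine("postgresql+psycopg2://edc:secret@localhost:5432/edc")
SessionLocal = sessionmaker(bind=engine, autoflush=False, autocommit=False)
Base = declarative_base()

def get_db():
    """FastAPI dependency: one SQLAlchemy session per request."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# main.py – application entry point; routers registered per module
from fastapi import FastAPI
# from routes import auth, users  # hypothetical route modules

app = FastAPI(title="EDC Prototype")
# app.include_router(auth.router, prefix="/auth", tags=["auth"])
# app.include_router(users.router, prefix="/users", tags=["users"])
```

The app would then be served with `uvicorn main:app --reload` during development.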
- Data Models – Users & Roles: Define a User model (SQLAlchemy) with fields: id, username (or email), password_hash, and role (plus timestamps created_at, last_login, etc.). The role can be an enum or separate table; for simplicity, use a string or enum field with allowed values: admin, study_designer, site_staff, monitor, participant. If multi-tenancy across studies is needed, also include relationships to studies or sites (e.g., a join table assigning users to studies/sites with roles). The Role concept can be implemented via a field or via a related table if more granular permissions are needed in future. Ensure passwords are stored hashed (using a strong algorithm like bcrypt).
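A condensed sketch of this model, following the field list above (passlib's bcrypt is one common hashing choice; Base comes from the project-setup sketch):

```python
import enum
from datetime import datetime

from passlib.hash import bcrypt
from sqlalchemy import Column, DateTime, Enum, Integer, String

class Role(str, enum.Enum):
    admin = "admin"
    study_designer = "study_designer"
    site_staff = "site_staff"
    monitor = "monitor"
    participant = "participant"

class User(Base):  # Base from the project-setup sketch
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    username = Column(String, unique=True, nullable=False)  # or email
    password_hash = Column(String, nullable=False)
    role = Column(Enum(Role), nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)
    last_login = Column(DateTime, nullable=True)

    def set_password(self, raw: str) -> None:
        self.password_hash = bcrypt.hash(raw)

    def verify_password(self, raw: str) -> bool:
        return bcrypt.verify(raw, self.password_hash)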
- RBAC Implementation: Enforce role-based access in the FastAPI endpoints via dependency injection. For example, create dependencies like get_current_user() (to authenticate via OAuth2 password flow or JWT tokens) and require_role("admin") to authorize specific roles (see the sketch after the Phase 1 endpoint list below). This way, each protected endpoint will check that the user is logged in and has an appropriate role. As a simple approach, include the role check in the dependency and raise HTTPException(status_code=403) if not authorized. Document the intended permissions for each role:
- Admin: Full system access – manage users, studies, and view all data. Can assign roles and view audit logs.
- Study Designer: Can create and configure studies (arms, visits), design forms, and manage study settings. No direct access to participant personal data unless also assigned site roles.
- Site Staff: Can enroll participants, view and enter data for participants at their site, and mark visit completions. Cannot modify study design.
- Monitor: Read-only access to study data for monitoring purposes – can view forms/responses (possibly issuing queries on data). No editing rights.
- Participant: Can view their own schedule and forms, submit survey responses, and view their eConsent. No access to other participants or study management.
The RBAC rules should be enforced at the API level to prevent users from accessing operations outside their role. (For example, monitors cannot call data-edit APIs, participants cannot see admin endpoints, etc.)
- Auth Mechanism: Implement authentication using FastAPI's OAuth2 “password” flow with JWT tokens, or session cookies if preferred. The login endpoint (POST /auth/login) should verify credentials (using hashed-password verification) and return a JWT or session token. A user registration endpoint (POST /auth/register) might be admin-only (since users are typically invited/created by an admin in a clinical system). Also include an endpoint to fetch the current user's profile/role (GET /auth/me) so the front-end can show the appropriate UI. Multi-factor authentication (MFA) is not implemented in this phase, but the design should allow adding it later (e.g., by integrating an OTP or authenticator-app workflow into the auth process in a future phase).
- Audit Trail Foundation: From day one, incorporate an audit logging mechanism to track important events (to meet compliance needs). Create an AuditLog model/table with fields: id, timestamp, user_id (who performed the action; nullable if system-generated), action (e.g., “USER_LOGIN”, “USER_CREATE”, “FORM_EDIT”), and details (JSON or text describing the event, including old vs. new values where applicable). Whenever a critical action occurs (user login, data creation or modification, permission change, etc.), record an entry. This can be done via a helper function (e.g., log_action(user, action, details)) called in the relevant route logic. Ensure audit logs are append-only – no deletion or modification – to comply with regulations requiring tamper-evident logs. Each audit record should include enough information to reconstruct the event: who, when, and what changed. For example, if a user later updates a form response, the audit log would capture the old value, new value, and reason if provided. At this stage, capturing the events is key; enforcement of a “reason for change” will come in later phases.
- Security & Data Protection: Configure all communications over HTTPS (in deployment) to protect data in transit. Use proper CORS settings in FastAPI if the SurveyJS front-end will be on a different domain or served separately. Keep database credentials and JWT secrets secure (e.g., via environment variables). Consider enabling encryption at rest if required (PostgreSQL supports TLS for data in transit; encryption at rest typically relies on disk or filesystem encryption). Full encryption at rest may be out of scope for the prototype, but note it as a future consideration for PHI. At a minimum, ensure sensitive fields (like participant identifiers) are not exposed through unsecured channels. Data retention: implement soft-deletion flags instead of hard deletes (e.g., a boolean is_archived for participants or forms) so data is never truly lost – this supports the requirement to retain data and audit trails. (Regulatory guidance often requires retaining clinical data for years; even in a prototype, design with archival in mind.)
- Initial API Endpoints (Phase 1):
- POST /auth/login – authenticate user, return token.
- POST /auth/logout – (if using session cookies, to clear session; if JWT, client can simply discard).
- GET /auth/me – returns current user info and role.
- POST /users – (admin only) create a new user (assign role, etc.).
- GET /users – (admin only) list users, or perhaps filter by study.
- PUT /users/{id} – (admin) update user role or other info, e.g., deactivate user.
- (If multi-center: GET /sites & POST /sites might also be allowed for admin to manage study sites – optional if multi-center is considered early).
Each of these endpoints should integrate with the audit log (e.g., log user creations, role changes, and login attempts). Also, be sure to validate and sanitize input for these endpoints to prevent injection attacks or invalid data.
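A sketch of the auth and audit helpers referenced above, assuming the User model, Base, and get_db from the earlier sketches (python-jose is one common JWT library; the token contents and secret handling are simplified):

```python
from datetime import datetime
from typing import Optional

from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt  # python-jose, one common JWT library
from sqlalchemy import JSON, Column, DateTime, Integer, String
from sqlalchemy.orm import Session

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/auth/login")
SECRET_KEY = "change-me"  # load from an environment variable in practice
ALGORITHM = "HS256"

class AuditLog(Base):
    __tablename__ = "audit_log"
    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime, default=datetime.utcnow)
    user_id = Column(Integer, nullable=True)   # nullable for system-generated events
    action = Column(String, nullable=False)    # e.g., "USER_LOGIN", "FORM_EDIT"
    details = Column(JSON)                     # old vs. new values, reason, etc.

def get_current_user(token: str = Depends(oauth2_scheme),
                     db: Session = Depends(get_db)) -> User:
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        user_id = int(payload["sub"])
    except (JWTError, KeyError, ValueError):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
    user = db.query(User).get(user_id)
    if user is None:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Unknown user")
    return user

def require_role(*roles: str):
    """Build a dependency that rejects callers whose role is not in `roles`."""
    def checker(user: User = Depends(get_current_user)) -> User:
        if user.role not in roles:
            raise HTTPException(status_code=403, detail="Insufficient role")
        return user
    return checker

def log_action(db: Session, user_id: Optional[int], action: str, details: dict) -> None:
    """Append-only audit entry; the API exposes no update/delete for AuditLog."""
    db.add(AuditLog(user_id=user_id, action=action, details=details))
    db.commit()

# Usage on a protected route:
# @router.post("/users", dependencies=[Depends(require_role("admin"))])
```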
Compliance Check (Phase 1): By the end of this phase, the system should have secure user authentication, role-based authorization scaffolding, and an audit trail capturing user-management events. This lays the foundation for 21 CFR Part 11 compliance: unique user accounts, controlled access (only authorized roles can perform certain actions), and computer-generated audit records for key events. Although the prototype may not yet be validated, it's crucial to build these compliance principles into the architecture from the start. For example, ensure the audit trail is time-stamped and immutable (each entry has a timestamp and cannot be deleted). We will continue to log changes in subsequent phases (forms, data entries, etc.) so that by the end, “the audit trail automatically logs all changes with a timestamp” in line with GCP and FDA requirements.
Phase 2: Form Design Module with SurveyJS Creator
Goals: Integrate SurveyJS Creator to allow study designers to build custom eCRF forms (questionnaires) via a friendly UI, and store these form definitions in the database with version control and auditing. This will replicate REDCap’s ability to design instruments, but using a modern JSON-based form schema.
- SurveyJS Creator Integration: SurveyJS Creator is a JavaScript component that provides a drag-and-drop form builder on the frontend. In this phase, we set up an Admin/Designer UI (this could be a simple single-page application or an admin section in a web app) where the study designer can create and edit forms using SurveyJS Creator. The Creator produces a JSON schema representing the form structure (questions, choices, validation, skip logic, etc.). We will embed this component in the study designer's interface and configure it to communicate with our FastAPI backend via REST API. Specifically, when a designer clicks “Save” in the form builder, it will send the JSON schema to our backend API (for example, via a POST /studies/{id}/forms endpoint).
- Data Model – Form Definitions: Create a FormDefinition table in PostgreSQL to store the form schemas. Following SurveyJS's recommended approach, include at least:
- id (primary key),
- study_id (foreign key to the Study this form belongs to, anticipating Phase 4),
- name (form name or title),
- schema (JSON column storing the SurveyJS form JSON),
- version (integer or semantic version string),
- created_by, created_at; updated_by, updated_at (for audit purposes).
The JSON schema can be stored as text or JSONB (PostgreSQL JSONB is ideal if querying into the schema is ever needed). A single form can have many versions. There are two strategies for versioning:
- Simple version field: Keep old versions in the same table with a version number and perhaps a flag is_active. When a form is updated, insert a new row with a new version number (linking to the original form via a field like form_code or parent_form_id). Mark the previous version as archived but keep it for reference.
- Separate version table: Have a Form table (for form metadata like name) and a FormVersion table for each version’s schema. For simplicity, approach (1) is fine in the prototype – e.g., store each form version as a row and use form_code to identify forms across versions, or simply treat each row as distinct and rely on naming and study context.
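A condensed sketch of approach (1), assuming JSONB storage and a form_code that groups versions (Base and common imports come from the Phase 1 sketches; the studies table arrives in Phase 4):

```python
from datetime import datetime

from sqlalchemy import Boolean, Column, DateTime, ForeignKey, Integer, String
from sqlalchemy.dialects.postgresql import JSONB

class FormDefinition(Base):
    __tablename__ = "form_definitions"
    id = Column(Integer, primary_key=True)
    study_id = Column(Integer, ForeignKey("studies.id"), nullable=False)
    form_code = Column(String, nullable=False)  # stable identifier shared across versions
    name = Column(String, nullable=False)
    schema = Column(JSONB, nullable=False)      # the SurveyJS form JSON
    version = Column(Integer, nullable=False, default=1)
    is_active = Column(Boolean, default=True)   # only the newest version accepts new entries
    created_by = Column(Integer, ForeignKey("users.id"))
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_by = Column(Integer, ForeignKey("users.id"))
    updated_at = Column(DateTime, onupdate=datetime.utcnow)
```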
Either way, maintain an audit log of changes: when a form is created or updated (new version), log an entry in the AuditLog recording who made the change and referencing the old and new versions. This ensures form edits are traceable (important for GCP compliance on CRF changes).
- API Endpoints – Form Management: Develop endpoints for form CRUD within a study context:
- POST /studies/{study_id}/forms: Accepts a JSON schema (from SurveyJS Creator) and metadata (form name, etc.) to create a new form definition. The backend stores the schema and returns the new form ID/version. Only users with study_designer role (or admin) for that study can access this.
- GET /studies/{study_id}/forms: List existing forms for that study (could include only latest versions or all versions depending on query params).
- GET /studies/{study_id}/forms/{form_id}: Fetch a specific form schema (possibly the latest version if multiple). This is used if a designer wants to edit an existing form – the SurveyJS Creator can load the JSON into the builder.
- PUT /studies/{study_id}/forms/{form_id} or POST /studies/{study_id}/forms/{form_id}/versions: Update a form. In practice, to maintain version control, we might not overwrite the existing row. Instead, this call can create a new version (increment version number) with the updated schema. The API should handle version increment logic. It can return the new version’s ID.
- Possibly DELETE /studies/{study_id}/forms/{form_id}: Mark a form as deleted/inactive (but do not actually remove from DB – maybe set an archived flag). Only allowed if no data collected for that form yet, or in prototype just allow soft-delete for mistakes (with audit log entry).
Ensure these endpoints check that the user is a study_designer (or admin) for the given study. Also, tie each form to the correct study to avoid cross-study data leakage.
- Integration Guidance: On the frontend, SurveyJS Creator can be configured with a save callback. For example, use the Creator's save-survey hook to call our API. SurveyJS provides examples of integrating with a backend by overriding the default save mechanism: we will implement it so that, on save, an HTTP POST is made to our FastAPI endpoint with the JSON content (see the sketch below). Conversely, to edit a form, we call the GET endpoint to fetch the existing schema and load it into the Creator. Because SurveyJS works with pure JSON, our backend doesn't need to know form-question specifics – it just stores and retrieves the JSON. This decoupling keeps the system flexible for future changes (new question types, etc., are handled by SurveyJS's JSON without backend changes).
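A sketch of the save endpoint that the Creator's callback would POST to, reusing the Phase 1 helpers (the FormIn request model and the form_code derivation are illustrative):

```python
from typing import Any, Dict

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from sqlalchemy.orm import Session

router = APIRouter()

class FormIn(BaseModel):
    name: str
    schema_json: Dict[str, Any]  # the SurveyJS JSON sent by the Creator's save callback

@router.post("/studies/{study_id}/forms", status_code=201)
def create_form(study_id: int, payload: FormIn,
                db: Session = Depends(get_db),
                user: User = Depends(require_role("study_designer", "admin"))):
    if not payload.schema_json:
        raise HTTPException(status_code=422, detail="Empty form schema")
    form = FormDefinition(
        study_id=study_id,
        form_code=payload.name.lower().replace(" ", "_"),  # illustrative derivation
        name=payload.name,
        schema=payload.schema_json,
        version=1,
        created_by=user.id,
    )
    db.add(form)
    db.commit()
    log_action(db, user.id, "FORM_CREATE", {"form_id": form.id, "version": form.version})
    return {"id": form.id, "version": form.version}
```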
Example SurveyJS client-server interaction. SurveyJS libraries (Creator, Runner, etc.) communicate with the backend via JSON: sending survey schema definitions to be saved, retrieving them for display, and submitting survey results for storage. Our FastAPI backend will expose REST endpoints to handle these JSON payloads, storing schemas and results in PostgreSQL. The backend ensures proper authentication (only authorized roles can save or fetch certain surveys) and maintains a one-to-many relationship between form schemas and responses.
- Validation & Constraints: When saving forms, consider basic validation on the backend. For instance, ensure the JSON is not empty and perhaps limit size (to avoid extremely large forms causing issues). The content of the schema is largely freeform (as generated by SurveyJS), but it might contain scriptable logic (SurveyJS supports expressions and logic). Ensure that this does not pose security issues (SurveyJS JSON could theoretically include custom scripts if using extensions – we should disallow any executable scripts in questions for security). SurveyJS Creator likely sanitizes input itself, but it’s worth reviewing. Also, ensure the form name is unique within a study (to avoid confusion).
- Audit Logging: Log form creation and updates. For example: “Form ‘Eligibility Survey’ v1 created by UserX in StudyY” (with timestamp), “Form ‘Eligibility Survey’ updated to v2 by UserY – changes: added question X, etc.”. We can store the diff in a human-readable way in the audit details or simply note that a new version was created. This provides an audit trail of CRF design changes (which is important for tracking how data collection instruments change over time during a study).
- Version Control Approach: Highlight that each saved form is version-controlled. In practice, once data collection has started with a form, editing it should create a new version rather than modifying the live form, to avoid invalidating existing data. Our prototype will implement this by treating each “update” as creating a new version. We’ll need to decide how to handle existing participants who have an older version: typically, those who completed an old version keep their data, and new data uses the new version. For now, we can mark older version as inactive for new entries. This nuance can be noted for future extension (maybe allow multiple versions in parallel only if needed).
Compliance Check (Phase 2): Storing form definitions with version and creator info addresses the documentation of data definitions (crucial for traceability in clinical research). The audit trail ensures any change to the data capture instrument is logged with time and user, aiding compliance. Role permissions are enforced so that only authorized staff (study designers) can modify forms, reflecting separation of duties. Though electronic form design doesn't directly involve subject data, maintaining versions and an audit trail of changes fulfills part of 21 CFR 11.10(e) (change control with audit trail) and GCP recommendations to document all changes to CRFs. The system at this stage is also poised to implement data standards later (for example, adding fields to map form questions to CDISC standards in the future, as OpenEDC does with ODM exports – not in prototype scope, but the architecture is ready to extend).
Phase 3: Survey Delivery and Data Capture (Participant-Facing)
Goals: Develop the participant-facing survey functionality. This includes using SurveyJS Runner (Form Library) to render forms to participants or study staff for data entry, capturing responses, and storing those responses in the database. We will also introduce the ability to deliver surveys to participants (e.g., via unique links or authenticated participant portal), and ensure all data capture is logged (with potential for eConsent integration).
- SurveyJS Runner Integration: SurveyJS provides a Form Library component that takes a survey JSON schema and renders an interactive form for the user to fill, then produces a JSON result object with the answers. We will integrate this in the front-end (for participants or staff) to display the forms designed in Phase 2. For a web application, this could be an embedded SurveyJS component on a page that fetches the form schema from the backend via an API call (e.g., GET /studies/{study_id}/forms/{form_id}) and then SurveyJS handles the UI. If building a mobile app (Capacitor) or a responsive web app, ensure the SurveyJS form is mobile-friendly (SurveyJS forms are responsive by default, but custom CSS might be needed for branding).
- Data Model – Survey Responses: Create a Response (survey result) table to store answers. Based on SurveyJS's guidance, we can keep it simple with two tables: one for schemas (already FormDefinition) and one for results. Define FormResponse with fields:
- id (PK),
- form_id (FK to FormDefinition, ideally including the version or linking specifically to the version used),
- participant_id (FK to Participant – from Phase 5, but we can create Participant model stub here, or allow null if staff-entered data for now),
- response_data (JSON of the answers submitted),
- submitted_by (who filled it out: could be the participant’s user id or a staff user id if entered on behalf of participant),
- submitted_at (timestamp).
- Optionally, a status (e.g., “complete”, “incomplete”, “flagged”) or is_draft if we allow saving incomplete forms – not necessary for initial prototype, but consider for future.
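A condensed sketch of this model, following the field list above (Base, Column helpers, and JSONB as in the earlier sketches; the participants table arrives in Phase 5):

```python
class FormResponse(Base):
    __tablename__ = "form_responses"
    id = Column(Integer, primary_key=True)
    # Points at an exact FormDefinition row, i.e., a specific version
    form_id = Column(Integer, ForeignKey("form_definitions.id"), nullable=False)
    participant_id = Column(Integer, ForeignKey("participants.id"), nullable=True)
    response_data = Column(JSONB, nullable=False)  # the SurveyJS result object
    submitted_by = Column(Integer, ForeignKey("users.id"), nullable=False)
    submitted_at = Column(DateTime, default=datetime.utcnow)
    status = Column(String, default="complete")    # complete / incomplete / flagged
```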
The relationship is one FormDefinition to many FormResponses. We will likely tie each FormResponse to a specific form version to be precise. One approach: when a form is updated to a new version, keep the same form_id but bump the version field; old responses remain linked to that form id, and we infer the version by date or by storing the version in the response. Simpler: treat each version as a distinct form record (with its own ID in the DB) – then form_id in responses inherently refers to the exact version schema that was used. This avoids ambiguity and is easier when exporting data.
- API Endpoints – Survey Taking: Provide endpoints for participants or site staff to retrieve surveys and submit responses:
- GET /studies/{study_id}/forms/{form_id}/take – Returns the form schema JSON for rendering to the user. (This could be the same as the GET used in Phase 2 but with restricted info. Alternatively, if we want to hide certain design metadata from participants, we might use a slightly different output. SurveyJS schema is fine to give directly, as it only contains questions and not design secrets.)
- POST /studies/{study_id}/forms/{form_id}/responses: Accepts a JSON payload of answers (the SurveyJS “survey result”). The backend will attach meta info (participant ID from auth, etc.) and save to FormResponse. On success, return 201 Created.
- Optionally, GET /studies/{study_id}/forms/{form_id}/responses/{resp_id} for retrieving a previously submitted response (for monitors or for allowing editing if permitted).
- Optionally, PUT /studies/{study_id}/forms/{form_id}/responses/{resp_id} if allowing updates/corrections to a submitted form (with proper audit logging and maybe requiring a reason for change).
These endpoints must ensure the caller has rights: a participant can only submit their own data (and likely only for forms assigned or open to them), while site staff might also use these endpoints to enter data for participants at their site (e.g., in a traditional EDC, staff fill out CRFs during clinic visits). We can differentiate by submitted_by. A strategy: if a site staff user calls the API with a participant_id in the payload, allow it if they have access to that participant and form (for now, assume same study and site).
- Survey Links and Access: We need a way for participants to access their surveys. Two modes:
- Authenticated Portal: Participants log into a portal (or mobile app) and see their pending surveys. This requires Phase 5 (participant accounts). In this phase, if participant accounts exist, secure the GET form and POST response endpoints with participant auth.
- Direct Survey Links: Alternatively (and commonly in REDCap ePRO), each survey instance can be accessed via a unique link (a token in URL) without full login. For the prototype, we might not implement the token system fully, but keep it in mind. Possibly generate a UUID link for each scheduled survey and send that to participant via email. The link would map to a temporary auth token that allows access to just that one form. This approach can be an extension later; for now, we assume either the participant is logged in or a simple scenario where study team opens the survey on a tablet for the participant.
- eConsent Integration (Basic): We plan for an eConsent module, which is essentially a specialized survey form that captures informed consent. At this phase, we can implement a placeholder: treat the Consent Form as a type of FormDefinition (with special fields like a signature). SurveyJS has a signature pad question type or we can integrate one (SurveyJS supports custom widgets, and there is likely one for signature capture, or we could use a separate approach). For now:
- Create a template for an eConsent form (with fields for participant name, a consent confirmation question, and a signature field).
- Ensure the signature can be captured as an image or vector (SurveyJS might return a data URL for the drawn signature).
- When a participant submits the consent form, store it in FormResponse as usual (with the signature in JSON possibly as encoded image).
- Then mark the participant as consented (perhaps update a field in Participant record like consent_given=True and consent_date).
- Placeholder: We might not generate the PDF in this phase, but note that in a real system we would generate a PDF of the consent form content with the signature and store it (OpenEDC and REDCap auto-archive the signed consent PDF). We can plan a later sub-phase to generate the PDF using a library (SurveyJS has a PDF generator component that could potentially be used).
- Also, in real scenarios, staff often countersign the consent. For the prototype, we may skip countersignature, but design-wise we can allow a field for a staff user to later attest (or a separate form).
For now, treat eConsent as just a special form that the participant must complete at the start. We might create a specific endpoint or workflow: e.g., POST /studies/{study_id}/participants/{pid}/consent, which triggers creation of a consent form link or marks the participant as consented after form submission.
- Frontend (Participant UI): If participants log in, provide a simple dashboard listing their available surveys (visits) – e.g., “You have 2 tasks: Consent Form, Baseline Questionnaire due today”. Clicking one fetches the form and presents it via SurveyJS Runner. If using Capacitor to wrap this, ensure the frontend is responsive and works on mobile devices (SurveyJS is largely client-side; our backend just serves data).
- Mobile/Offline Consideration: Capacitor allows the app to run on mobile; we can leverage device features later (not in this phase). If offline data capture is needed in the future (for remote areas), note that SurveyJS forms can be preloaded and responses queued offline. This is advanced and not needed in the prototype, but our architecture (JSON in/out) can support it, since the client could store JSON results and sync when online.
- Audit Logging: Every survey submission should be logged. Specifically, when a FormResponse is created, log which user (or participant) submitted data for which form and when. If any changes happen (e.g., data corrections via PUT), log old vs. new values, who changed it, and when, and require a reason for the change if possible. In a prototype, we can optionally enforce a reason on data change by including a field in the API request, or simply allow editing and count on the user to document the reason manually. However, compliance requires capturing the reason for changes to data after initial entry. We can incorporate a simple mechanism: if a response exists and is being edited, require a ?reason= query param or a field in the body, then log that reason in the AuditLog (see the sketch below).
- Data Quality & Validation: SurveyJS itself can enforce data validation rules (like required fields, ranges, etc.) defined in the form schema. Ensure those are utilized (the form designer can set validation in the Creator, which then applies in the Runner). The backend should still perform basic validation on submitted data – e.g., that required questions per the schema are present (SurveyJS likely ensures this before allowing completion, but a malicious user could bypass it). Fully re-validating the JSON against the schema can be complex, so for the prototype, we trust SurveyJS validation and just do sanity checks (like the response JSON is not empty and matches the expected form_id).
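A sketch of the submit-and-amend endpoints described above, reusing the Phase 1 helpers; the required reason query parameter on amendment is the simple mechanism just mentioned, and the participant lookup assumes the one-to-one user link adopted in Phase 5:

```python
from typing import Any, Dict, Optional

class ResponseIn(BaseModel):
    answers: Dict[str, Any]                 # the SurveyJS result object
    participant_id: Optional[int] = None    # set by staff entering on a participant's behalf

@router.post("/studies/{study_id}/forms/{form_id}/responses", status_code=201)
def submit_response(study_id: int, form_id: int, payload: ResponseIn,
                    db: Session = Depends(get_db),
                    user: User = Depends(get_current_user)):
    if not payload.answers:
        raise HTTPException(status_code=422, detail="Empty response")
    pid = payload.participant_id
    if user.role == Role.participant:
        # Participants may only submit for themselves (one-to-one user link, Phase 5)
        me = db.query(Participant).filter(Participant.user_id == user.id).first()
        pid = me.id if me else None
    resp = FormResponse(form_id=form_id, participant_id=pid,
                        response_data=payload.answers, submitted_by=user.id)
    db.add(resp)
    db.commit()
    log_action(db, user.id, "RESPONSE_SUBMIT", {"form_id": form_id, "participant_id": pid})
    return {"id": resp.id}

@router.put("/studies/{study_id}/forms/{form_id}/responses/{resp_id}")
def amend_response(study_id: int, form_id: int, resp_id: int, payload: ResponseIn,
                   reason: str,  # required query parameter: reason for change
                   db: Session = Depends(get_db),
                   user: User = Depends(require_role("site_staff", "admin"))):
    resp = db.query(FormResponse).get(resp_id)
    if resp is None:
        raise HTTPException(status_code=404, detail="Response not found")
    old = resp.response_data
    resp.response_data = payload.answers
    resp.status = "amended"
    db.commit()
    log_action(db, user.id, "RESPONSE_AMEND",
               {"response_id": resp_id, "old": old, "new": payload.answers, "reason": reason})
    return {"id": resp_id, "status": "amended"}
```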
Compliance Check (Phase 3): At this point, the system can capture actual clinical data. Compliance considerations:
- Audit Trail: We are logging data entries and any modifications, fulfilling the requirement of an audit trail for electronic records (who entered what and when). Ensure the audit log cannot be altered by end users and is secure (only admins/monitors can view audit logs).
- User Authentication for Data Entry: Under Part 11, each data entry must be attributable to a user. Here, if participants enter data, they have their own accounts (or a unique link identifies them); if site staff enter data, they use their own accounts. This satisfies the identity-verification requirement (each record can be tied to a unique user).
- Electronic Signature (placeholder): If any form requires an e-signature (like eConsent or a PI's sign-off), the system should require the user's credentials again at submission. For example, to sign the eConsent, the participant might need to re-enter a password or a PIN to affirm their identity at the time of signing (this is what OpenEDC does: “electronic signatures require re-entering the password”). In the prototype, we note this but may not implement re-authentication on form submit due to complexity. We could simulate it by requiring the participant to log in to access the form (which they do) and trusting that as the signature. A future extension is to add a confirm-password step for signing forms.
- Data Integrity: Once submitted, form data should be read-only (except via an official correction process). The prototype may allow editing for convenience, but ideally, design the API such that editing a response is a separate action (not just re-posting to the same endpoint without trace). This could be done via a distinct endpoint that logs the change and perhaps marks the response as “amended”. At minimum, our audit log will record if a change occurred.
- Privacy: Ensure that participants can only access their own responses. The API should enforce that (e.g., a participant token cannot fetch someone else's form or responses). Also, when storing personally identifiable information (PII) like names in eConsent, consider encryption. OpenEDC, for instance, supports end-to-end encryption for PII so that even the server cannot read it without keys. We might not implement that in the prototype, but mention it as a design consideration for sensitive fields (optionally use PostgreSQL encryption functions or store such data separately with encryption).
By the end of Phase 3, we have a basic end-to-end flow: a form can be designed, delivered to a participant, and the data captured is stored securely with an audit trail. This covers core EDC data capture functionality akin to REDCap’s survey feature.
Phase 4: Study Management – Arms, Visits, and Randomization
Goals: Introduce the concept of a Study with its design: multiple arms, scheduled visits/timepoints, and mapping of forms to those visits. Implement randomization logic (basic for now) to assign participants to arms. Essentially, this phase adds the layer that organizes forms into a protocol schedule, similar to how REDCap projects can have events and arms for longitudinal studies.
- Data Models – Studies, Arms, Visits:
- Study: Create a Study model/table with fields: id, name, description, possibly start_date, end_date (for the overall study timeline), and maybe a protocol_id or code. Could also include metadata like the PI or sponsor, but not necessary for the prototype. The Study will tie together arms, forms, etc. The Study could also have a created_by and timestamps.
- Arm: Arm (or StudyArm) model with fields: id, study_id (FK to Study), name (e.g., “Treatment”, “Placebo”), description, target_enrollment (optional). If randomization is implemented, include fields like allocation_ratio or sequence info (some studies might have equal randomization, some not – for prototype, assume equal unless specified otherwise).
- Visit/Event: Visit model representing a scheduled event in the study. Fields: id, study_id (or possibly arm-specific if different arms have different visit schedules; but often arms share the same schedule except some forms might differ). Possibly include arm_id if certain visits only apply to an arm, or have a many-to-many if complex. Simpler: define a set of visits for the study that apply to all arms, but allow attaching forms conditionally per arm if needed.
- Fields for Visit: name (e.g., “Baseline”, “Month 1 Follow-up”), day_offset or window (when it occurs relative to enrollment – could be number of days/week), or scheduled_date if a static schedule, but relative is more flexible for all participants. We might have visit_index or sequence number.
- Could include optional flag or other attributes (not needed now).
- VisitForm mapping: We need to map which FormDefinitions are to be completed at each Visit (and possibly per Arm). We can create a join table or model VisitFormAssignment with fields: id, visit_id, form_id (which version?), and maybe arm_id if a form is only for a certain arm/stratum. Alternatively, if it’s simpler, each Arm has its own sequence of visits (like Arm A has Baseline, Month1, Month2, etc, Arm B the same but maybe plus an extra visit). For now, let’s assume uniform visits across arms; we can ignore the arm dimension in the mapping and just attach forms to visits for all arms, unless a form is truly arm-specific (rare in structure).
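A condensed sketch of these models, following the field lists above (Base and imports as in the Phase 1 sketches; the nullable arm_id on the assignment means “applies to all arms”):

```python
class Study(Base):
    __tablename__ = "studies"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    description = Column(String)
    created_by = Column(Integer, ForeignKey("users.id"))
    created_at = Column(DateTime, default=datetime.utcnow)

class Arm(Base):
    __tablename__ = "arms"
    id = Column(Integer, primary_key=True)
    study_id = Column(Integer, ForeignKey("studies.id"), nullable=False)
    name = Column(String, nullable=False)          # e.g., "Treatment", "Placebo"
    target_enrollment = Column(Integer, nullable=True)

class Visit(Base):
    __tablename__ = "visits"
    id = Column(Integer, primary_key=True)
    study_id = Column(Integer, ForeignKey("studies.id"), nullable=False)
    name = Column(String, nullable=False)          # e.g., "Baseline", "Month 1"
    day_offset = Column(Integer, default=0)        # days relative to enrollment
    is_remote = Column(Boolean, default=False)     # see Phase 5 video-visit integration

class VisitFormAssignment(Base):
    __tablename__ = "visit_form_assignments"
    id = Column(Integer, primary_key=True)
    visit_id = Column(Integer, ForeignKey("visits.id"), nullable=False)
    form_id = Column(Integer, ForeignKey("form_definitions.id"), nullable=False)
    arm_id = Column(Integer, ForeignKey("arms.id"), nullable=True)  # null = all arms
```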
With these, we can describe a study's design: e.g., Study “XYZ” has 2 arms (Drug vs Placebo), and visits “Screening”, “Baseline (Day 0)”, “Month 1”, “Month 2”, etc., each with certain forms (e.g., Baseline might have a Demographics form, Month 1 a Safety form, etc.).
- Randomization Logic: Implement a basic randomization service to assign participants to an arm. The simplest approach is permuted-block randomization for balance, or even simple randomization for the prototype:
- We can configure a block size (say block of 4 or 6) and ensure equal allocation within each block. If stratification is needed (e.g., stratify by site or by some participant attribute), incorporate that by having separate randomization lists per stratum (e.g., one block sequence per site).
- However, given the complexity, for now we can do: when a participant is enrolled (see Phase 5 for enrollment), if the study has multiple arms, the system randomly assigns an arm with equal probability. We store the assigned arm in the participant record.
- To enhance: maintain a counter of how many assigned to each arm; optionally enforce balance by simple algorithm (if difference >1, assign to the lesser filled arm).
- We will design the code such that in the future, a more sophisticated randomization module can replace it (for example, using a library or external IWRS). The architecture might include a Randomization model for future (with method type, block size, etc.), but not strictly needed for initial placeholder.
Provide an endpoint for randomization if needed, or just integrate into participant creation:
- POST /studies/{study_id}/participants (which we will do in Phase 5) can trigger randomization automatically.
- Or have a dedicated call POST /studies/{study_id}/randomize that returns an assignment (but in practice, that needs a participant context). Likely better to handle on participant creation.
- For testing, perhaps a GET /studies/{study_id}/arms/{arm_id}/nextCode if assignment involved allocation codes or kits (beyond scope now).
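A minimal sketch of the balance-enforcing assignment described above (Participant is the Phase 5 model; this picks randomly among the least-filled arms rather than implementing true permuted blocks):

```python
import random

from sqlalchemy import func
from sqlalchemy.orm import Session

def assign_arm(db: Session, study_id: int) -> Arm:
    """Assign the next participant to an arm, keeping arm counts within 1 of each other."""
    arms = db.query(Arm).filter(Arm.study_id == study_id).all()
    counts = {
        arm.id: db.query(func.count(Participant.id))
                  .filter(Participant.arm_id == arm.id).scalar()
        for arm in arms
    }
    min_count = min(counts.values())
    # Choose randomly among the least-filled arms to avoid predictability
    candidates = [arm for arm in arms if counts[arm.id] == min_count]
    return random.choice(candidates)
```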
Document that the randomization in the prototype is rudimentary and should be replaced or validated against the protocol's randomization plan in a real system.
- API Endpoints – Study Design Management: Enable study designers or admins to define the study structure:
- POST /studies – create a new study (admin or study_designer role global). Provide study name, etc.
- GET /studies – list studies the user has access to (admin sees all; study_designer sees theirs; site staff sees those with their site; participants see none or maybe their study).
- GET /studies/{id} – view study details (maybe including arms and visits summary).
- PUT /studies/{id} – update study info (or to finalize a protocol).
- POST /studies/{id}/arms – add an arm (with name, etc.).
- GET /studies/{id}/arms – list arms.
- POST /studies/{id}/visits – add a visit (with name, schedule offset).
- GET /studies/{id}/visits – list visits (optionally filter by arm if we decide to separate).
- POST /studies/{id}/visit-schedule (or some better-named endpoint to attach forms to visits): The payload could be a mapping like {visit_id: [form_id, form_id, …]}. Alternatively, create directly through form endpoint:
- e.g., POST /studies/{id}/visits/{visit_id}/attach-form with form_id in body.
- Or have an order field in VisitFormAssignment to order forms if multiple per visit.
- GET /studies/{id}/visit-forms – get the mapping, or include in the /visits list each with forms.
These endpoints allow a study designer to fully configure a schedule of forms. In the UI, one might present a calendar or table – e.g., “At Baseline, include Forms A and B; at Month 1, include Form C” – which then calls these APIs. For the prototype, we can keep it simple (perhaps the UI is not fully fleshed out, but the API exists so that we can script it or use Swagger to set up a study). Validate that forms being attached belong to the same study (we shouldn't attach a form from another study). Also, after the design is finalized, changes should be made carefully (if data collection has started, avoid deleting visits or forms – a note for the future).
- Linking to Phase 3 (Data Capture): With visits and a schedule defined, how does it drive data capture?
- We should generate Expected Form Instances for participants. When a participant is enrolled and assigned an arm, the system can create records for each scheduled form for that participant. For example, if Arm A has Baseline, Month1, Month2 visits with certain forms, upon adding a participant to Arm A, create entries like (participant, visit X, form Y, due_date = enrollment_date + offset). This could be a new model e.g., ScheduledTask or ParticipantVisit:
- Fields: id, participant_id, visit_id, form_id, due_date, status (not started/in-progress/completed).
- This acts as a to-do list of all forms a participant needs to complete. Initially, status = not started. When the form is submitted, we update status = completed and link the response.
- We may generate these on the fly (when listing participant schedule, compute due dates) or prepopulate at enrollment. Prepopulating is straightforward for fixed schedules. If a schedule is dynamic (e.g., next visit after 1 month from actual previous visit date), we might compute as we go. For now, assume static offsets from enrollment or a baseline date.
We will implement generation of these scheduled entries in Phase 5 when enrolling a participant, but the models and logic are defined in this phase along with the study design.
- Monitor and Quality Checks: Monitors might need to see the defined study schedule and forms to plan SDV (source data verification), etc. We won't build those features, but ensure the data model can support features like marking a form as SDV-done or raising queries. We could add minimal fields or just note it as future work.
- Audit Logging: Log all changes to study design: creation of the study, arms, and visits, and any changes to them. This is important since protocol amendments may occur. For each new arm or visit added, log who added it. If a form is attached to a visit, log that as well. Essentially the entire “study definition” should be audit-trailed. This helps in reconstructing what the study looked like at any point (and is useful if, say, a form was added mid-study – monitors and regulators can review the audit trail to see that timeline).
- Integration Note: The addition of the study/arm/visit structure likely means updating earlier endpoints:
- When saving a form (Phase 2), we likely associated it with a study already. If not, we must now ensure forms have a study_id. Perhaps in Phase 2, forms were saved under a specific study context via the endpoint, so we’re fine.
- The participant side GET forms or POST responses should likely be under a participant’s schedule context. Possibly a better design: GET /participants/{pid}/tasks returns the list of pending form tasks (with form schema or references). We will cover that in Phase 5, but ensure the design of Phase 3’s endpoints can align. We might pivot to have participants always referenced via their schedule rather than directly pulling arbitrary form by ID.
Compliance Check (Phase 4): Introducing arms and visits touches on subject randomization and study setup:
- Randomization: If we implement automated randomization, ensure the process is reproducible and auditable. Log each randomization action: when a participant ID is assigned to an arm, record the timestamp, user (if triggered by a person, e.g., site staff enrolling patient), and the outcome (Arm A). If using a seed or deterministic method, note that. In a validated system, randomization algorithms should be tested to ensure unpredictability yet balance. Our prototype’s simple approach should at least guarantee allocation is not obviously biased. For compliance, also consider blinding: if the study is blinded, the system should not reveal the arm to certain users (e.g., site staff might see a code, not actual arm name). In prototype, we assume open-label or that such blinding rules are manual. But design could hide randomization info from certain roles.
- Study Design Lock: In real trials, once a study is launched, certain changes require formal amendments. We might mention that after a study is marked “active”, the system could lock editing of arms/visits (or at least log it heavily if changed). For prototype, we won’t enforce locking, but the audit trail will capture any changes.
- Multi-Center (Sites): We have the structure to create arms and visits, but what about multiple sites? Likely, each participant belongs to a site (center), and randomization may be stratified by site to maintain balance across centers. If so, our randomization module should incorporate site as a factor (e.g., a separate arm-allocation sequence per site). We might not implement this fully, but at least include a Site model:
- Site: id, study_id, name, location.
- Link participants to a site.
- Site staff user accounts tied to a site (so they only see that site's participants). This is the concept of Data Access Groups in REDCap. We won't implement full isolation now, but can simulate it by filtering queries (e.g., when site staff request a participant list, filter by their site_id).
- Visit Tracking: For compliance, the system’s ability to schedule and track visits ensures participants get required assessments at proper intervals. Not a regulatory point per se, but missing a visit might be an issue in trial management. Our system will be designed to highlight upcoming or missed visits (notifying staff if needed).
By finishing Phase 4, we have a configurable study protocol: we can define a study with arms, visits and forms per visit, and have a mechanism (randomization) to allocate participants to arms. This elevates the EDC from a simple form system to a full trial management system structure.
Phase 5: Participant Management, eConsent, and Visit Workflows
Goals: Implement management of participants (subjects) in the system: enrolling participants into a study (with an assigned ID and arm via randomization), capturing eConsent, managing their visit schedule (with notifications), and providing a participant-facing experience for ongoing study activities. This phase also addresses participant engagement features: a mobile-first interface (possibly via Capacitor) and automated reminders for visits/forms.
- Data Model – Participants: Create a Participant model with fields:
- id (could be auto-generated or a study-specific code),
- study_id (FK),
- site_id (FK if multi-center),
- arm_id (FK to Arm, if randomized),
- participant_code or subject_id (a human-friendly ID, like SCR001, could be just id or a generated code),
- consent_date (date of eConsent given),
- consent_version (which version of consent form was signed, if relevant),
- status (active, withdrawn, completed, etc.),
- plus basic demographics if needed (not required for the system, but sometimes stored – can ignore for now or have a JSON field for extra attributes).
- If participants have user accounts (to log in), link to the User table (e.g., a user_id foreign key). Alternatively, a Participant could simply be a User with role ‘participant’. Keeping them separate lets us store study attributes (like arm assignment) without cluttering the User model, so we'll proceed with a separate Participant entity linked one-to-one to a User for authentication – a common approach in clinical systems (user credentials vs. subject record).
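A condensed sketch of the Participant model under these assumptions (the sites table follows Phase 4's multi-center note; the user_id link provides the one-to-one login account):

```python
class Participant(Base):
    __tablename__ = "participants"
    id = Column(Integer, primary_key=True)
    study_id = Column(Integer, ForeignKey("studies.id"), nullable=False)
    site_id = Column(Integer, ForeignKey("sites.id"), nullable=True)
    arm_id = Column(Integer, ForeignKey("arms.id"), nullable=True)
    user_id = Column(Integer, ForeignKey("users.id"), nullable=True)  # 1:1 login account
    participant_code = Column(String, unique=True)   # human-friendly ID, e.g., "SCR001"
    consent_date = Column(DateTime, nullable=True)
    consent_version = Column(Integer, nullable=True)
    status = Column(String, default="active")        # active / withdrawn / completed
```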
- Participant Enrollment Workflow:
- Creating a participant: Typically done by site staff via an interface when a new subject consents. Provide an endpoint POST /studies/{study_id}/participants for site staff to create a new participant. Input might include the site (if the staff member belongs to multiple sites; otherwise deduced), and maybe some initial data like a name (if collected in the system) or just a reference. Because of eConsent, we might instead invite participants to self-enroll:
- Option 1: Site staff pre-creates the record, then system sends an invite or provides a link to the participant for eConsent.
- Option 2: Participant self-registers via a public link (which assigns them to a site, perhaps based on a code or selection). OpenEDC even mentions participants can register themselves. We can support either, but for now focus on site staff enrolling to control who enters.
- When creating a participant, the system should (see the sketch after this list):
- Create a Participant entry (with study, site).
- If the study has arms and is randomized, immediately assign an arm via randomization logic (and store arm_id). If using simple randomization, do arm = random.choice(study.arms) or use a tracking mechanism for balance. Log this assignment in the audit trail (including any random seed or block used).
- Generate the participant’s schedule: iterate through the study’s visits (filter by arm if needed) and create ParticipantVisit or ScheduledTask entries for each. Each entry gets a due date. If we have a baseline/reference date, use that (e.g., consent date as day 0 or randomization date). For simplicity, use consent date as start: For each visit with day_offset, set due_date = consent_date + offset. If offsets are not defined, perhaps treat them as sequential visits at roughly specified intervals.
- If participants have login, create a User account for them (if not already existing). Perhaps collect email to send credentials or generate a default username/password. Alternatively, skip if not doing login.
- Trigger any initial notifications – e.g., email to participant with a link to the portal or to complete eConsent if that’s next.
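A sketch of this enrollment flow, combining the earlier pieces (ScheduledTask follows the Phase 4 field list; assign_arm is the Phase 4 randomization sketch; deriving the participant's site from the staff user is an assumption):

```python
from datetime import datetime, timedelta

class ScheduledTask(Base):
    __tablename__ = "scheduled_tasks"
    id = Column(Integer, primary_key=True)
    participant_id = Column(Integer, ForeignKey("participants.id"), nullable=False)
    visit_id = Column(Integer, ForeignKey("visits.id"), nullable=False)
    form_id = Column(Integer, ForeignKey("form_definitions.id"), nullable=False)
    due_date = Column(DateTime, nullable=False)
    status = Column(String, default="not_started")  # not_started / in_progress / completed

@router.post("/studies/{study_id}/participants", status_code=201)
def enroll_participant(study_id: int,
                       db: Session = Depends(get_db),
                       user: User = Depends(require_role("site_staff", "admin"))):
    participant = Participant(study_id=study_id,
                              site_id=getattr(user, "site_id", None))  # site link is an assumption
    if db.query(Arm).filter(Arm.study_id == study_id).count() > 0:
        participant.arm_id = assign_arm(db, study_id).id  # Phase 4 randomization sketch
    db.add(participant)
    db.flush()  # obtain participant.id before building the schedule

    # Pre-populate the schedule: one task per (visit, assigned form)
    start = datetime.utcnow()  # in practice, the consent or randomization date
    for visit in db.query(Visit).filter(Visit.study_id == study_id).all():
        assignments = db.query(VisitFormAssignment).filter(
            VisitFormAssignment.visit_id == visit.id).all()
        for a in assignments:
            db.add(ScheduledTask(participant_id=participant.id, visit_id=visit.id,
                                 form_id=a.form_id,
                                 due_date=start + timedelta(days=visit.day_offset)))
    db.commit()
    log_action(db, user.id, "PARTICIPANT_ENROLL",
               {"participant_id": participant.id, "arm_id": participant.arm_id})
    return {"id": participant.id, "arm_id": participant.arm_id}
```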
- eConsent administration: If not done before enrollment, ensure the participant completes an electronic consent. There are a couple of approaches:
- Have the Consent Form as one of the scheduled tasks (like a visit “Consent” with the consent form). Then when the participant is created, the first task in their schedule is “Consent Form” due immediately. The participant can be given the link or asked to log in and complete it.
- Alternatively, handle consent first: e.g., site staff sends the participant a special link to an eConsent form (not requiring full login). On completion, the system creates the participant record and continues. However, this complicates linking data. For simplicity, assume we enroll them (perhaps as a pending participant) and then have them do the consent as task 1.
- Implement an endpoint to record consent if needed separate from a generic form: e.g., POST /studies/{study_id}/participants/{pid}/consent which could flip a flag. But if we treat it like a form, the submission of the consent form through the regular response API can itself mark consent (in the response handler, detect if form is of type consent, then update participant.consent_date etc).
- After consent, if required that staff countersign, the staff could have a separate form or simply mark in the system that consent was verified. For prototype, skip countersign or treat it as an optional second form.
- Participant UI & Mobile-First Design: Provide a front-end interface for participants:
- Possibly a separate React/Ionic app that uses the same API. If time allows, implement minimal pages: login, list of tasks, task detail (which loads the SurveyJS form), and maybe a video-visit page.
- Use Ionic Capacitor to wrap it so it can be deployed to iOS/Android if needed. In prototype stage, even a responsive web page is fine, but design with mobile in mind (large buttons, simple layout).
- Ensure the UI is intuitive: e.g., after login, “Upcoming Visits” or “To-Do: 3 forms to complete”. Each item shows the visit name and due date. Participant taps one, sees either the SurveyJS form or instructions if it’s an in-person visit (some visits might not be just a form, could be a video call).
- The participant UI should fetch their schedule via an endpoint like GET /participants/me/schedule (which the backend can compile from ParticipantVisit entries). This returns a list of upcoming (and maybe past) visits with info: visit name, due date, associated form (or activity) and its status (not started/completed). If a visit has multiple forms, list each or group by visit.
- When a participant selects a task (say a form), the app calls GET /studies/{study_id}/forms/{form_id}/take to get the form schema and then uses SurveyJS to render it, or possibly the schedule endpoint could embed a simplified representation of form (maybe just an ID and title, then another call gets full schema).
- Reminders & Notifications: To improve compliance, the system should remind participants of upcoming or overdue tasks:
- We can implement a background scheduler (using Celery, APScheduler, or even FastAPI's BackgroundTasks). For the prototype, this can be simulated by a function that sets a reminder when a participant is created or a visit comes due (not fully implemented in code now, but the design should allow for it; see the sketch after this list).
- The actual sending of notifications can be via email or push notifications (if using Capacitor, push notifications could be configured via Firebase, which is beyond scope, so email/SMS is easier). Twilio SendGrid or similar service can be integrated: we’d need to store participant email/phone for this.
- We can set up a periodic job (cron) to check for tasks due in X days or overdue and send reminders. This might not be fully implemented, but the architecture should allow plugging this in (maybe a separate service or a FastAPI scheduled job).
- Provide an endpoint for the participant to opt-out or mark their communication preference if necessary (not needed at prototype stage, but mindful of it).
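A minimal sketch of such a periodic job using APScheduler (the two-day window, the 08:00 schedule, and the email hook are illustrative placeholders):

```python
from datetime import datetime, timedelta

from apscheduler.schedulers.background import BackgroundScheduler

def send_due_reminders():
    """Daily job: nudge participants whose tasks are due soon or overdue."""
    db = SessionLocal()  # session factory from the Phase 1 sketch
    try:
        cutoff = datetime.utcnow() + timedelta(days=2)  # illustrative 2-day window
        due = (db.query(ScheduledTask)
                 .filter(ScheduledTask.status == "not_started",
                         ScheduledTask.due_date <= cutoff)
                 .all())
        for task in due:
            # send_email(...) would wrap SendGrid/SMTP; stubbed with a log line here
            print(f"Reminder: task {task.id} for participant {task.participant_id} "
                  f"due {task.due_date:%Y-%m-%d}")
    finally:
        db.close()

scheduler = BackgroundScheduler()
scheduler.add_job(send_due_reminders, "cron", hour=8)  # run once daily at 08:00
scheduler.start()
```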
- Integration of Video Conferencing for Remote Visits: Some visits might be conducted via video call (telemedicine). The system should accommodate this:
- At study design time, perhaps mark certain visits as “remote” or attach a “video conference” field. For instance, a Visit could have is_remote = True.
- For a remote visit, the study team might schedule a video call. Integration could be as simple as generating a Zoom/Teams link and storing it, or using an API like Twilio Video or Jitsi. For prototype, we can simulate by having a video_url field in ParticipantVisit or in Visit that gets populated.
- Possibly an endpoint like POST /participants/{pid}/visits/{visit_id}/schedule-call for site staff to set up a call (in real life, integrated with calendar). But as a placeholder, maybe just allow a URL to be stored.
- In the participant’s schedule, if a visit has a video_url, the front-end can show a “Join Video Call” button that opens that link. The presence of this integration point is key, even if we don’t implement actual video servers. Document that it could integrate with an SDK for a seamless in-app video in future.
- Also consider that during a video visit, site staff might administer forms verbally and fill them out, or the participant might fill forms with staff guidance. Our system already handles multiple users, so staff could fill on behalf if needed. But nothing special needed beyond what’s built.
- API Endpoints – Participant Operations:
- GET /studies/{study_id}/participants – list participants (for authorized users: site staff of that study's site(s), study designers possibly all in the study, monitors all in the study read-only).
- POST /studies/{study_id}/participants – create/enroll a participant (as described, accessible by site staff or admin). Input: maybe an ID or initial info, returns participant ID and assigned arm.
- GET /studies/{study_id}/participants/{pid} – get participant details (includes assigned arm, maybe partial data, and possibly their schedule or link to schedule endpoint).
- GET /participants/me/schedule – participant’s view of their own visits (requires auth as participant).
- GET /studies/{study_id}/participants/{pid}/schedule – same data but for staff viewing a participant’s timeline (monitors, etc.).
- POST /studies/{study_id}/participants/{pid}/withdraw – (future) mark participant as withdrawn (stop scheduling new visits, lock further data entry, but keep data).
- POST /studies/{study_id}/participants/{pid}/visit/{visit_id}/reschedule – (future) if needed to change dates or mark missed, etc.
- If implementing sending invites: POST /studies/{study_id}/participants/{pid}/send-invite (to email consent link) can be an optional endpoint hooking into an email service.
For eConsent specifically, if it is treated as a form, the endpoints above already cover it. If treated separately:
- GET /studies/{study_id}/participants/{pid}/consent-form – to retrieve current consent form schema (e.g., if separate from other forms; REDCap e-consent uses a special instrument in the same project).
- POST /studies/{study_id}/participants/{pid}/consent – to submit consent (but we might just reuse POST /.../responses for the consent form).
- Possibly an endpoint to fetch the signed consent PDF if we generate one, like GET /studies/{study_id}/participants/{pid}/consent.pdf. For now, we store the data and could generate PDF on demand or via a background job.
- Audit Logging: We intensify audit logging in this phase:
- Enrollment: log when a participant is created, by whom, and the assigned ID and arm. If randomization occurred, log the randomization event separately as well (with random sequence details if any).
- Consent: log that participant X consented on date Y (and ideally, which form version). If PDF generated, that is an official record.
- Data updates: if any participant info is edited (not many fields to edit, maybe just status updates), log it.
- Visit completions: log when a participant’s visit is completed, who completed it (if staff entered data for a visit, log that action as well).
- Essentially, ensure that every key step in participant management is auditable, meeting the GCP requirement that you can reconstruct the course of the trial conduct from records.
- Compliance & Retention:
- Data retention: At the end of a study or if a participant withdraws, their data should remain in the system (not deleted) for the regulatory retention period (often years). We will not implement deletion in the prototype, but if the design includes a delete endpoint (for test data, perhaps), ensure it's admin-only and still archives the record (soft delete).
- Privacy and Access: Implement safeguards so that a participant can only see their own data, site staff only their site’s participants, monitors only the studies they monitor. We touched on filtering by site; this should be enforced in the queries for participants and responses. For instance, a site staff’s request to list participants can automatically filter by site_id = user.site_id (assuming one site per user in that role).
- Informed Consent Compliance: The eConsent process in the system should align with regulatory expectations:
- Participants must have ample time to read the consent, possibly with multimedia (not in scope to implement, but mention if using videos or quizzes to ensure understanding).
- The signed consent must be archived (the prototype captures the data; in production, generate an uneditable PDF, often stored in a file system or database as a blob).
- Provide the participant a copy of their signed consent. REDCap eConsent auto-emails a PDF to the participant when configured. We can note that as a future enhancement: after signing, trigger an email with the PDF attachment.
- If a new consent version is approved during the study, the system should allow capturing re-consent from participants. Our version control on forms can facilitate this: e.g., update the consent form as a new version and mark who has signed which version. For now, just highlight that the consent form is versioned and participants should sign the version they were presented, which we record.
- Participant Safety and Communication: If a participant reports certain responses (like severe adverse events) via a form, the system should flag and alert the site staff. That’s beyond prototype scope, but mention that the architecture allows checking responses (since they’re stored centrally) and could integrate alert rules (like an email to investigators if a particular field value is critical). Not implementing now, but the data model doesn’t preclude it.
By the end of Phase 5, the prototype supports an entire study workflow: from enrolling a participant (with eConsent) to guiding them through scheduled forms, with site staff and monitors overseeing. This is essentially a functional EDC with ePRO capabilities, covering recruitment through data collection.
Phase 6: Data Extraction and Integration¶
Goals: Implement features to retrieve the collected data and integrate with external systems. This includes a secure API (and possibly UI) for exporting data (per form or entire study) in common formats (CSV, JSON), and preparing for integration with statistical tools or trial registries. We also ensure the system can interface with other services (like video calls as noted, or potentially EHR/registry data linkage).
- Data Export API: Provide endpoints to export data for analysis:
- GET /studies/{id}/export – Export all study data. Since the data is relational (multiple forms, participants, etc.), we might offer a ZIP containing one CSV per form instrument, or an option to choose the format.
- Query parameters could specify format: ?format=csv or ?format=json.
- For CSV, one approach is to output data in a wide format per form: each CSV has a participant-ID column plus one column per question field (column names derived from question names or codes in the form schema). We can generate these by reading the form schema to know the question fields. Given the flexibility of forms, responses may omit optional fields, so we include all possible columns regardless.
- Alternatively, output a long (raw) format: a CSV with columns participant_id, form_name, question_name, answer (one row per answer). But that is harder for analysis, so a per-form CSV is standard (like REDCap’s per-instrument data export).
- For JSON export, output structured JSON: an array of participants, each containing their forms and answers, or an array of responses with full details.
- Implement at least CSV-per-form for now. Python’s csv module or pandas can generate the output.
- GET /studies/{id}/forms/{form_id}/export – Export data for a single form. This returns CSV or JSON of all responses of that form across participants (with participant identifiers). Optionally allow filtering by site or date.
- GET /studies/{id}/participants/{pid}/export – Export all data for a single participant (e.g., compile their responses across all forms, perhaps as one JSON or a PDF summary). This is useful for patient profile review, but not critical for initial prototype.
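A sketch of the per-form wide CSV export described above, assuming the Form/FormResponse models from earlier phases (with schema and data JSON columns) and the get_db/require_role dependencies; the SurveyJS schema's pages/elements structure supplies the column names:

```python
import csv
import io

from fastapi import APIRouter, Depends
from fastapi.responses import StreamingResponse

router = APIRouter()

@router.get("/studies/{study_id}/forms/{form_id}/export")
def export_form_csv(study_id: int, form_id: int,
                    db=Depends(get_db), user=Depends(require_role("monitor"))):
    form = db.get(Form, form_id)
    # Derive the full column set from the form schema so optional questions
    # that some participants skipped still appear as (empty) columns.
    fields = [q["name"]
              for page in form.schema.get("pages", [])
              for q in page.get("elements", [])]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["participant_id", *fields],
                            extrasaction="ignore")
    writer.writeheader()
    for resp in db.query(FormResponse).filter_by(form_id=form_id):
        writer.writerow({"participant_id": resp.participant_id, **resp.data})
    buf.seek(0)
    return StreamingResponse(
        buf, media_type="text/csv",
        headers={"Content-Disposition": f"attachment; filename=form_{form_id}.csv"})
```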
These endpoints must be secured so that only authorized roles can use them. Generally, study designers, data managers, and monitors would export data. Site staff might export their own site’s data if allowed (this could be restricted if needed). Ensure monitors only get de-identified data if that is a rule (monitors can usually see identified source data; however, we might implement de-identification options for exports destined for statisticians).
- De-Identification and Data Filtering: When exporting, include options to protect sensitive data:
- For instance, a deidentify=true parameter that removes direct identifiers (like name and contact-info fields) and possibly applies date shifting. REDCap’s data export tool can automatically remove or shift certain fields. We can implement a simplified version: if forms have identifiable fields marked (we would need a way in the form schema to label identifiers), the export can blank or drop those columns when deidentify is requested.
- This is important if sharing data with external researchers while preserving privacy.
- Implementing date shifting may be complex; we may skip it in the prototype but note it here. Date shifting offsets all of a participant’s dates by a random per-participant number, de-identifying the timeline while preserving intervals (see the sketch below).
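A sketch of both de-identification pieces. Deriving the offset from a keyed hash makes it stable across exports without storing it; the function names and the schema-level identifier flag are assumptions:

```python
import hashlib
from datetime import date, timedelta

def shift_date(value: date, participant_id: int, secret: str,
               max_days: int = 364) -> date:
    """Shift a date by a stable per-participant offset, preserving intervals."""
    digest = hashlib.sha256(f"{secret}:{participant_id}".encode()).digest()
    offset = int.from_bytes(digest[:4], "big") % max_days + 1
    return value - timedelta(days=offset)

def deidentify_row(row: dict, identifier_fields: set) -> dict:
    """Drop columns flagged as identifiers in the form schema."""
    return {k: v for k, v in row.items() if k not in identifier_fields}
```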
- Data Visualization/Reports: Not a core requirement, but note that we could integrate a dashboard for basic statistics or a direct connection to analysis tools (R, Python). We will likely skip this in the prototype, but the data can easily be pulled into pandas or R via the API or a direct DB connection. For this plan, focusing on export is sufficient.
- Registry Integration and External APIs: The system should be designed with integration in mind:
- jsPsych Integration: If in future we want to administer cognitive tests or other interactive experiments, we can integrate jsPsych (a JavaScript library) much as we did SurveyJS, possibly as another kind of “form” or activity. For example, a form definition might have a type (SurveyJS, jsPsych, or other); if jsPsych, the front-end would load a jsPsych experiment instead of a survey. The jsPsych results (e.g., reaction times) can be captured and sent via an endpoint. We should ensure our architecture can accept that data – perhaps as another JSON result stored in the FormResponse (with a flag indicating the type of activity).
- In short, plan so that new modules can plug in without major refactoring. One way: create an abstract notion of a Study Activity, which can be a survey, a task (jsPsych), a video call, etc., each with possibly different handling. Our existing models cover surveys; for jsPsych we might store experiment parameters or results differently, but could still use the JSON field (just with a different schema). The front-end would handle running jsPsych and then call an API to save results.
- So, in the future, we might have an endpoint like POST /studies/{id}/activities/{activity_id}/results for a generic approach (see the sketch below). For now, surveys and tasks are not distinguished in code, but we note this flexibility.
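A sketch of that generic results endpoint, reusing the existing FormResponse table; the activity_type column is an assumed extension (the “flag indicating the type of activity” above):

```python
from fastapi import APIRouter, Depends
from pydantic import BaseModel

router = APIRouter()

class ActivityResult(BaseModel):
    participant_id: int
    activity_type: str   # "survey", "jspsych", ...
    result: dict         # SurveyJS answers or jsPsych trial data, stored verbatim

@router.post("/studies/{study_id}/activities/{activity_id}/results")
def save_activity_result(study_id: int, activity_id: int, payload: ActivityResult,
                         db=Depends(get_db), user=Depends(get_current_user)):
    # Same JSON column as survey answers; activity_type distinguishes them later.
    db.add(FormResponse(form_id=activity_id,
                        participant_id=payload.participant_id,
                        activity_type=payload.activity_type,
                        data=payload.result))
    db.commit()
    return {"status": "saved"}
```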
- MFA (Multi-Factor Auth): In a future security update, incorporate MFA for user logins (especially staff accounts with access to sensitive data). The architecture should allow an extra step during login, or at least allow storing an MFA secret per user (for TOTP apps) or using SMS/email OTP. Since we plan to potentially use Twilio for notifications, we could use it for SMS codes. This is not implemented now, but adding it would involve updating the Phase 1 auth workflow: e.g., after password verification, require an OTP. We mention this so the team knows to leave room (i.e., not set the auth flow in stone).
- Clinical Registry / EHR Linkage: Often, trial data might need to link with external databases (for recruitment or outcome collection). For example, linking to a patient registry or electronic health records. Our prototype can accommodate this by:
- Allowing storage of an external ID in Participant (like an MRN or registry ID).
- Possibly providing APIs to fetch or push data. For instance, an API that, given a participant ID, retrieves certain data from an EHR (where an API is available) – but that is beyond scope. More realistically, allow our data to be exported in standards like CDISC ODM or FHIR for interoperability. That is advanced, but since OpenEDC supports CDISC ODM export, we note it as a future extension: implementing an exporter to ODM XML or FHIR bundles so the data can be shared or archived in standardized formats.
- For now, our CSV/JSON export suffices for integration with analysis tools. If needed, we can also provide a simple API to get data by query (like GET /api/data?study=X&form=Y&participant=Z returning JSON) for programmatic access by analysis scripts.
- Admin and Monitoring Tools: A few admin-level features not covered above also deserve mention:
- Audit Log Review: Possibly an endpoint or UI for admins/monitors to read the audit log entries, e.g., GET /audit?study_id=X with filters. They could download the audit trail as CSV for inspection; this matters during inspections, which require a log of all changes. For now, ensure audit data is accessible somehow (even if via direct DB query).
- User Management UI: For completeness, an admin UI to manage users/roles (Phase 1 provided endpoints, but a simple interface can be built or use Swagger UI for now).
- System Configuration: Minimal in the prototype, but settings such as the email server for notifications may be needed; these can live in config files or a .env file.
Compliance Check (Phase 6):
- Data Export Integrity: When producing data outputs, ensure they are accurate and complete copies of the data. The system should be able to output all records in a human-readable format; our CSV exports cover that. If asked to provide data to regulators, a well-documented export is vital. Also maintain the linkage between exported data and metadata, e.g., by providing a data dictionary. Perhaps implement an endpoint GET /studies/{id}/data-dictionary that exports the form schemas (field names, types) for transparency.
- Audit Trail Export: Part of compliance is the ability to produce audit logs for inspection. We should allow exporting the audit trail (included with the data export or separately).
- Retention and Archiving: Consider how to archive a study once completed. We might dump the database schemas, or mark the study as archived and revoke user access except for admins; data should then be retained read-only. Our prototype likely won’t implement archive logic, but design-wise, plan for a study status: if the study is closed, lock further data entry (we can enforce this in endpoints – if study.is_closed, reject POST responses, etc.; a dependency sketch follows below). This ensures no changes after database lock. In a real environment, you would take a database snapshot for archival that could be imported or referenced later but not changed.
- Compliance Summary: Our prototype, though not production-validated, is built with compliance in mind: unique user credentials, role-based permissions, audit trails with timestamps for all data modifications, no deletion of data without trace, electronic signatures on consent, and optional two-factor auth for security. It aligns with FDA 21 CFR Part 11 and GCP expectations at a fundamental level. Features like reason-for-change on data edits, query management, and formal sign-offs can be added once the core is in place. The architecture is modular enough to incorporate these: e.g., add a Query model linking to responses so monitors can flag issues, add a “verified” field on responses to support source data verification by monitors, and implement an e-signature workflow for the PI to sign the completed dataset at end of study (simply another audited action requiring password re-entry).
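The study-lock enforcement mentioned under Retention and Archiving can be a single reusable dependency; Study and its is_closed flag are the assumed Phase 4 model and status field:

```python
from fastapi import Depends, HTTPException

def require_open_study(study_id: int, db=Depends(get_db)):
    """Reject data entry once a study is closed/locked; reuse on all mutating routes."""
    study = db.get(Study, study_id)
    if study is None:
        raise HTTPException(status_code=404, detail="Study not found")
    if study.is_closed:
        raise HTTPException(status_code=423,
                            detail="Study is locked; no further data entry")
    return study

# Usage on any write endpoint:
# @router.post("/studies/{study_id}/participants/{pid}/responses")
# def submit_response(..., study=Depends(require_open_study)): ...
```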
Phase 7: Future Enhancements and Modular Extensibility¶
(This “phase” is more of a roadmap beyond the initial prototype, highlighting how the architecture can scale or be enhanced in specific areas the user mentioned, ensuring the implementation plan is future-proof.)
- Integration of jsPsych and Other Modules: As noted, to extend beyond surveys, the system can incorporate cognitive tasks or other interactive modules. We would introduce a modular plugin system:
- Define an ActivityType (Survey, Task, etc.) for each scheduled activity. For a jsPsych task, the form schema in our DB could store the experiment definition or a reference to a predefined task. The front-end then detects the type and either loads SurveyJS or initializes a jsPsych timeline. The results could be stored in the same FormResponse table, possibly with a flag or separate field if needed (or a separate TaskResults table if format differs).
- This modular approach can also handle other data capture types (e.g., sensor data uploads, image capture forms, etc.) by adding new activity handlers.
- Architecture impact: minimal on backend (still storing JSON results), more on frontend (need to include jsPsych library and result handling).
- Two-Factor Authentication (MFA): Implementing MFA in Phase 1’s auth flow would increase security for accounts. Likely steps:
- Store an OTP secret for users who enable MFA.
- Use a library like PyOTP for TOTP (Time-based One-Time Password) or send SMS codes via an integration.
- Modify login endpoint to handle a second step (e.g., if user has MFA enabled, first verify password, then expect an OTP code).
- Alternatively, integrate with an SSO that has MFA (less likely in this context).
- Since our RBAC includes highly privileged roles (admin, etc.), MFA is strongly recommended for those roles in real deployments.
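A TOTP sketch using PyOTP along the lines above; user.mfa_secret is the assumed new column, and the provisioning URI can be rendered as a QR code for authenticator apps:

```python
import pyotp  # pip install pyotp

def enroll_mfa(user) -> str:
    """Create and store a TOTP secret; return a provisioning URI for a QR code."""
    user.mfa_secret = pyotp.random_base32()   # assumed new column on the User model
    return pyotp.TOTP(user.mfa_secret).provisioning_uri(
        name=user.username, issuer_name="EDC Prototype")

def verify_mfa(user, code: str) -> bool:
    """Second login step, run only after the password has been verified."""
    if not user.mfa_secret:
        return True                           # MFA not enabled for this account
    # valid_window=1 tolerates one 30-second clock-drift step either way.
    return pyotp.TOTP(user.mfa_secret).verify(code, valid_window=1)
```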
- Regulatory Compliance Enhancements: As a future step to move from prototype to production:
- Validation and Testing: The system must undergo validation testing (per FDA guidelines) – unit and integration tests that prove it works as intended and is compliant.
- Audit Trail Security: Possibly implement append-only logging at the database level (e.g., an audit trigger that writes to a separate table or schema that even DB admins cannot easily modify). This ensures even developers cannot tamper with logs. We could use Postgres features or ORM events to log old vs. new values on critical tables (a SQLAlchemy event-listener sketch follows this list); alternatively, the manual logging we already do suffices for the prototype.
- Electronic Signatures: Extend eConsent to other forms if needed. For example, at end of study, an investigator might “sign” the dataset. This could be implemented by a special endpoint that requires the user’s credentials again and then records a signature in the audit log (and maybe in a PDF report of the data). We have the building blocks (user identity and audit log) to add this.
- Part 11 Compliance Review: Ensure the system meets all criteria (we have covered most: unique IDs, secure access, audit trails, data backup, retention, e-signatures, etc. – any we missed can be added, such as account lockout after failed logins, password expiry policy, etc., which can be configured).
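One way to realize the ORM-event approach from the Audit Trail Security item is a SQLAlchemy before_flush listener that records old vs. new values for audited models; Participant, FormResponse, the AuditLog model sketched in Phase 5, and the updated_by column are assumptions:

```python
import json

from sqlalchemy import event, inspect
from sqlalchemy.orm import Session

AUDITED_MODELS = (Participant, FormResponse)   # tables whose edits we trace

@event.listens_for(Session, "before_flush")
def capture_changes(session, flush_context, instances):
    """Append old/new values for every dirty audited row before it is written."""
    for obj in session.dirty:
        if not isinstance(obj, AUDITED_MODELS):
            continue
        for attr in inspect(obj).attrs:
            hist = attr.history
            if hist.has_changes():
                session.add(AuditLog(
                    user_id=getattr(obj, "updated_by", 0),  # assumed tracking column
                    action=f"{type(obj).__name__}.{attr.key}.changed",
                    detail=json.dumps({"old": [str(v) for v in hist.deleted],
                                       "new": [str(v) for v in hist.added]})))
```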
- Performance and Scaling: As usage grows, consider scaling strategies:
- Use async features of FastAPI and database connection pooling for high concurrency (FastAPI can handle many requests with Uvicorn). Move heavy tasks (like generating large exports or sending many emails) to background tasks or a Celery worker.
- Implement caching for forms that are retrieved often (perhaps store rendered form JSON in a cache to avoid hitting DB every time).
- Ensure that the JSON fields (form schemas and responses) are indexed if needed for queries (PostgreSQL allows JSONB indexing if we query within JSON, though mostly we fetch by id so not critical).
- Use pagination on list endpoints (participants, responses) to handle large studies.
- Modularity and Clean Architecture: The codebase should be organized by feature modules. For instance, you might have:
- app/auth – auth and user management,
- app/study – study/arm/visit endpoints and logic,
- app/forms – form and response endpoints,
- app/participants – participant and schedule endpoints,
- app/consent – eConsent related (though could be under forms),
- app/export – data export endpoints,
- etc. This modular separation aligns with Clean Architecture and makes future extensions easier (see the wiring sketch below). For example, a new “Query Management” module (monitors flagging data issues) could be added without touching unrelated parts – just new models (Query, QueryResponse) and endpoints.
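The wiring for that layout is small; the module paths are illustrative, with each feature package exposing its own APIRouter:

```python
from fastapi import FastAPI

# Illustrative module paths; each package exposes its own APIRouter.
from app.auth.routes import router as auth_router
from app.study.routes import router as study_router
from app.forms.routes import router as forms_router
from app.participants.routes import router as participants_router
from app.export.routes import router as export_router

app = FastAPI(title="EDC Prototype")

for r in (auth_router, study_router, forms_router,
          participants_router, export_router):
    app.include_router(r, prefix="/api")
```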
- Monitoring and Maintenance: Add logging and error monitoring (FastAPI logging setup, perhaps integrate Sentry for runtime errors). Also, ensure that the database is regularly backed up. For dev, maybe use simple dumps, but in production, a backup strategy is needed. Data retention policy might require backups to be stored x years as well.
- User Interface Enhancements: While this plan focuses on backend, note that a user-friendly UI is crucial. Eventually, building a polished React or Angular application for the web portal (for study staff and participants) will be necessary, as well as possibly native mobile UI (via Capacitor or fully native if needed for advanced features). The backend being RESTful will support any UI framework. Also, consider adding a Survey Designer UI that is more domain-focused (SurveyJS Creator is generic; we might wrap it to show a list of forms, allow reuse of questions, etc., but that’s a nicety for later).
- Documentation and Training: Document the API (using OpenAPI docs generated by FastAPI, which can serve as live docs for developers). Also, user documentation for using the system (especially for roles like how to design a form, how to enroll a participant, etc.) should be planned.
Conclusion: This phased implementation plan provides a comprehensive roadmap to build a modern EDC system prototype with FastAPI and PostgreSQL, mirroring core REDCap capabilities in a cleaner architecture. Each phase delivers a functional increment – from core auth and RBAC, through form design, data capture, study setup, participant management, to data export – all integrated with SurveyJS and designed with regulatory compliance in mind. Following these steps, a development team can implement the system in logical chunks, ensuring at each stage that security, auditability, and flexibility are maintained. By prioritizing clear modular design and anticipating future needs (like MFA, jsPsych, and external integrations), the system will be robust yet extensible, ready to evolve from prototype to a production-ready, compliant EDC platform.
Sources:
- SurveyJS Integration Guidelines – storing survey schemas and results in JSON with a one-to-many relationship; roles needed for Creator vs. Runner.
- OpenEDC Features – compliance and EDC feature reference (audit trails, eConsent, randomization, multi-center).
- REDCap e-Consent framework – insight into capturing signatures and auto-archiving consent PDFs.
- FDA 21 CFR Part 11 guidance – requirement for secure, computer-generated, time-stamped audit trails for all electronic records, ensuring each action is attributable and traceable.