# Tiered design
Below is a tiered strategy that is practical for remote, web-based deployment, is compatible with open-access / open-source tooling, and provides a clean pathway from broad screening to mechanistic deep phenotyping in ages 8–12 and 13–18.
I am assuming a research context where you want (i) scalable throughput, (ii) defensible data quality in unsupervised settings, and (iii) the option to “go deep” only when warranted.
## Design principles

- Two-pass strategy
  - Tier 1 (Screening): short, robust, low-burden tests that yield stable composites and flag domain-specific concerns.
  - Tier 2 (Deep phenotyping): longer, more granular tasks that decompose cognition into subcomponents and enable trial-level modeling (RT distributions, error types, learning curves).
- Remote-first validity
  - Use tasks with **simple instructions**, **high signal-to-noise**, and **built-in data-quality checks**.
  - Standardize device class (ideally laptop/desktop + keyboard) for RT tasks; treat mobile/touch as a separate mode.
- Open-access stack
  - Implement tasks in jsPsych (browser-native) and/or adopt open batteries built on jsPsych (e.g., CAM) and other openly available web tasks (e.g., ANTI-style attention network tasks where available).
  - Store trial-level data to enable re-scoring, QC, and computational analyses.
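The trial-level storage point can be made concrete as a record shape. This is a minimal sketch only; every field name below is an illustrative assumption, not a fixed platform schema:

```typescript
// Minimal trial-level record; field names are illustrative assumptions.
interface TrialRecord {
  participantId: string;
  taskId: string;          // e.g. "flanker", "nback"
  sessionId: string;
  trialIndex: number;
  condition: string;       // e.g. "congruent" / "incongruent"
  stimulus: string;
  response: string | null; // null = no response (omission)
  rt: number | null;       // ms from stimulus onset; null if no response
  correct: boolean;
  timestamp: string;       // ISO 8601; supports interruption/idle analysis
  deviceMode: "desktop" | "tablet" | "phone";
}

// Example record as it might be written to storage.
const example: TrialRecord = {
  participantId: "P001",
  taskId: "flanker",
  sessionId: "S1",
  trialIndex: 12,
  condition: "incongruent",
  stimulus: "<<><<",
  response: "arrowright",
  rt: 612,
  correct: false,
  timestamp: "2024-05-01T14:03:22.120Z",
  deviceMode: "desktop",
};
```

Storing at this granularity is what later enables re-scoring, QC audits, and trial-level modeling without re-running participants.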
## Tier 0: Enrollment and readiness (3–5 minutes)
Purpose: reduce missingness, improve data quality, and standardize context.
Components
- Device/environment checklist: quiet room, seated, no multitasking, headphones if auditory tasks.
- Automated checks: screen size, browser, keyboard detection, audio output test.
- Brief training module: “how to respond quickly and accurately,” practice trials with feedback.
- Minimal metadata: age, grade, handedness, language, sleep last night (optional), ADHD meds today (optional).
Outputs
- “Test readiness” flag + device mode label (desktop vs tablet vs phone).
## Tier 1: Screening battery (20–30 minutes total)
Goal: broad coverage across all major domains with minimal fatigue.
### A. Global ability / reasoning (5–7 min)
Rationale: provides an anchor for general cognitive ability and helps interpret specific deficits.
- Fluid reasoning: short matrix reasoning (ICAR-style items or an open matrix set)
- Optional: number series / simple analogies (age-appropriate)
Metrics: accuracy, time per item, total score; optional IRT if item bank supports it.
### B. Processing speed (3–4 min)
- Pattern/symbol comparison task (same/different discrimination; timed)
Metrics: correct per time, RT median, lapses.
### C. Attention & inhibition (5–7 min)
Pick one of:
- Flanker (selective attention + inhibition) or
- Go/No-Go (response inhibition) with enough trials to get stable estimates.
Metrics: commission errors, omission errors, RT median, RT variability.
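These metrics fall straight out of trial-level data. A minimal sketch for a Go/No-Go record (field names are assumptions, not a platform API):

```typescript
// Go/No-Go screening metrics from trial-level records.
interface GngTrial { isGo: boolean; responded: boolean; rt: number | null }

function gngMetrics(trials: GngTrial[]) {
  const go = trials.filter(t => t.isGo);
  const nogo = trials.filter(t => !t.isGo);
  const omissions = go.filter(t => !t.responded).length;    // missed Go trials
  const commissions = nogo.filter(t => t.responded).length; // responses on No-Go
  // RTs from responded Go trials, sorted for the median.
  const rts = go
    .filter(t => t.responded && t.rt !== null)
    .map(t => t.rt as number)
    .sort((a, b) => a - b);
  const median = rts[Math.floor(rts.length / 2)];
  const mean = rts.reduce((s, x) => s + x, 0) / rts.length;
  const sd = Math.sqrt(rts.reduce((s, x) => s + (x - mean) ** 2, 0) / (rts.length - 1));
  return {
    omissionRate: omissions / go.length,
    commissionRate: commissions / nogo.length,
    rtMedian: median,
    rtSd: sd, // RT variability
  };
}
```

The same pattern (filter by condition, summarize RT and error counts) covers the Flanker variant as well.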
### D. Working memory (4–6 min)
Pick one:
- Digit span (forward/backward) adapted for browser (audio or visual presentation) or
- 1-back / 2-back (visual) with short blocks.
Metrics: accuracy, d′ (if applicable), RT.
### E. Episodic memory (5–7 min)
- Brief paired associates (picture–location, object–name, or image–image) or
- Short recognition memory (encode 10–15 items → forced-choice recognition)
Metrics: hits/false alarms, d′, response bias.
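Hits and false alarms convert to d′ and response bias via z-transforms. A sketch using the log-linear (Hautus) correction to avoid infinite z-scores at rates of 0 or 1; since JavaScript has no built-in probit, this uses bisection on a normal CDF built from a standard erf approximation (Abramowitz & Stegun 7.1.26):

```typescript
// erf approximation (Abramowitz & Stegun 7.1.26, max error ~1.5e-7).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

const normCdf = (x: number) => 0.5 * (1 + erf(x / Math.SQRT2));

// Inverse normal CDF by bisection (accurate enough for scoring).
function probit(p: number): number {
  let lo = -8, hi = 8;
  for (let i = 0; i < 80; i++) {
    const mid = (lo + hi) / 2;
    if (normCdf(mid) < p) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// nHits of nSignal old items; nFa of nNoise new items.
function dPrime(nHits: number, nSignal: number, nFa: number, nNoise: number) {
  const h = (nHits + 0.5) / (nSignal + 1); // log-linear correction
  const f = (nFa + 0.5) / (nNoise + 1);
  const zH = probit(h), zF = probit(f);
  return { dPrime: zH - zF, criterion: -0.5 * (zH + zF) };
}
```

For example, 18/20 hits with 2/20 false alarms gives d′ ≈ 2.36 with a near-zero criterion.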
### F. Language (optional, 5–8 min; consider adding only if relevant)
Language is the hardest domain to do well remotely without licensing issues. If you include it:
- Receptive vocabulary using an open word–picture mapping set (must be age-appropriate and validated within your cohort)
- Reading can be approximated with timed word recognition, but dialect/education effects are large.
Metrics: accuracy, response time; interpret cautiously.
## Tier 1 outputs and decision rules
Deliver (i) domain z-scores, (ii) a global composite, and (iii) QC flags.
Suggested triggers to Tier 2
- Any domain score ≤ ~10th percentile of your study sample (or a normative reference if you have one).
- High RT variability, high lapse rate, or inconsistent responding (suggests attention/engagement issues).
- Profile patterns of interest (e.g., “WM low, inhibition high”).
- Clinical or exposure subgroup of interest (e.g., concussion, epilepsy, migraine, sleep disorder).
## Tier 2: Deep phenotyping modules (60–150 minutes, modular; can be split across 2–3 sessions)
Goal: disentangle constructs, enable mechanistic inference, and improve sensitivity to change.
Structure Tier 2 as domain modules you can assign based on Tier 1 results. Each module below is ~20–40 minutes.
### Module 1: Attention networks + vigilance (25–35 min)
- Attention network task (alerting, orienting, executive control)
- Sustained attention/vigilance (e.g., CPT-style) with sufficient duration for lapses
Why this matters: separates “can’t focus” into orienting vs executive control vs arousal/vigilance failures.
Key metrics: RT distributions, lapse rate, time-on-task effects, post-error slowing.
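Post-error slowing, for instance, is cheap to derive from stored trials. A minimal sketch of the simple variant (mean RT after errors minus mean RT after correct trials; robust pairwise variants exist and field names are assumptions):

```typescript
// Post-error slowing: mean RT on trials following an error minus
// mean RT on trials following a correct response (simple variant).
interface Trial { correct: boolean; rt: number | null }

function postErrorSlowing(trials: Trial[]): number {
  const afterError: number[] = [];
  const afterCorrect: number[] = [];
  for (let i = 1; i < trials.length; i++) {
    const rt = trials[i].rt;
    if (rt === null) continue; // only responded trials enter the means
    (trials[i - 1].correct ? afterCorrect : afterError).push(rt);
  }
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  return mean(afterError) - mean(afterCorrect);
}
```

A positive value indicates slowing after errors, the usual adaptive-control signature; near-zero or negative values in a long vigilance task are themselves informative.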
### Module 2: Executive function decomposition (25–40 min)
- Task-switching (set shifting; switch cost)
- Stroop or interference task (inhibition under conflict)
- Optional: planning (Tower-style) if you can implement it reliably web-based
Key metrics: switch cost, interference cost, error types, speed–accuracy tradeoff.
### Module 3: Working memory capacity and control (25–40 min)
- Complex span (operation span / symmetry span—adapted carefully)
- N-back with lures (to probe interference control)
- Optional: updating paradigms
Key metrics: capacity estimates, interference susceptibility, RT variability, learning curves across blocks.
### Module 4: Episodic memory systems (25–45 min)
- Encoding + immediate + delayed recognition
- Source memory (item + context)
- Optional: associative memory with lure discrimination (pattern separation proxy)
Key metrics: d′, false alarm pattern, retention over delay, lure discrimination index.
### Module 5: Learning and adaptation (20–35 min)
- Probabilistic learning or reversal learning (feedback-driven adaptation)
- Optional: simple reinforcement learning task
Key metrics: learning rate, perseveration, win–stay/lose–shift, exploration/exploitation tendency.
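Win-stay/lose-shift is the simplest of these metrics to score directly from choice/outcome sequences, before any reinforcement-learning model is fit. A sketch (array encoding is an assumption):

```typescript
// Win-stay / lose-shift proportions from aligned arrays of choices
// (option indices) and outcomes (1 = rewarded, 0 = not).
function winStayLoseShift(choices: number[], rewards: number[]) {
  let winStay = 0, wins = 0, loseShift = 0, losses = 0;
  for (let i = 1; i < choices.length; i++) {
    const stayed = choices[i] === choices[i - 1];
    if (rewards[i - 1] === 1) {        // previous trial was a win
      wins++;
      if (stayed) winStay++;
    } else {                            // previous trial was a loss
      losses++;
      if (!stayed) loseShift++;
    }
  }
  return { winStay: winStay / wins, loseShift: loseShift / losses };
}
```

High lose-shift with low win-stay, for example, suggests outcome-driven switching rather than stable value learning; these raw proportions also serve as sanity checks on fitted learning-rate parameters.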
### Module 6: Social cognition (optional; 20–40 min)
If your research questions require it (psychiatry, autism, adolescent socio-emotional development):
- Emotion recognition (faces)
- Theory of mind (short vignette inference)
Key metrics: accuracy by emotion type, response bias, RT.
## Tier 3: Add-ons (as needed)
### A. Ecological function and informant report (10–20 min)
Often essential for “cognitive status” interpretation:
- Open questionnaires are limited; if you can’t use commercial scales, at minimum collect:
  - school function, attention concerns, sleep, mood, headaches, screen time, academic supports.
### B. Longitudinal micro-assessment (5–10 min per timepoint)
For intervention studies or symptom fluctuations:
- Use 1–2 short “digital biomarkers” with low practice effects:
  - processing speed + vigilance (brief) + simple WM
- Repeat weekly/monthly; model within-subject change.
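For deciding whether a within-subject change is real rather than measurement noise, the classic Jacobson–Truax reliable change index is a reasonable default. A sketch (inputs — baseline SD and test–retest reliability — would come from your own internal norms):

```typescript
// Jacobson–Truax reliable change index:
//   RCI = (x2 - x1) / SEdiff,  SEdiff = sqrt(2 * SEM^2),
//   SEM = sd * sqrt(1 - rxx)
// |RCI| > 1.96 is conventionally taken as reliable change (p < .05).
function reliableChangeIndex(
  x1: number,              // baseline score
  x2: number,              // follow-up score
  baselineSd: number,      // SD of the measure at baseline
  retestReliability: number // test–retest reliability (rxx)
): number {
  const sem = baselineSd * Math.sqrt(1 - retestReliability);
  const seDiff = Math.sqrt(2 * sem * sem);
  return (x2 - x1) / seDiff;
}
```

For repeated weekly/monthly measurements, a mixed-effects model is preferable; the RCI is mainly useful for flagging single pre/post contrasts.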
## Recommended implementation architecture (remote web-based)
### Core build
- jsPsych for tasks + JATOS or a lightweight backend (or REDCap via external modules) for study flow and data capture.
- Store trial-level events (stimulus, response, RT, correctness, condition labels).
### Data quality safeguards (must-have)
- Embedded attention checks (not trick questions; performance-consistency checks)
- Minimum RT thresholds + RT outlier tagging (do not auto-delete; flag)
- Practice blocks with criteria (e.g., must reach 70% before continuing)
- Session interruption detection (tab-switch, idle time)
- Post-session self-report: “Were you interrupted?”
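The flag-don’t-delete rule can be enforced in code. A sketch using a minimum-RT anticipation threshold and a median-absolute-deviation outlier rule; the default cutoffs (150 ms, 3 MADs) are illustrative, not validated:

```typescript
// Flag (never delete) suspect trials: anticipations below a minimum RT
// and outliers beyond k median absolute deviations.
interface QcTrial { rt: number | null; flags: string[] }

function flagTrials(rts: (number | null)[], minRt = 150, k = 3): QcTrial[] {
  const valid = rts.filter((r): r is number => r !== null).sort((a, b) => a - b);
  const med = valid[Math.floor(valid.length / 2)];
  const devs = valid.map(r => Math.abs(r - med)).sort((a, b) => a - b);
  const mad = devs[Math.floor(devs.length / 2)] * 1.4826; // normal-consistent MAD
  return rts.map(rt => {
    const flags: string[] = [];
    if (rt === null) flags.push("omission");
    else {
      if (rt < minRt) flags.push("anticipation");
      if (mad > 0 && Math.abs(rt - med) / mad > k) flags.push("rt_outlier");
    }
    return { rt, flags };
  });
}
```

Keeping the flags alongside the raw values lets analysts apply (and report) their own exclusion rules at scoring time.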
### Scoring strategy
- Tier 1: stable summary metrics + composites (speed, variability, accuracy)
- Tier 2: trial-level + computational parameters (e.g., diffusion model where appropriate)
- Norms: if open normative datasets are not available for your exact tasks, use:
  - internal age-banded norms (8–10, 11–12, 13–15, 16–18) and report standardized scores within band.
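Within-band standardization is a one-liner once band statistics exist. A sketch — the band means and SDs below are placeholder values, to be replaced by your internal norms:

```typescript
// Standardize a raw score within its age band using internal norms.
// Band statistics here are illustrative placeholders only.
interface NormBand { min: number; max: number; mean: number; sd: number }

const bands: NormBand[] = [
  { min: 8, max: 10, mean: 42.0, sd: 6.0 },
  { min: 11, max: 12, mean: 48.0, sd: 6.5 },
  { min: 13, max: 15, mean: 53.0, sd: 7.0 },
  { min: 16, max: 18, mean: 56.0, sd: 7.0 },
];

function zWithinBand(raw: number, age: number): number {
  const band = bands.find(b => age >= b.min && age <= b.max);
  if (!band) throw new Error(`no norm band for age ${age}`);
  return (raw - band.mean) / band.sd;
}
```

Reporting the band alongside the z-score keeps cross-band comparisons honest, since the bands are standardized separately.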
## A concrete example workflow
- Tier 1 (25 min) for all participants.
- Automatically assign Tier 2 modules based on profile:
  - If attention/inhibition low or QC suggests lapses → Module 1 + Module 2
  - If WM low → Module 3
  - If memory low → Module 4
  - If adolescent mental health or social outcomes are central → Module 6
  - If mechanistic change/intervention → add Tier 3 longitudinal micro-assessments
- Split Tier 2 across two sessions (e.g., 45–60 min each) to reduce fatigue.
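The assignment logic above is easy to make explicit and auditable in code. A sketch — the z-score cutoff (−1.28, roughly the 10th percentile) and the profile field names are assumptions, not platform definitions:

```typescript
// Map a Tier 1 profile to Tier 2 modules per the workflow rules.
interface Tier1Profile {
  z: { attention: number; inhibition: number; wm: number; memory: number };
  qcLapseFlag: boolean;          // QC suggests lapses/inconsistent responding
  socialOutcomesCentral: boolean; // study-level flag, not a score
}

function assignModules(p: Tier1Profile, cutoff = -1.28): string[] {
  const modules = new Set<string>();
  if (p.z.attention <= cutoff || p.z.inhibition <= cutoff || p.qcLapseFlag) {
    modules.add("Module 1");
    modules.add("Module 2");
  }
  if (p.z.wm <= cutoff) modules.add("Module 3");
  if (p.z.memory <= cutoff) modules.add("Module 4");
  if (p.socialOutcomesCentral) modules.add("Module 6");
  return [...modules];
}
```

Encoding the rules this way also makes them easy to pre-register and to version alongside the battery.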
## Age-specific notes (8–12 vs 13–18)
- 8–12: prioritize simpler instructions, shorter blocks, more practice with feedback, fewer condition switches; interpret language tasks cautiously.
- 13–18: you can increase difficulty (more trials, more lures, probabilistic learning), and computational modeling becomes more stable.
## Implemented Tasks Reference
The following tasks are currently implemented in the Metricis platform.
### Tier 1 Tasks (25 tasks)
| Task | Domain | Duration | Key Metrics |
|---|---|---|---|
| Simple RT | Processing Speed | 2 min | mean_rt, rt_variability |
| Choice RT | Processing Speed | 3 min | mean_rt, accuracy |
| CPT (Go/No-Go) | Attention | 5 min | d_prime, commission_errors, omission_errors |
| Flanker | Executive Function | 4 min | congruency_effect, accuracy |
| Stroop | Executive Function | 4 min | interference_effect, accuracy |
| N-Back | Working Memory | 5 min | d_prime, accuracy_by_level |
| Digit Span | Working Memory | 4 min | forward_span, backward_span |
| Corsi Blocks | Working Memory | 4 min | spatial_span |
| Digit Symbol | Processing Speed | 3 min | symbols_correct, processing_speed |
| Trail Making | Executive Function | 4 min | completion_time_a, completion_time_b |
| Verbal PA | Memory | 5 min | immediate_recall, delayed_recall |
| Visual PA | Memory | 5 min | immediate_recall, delayed_recall |
| MOT | Attention | 4 min | tracking_accuracy |
| Letter-Number Switching | Executive Function | 4 min | switch_cost, mixing_cost |
| Emotion Recognition | Social Cognition | 4 min | accuracy_by_emotion |
| Matrix Reasoning | Reasoning | 6 min | total_correct, accuracy |
| Vocabulary | Reasoning | 5 min | total_correct |
| Delay Discounting | Decision Making | 4 min | k_value, auc |
| Divided Attention | Attention | 5 min | dual_task_cost |
| WCST | Executive Function | 6 min | categories_completed, perseverative_errors |
| Probabilistic Reversal | Decision Making | 5 min | learning_rate, reversal_accuracy |
| ANT | Attention | 6 min | alerting_effect, orienting_effect, conflict_effect |
| Picture Sequence | Memory | 5 min | sequence_accuracy |
| MFIS | Questionnaire | 3 min | total_score, subscale_scores |
| PROMIS | Questionnaire | 5 min | domain_t_scores |
### Tier 2 Deep Phenotyping Tasks (8 tasks)
| Task | Domain | Duration | Key Metrics | Module |
|---|---|---|---|---|
| Vigilance CPT | Attention | 18 min | vigilance_decrement, block_d_primes, lapse_rate | Module 1 |
| Operation Span | Working Memory | 15 min | partial_load_score, absolute_span, math_accuracy | Module 3 |
| Tower Task | Executive Function | 12 min | problems_solved, excess_moves, first_move_time | Module 2 |
| Source Memory | Memory | 15 min | item_d_prime, source_accuracy, context_bias | Module 4 |
| Pattern Separation | Memory | 18 min | pattern_separation_score, lure_discrimination_index | Module 4 |
| Theory of Mind | Social Cognition | 12 min | first_order_accuracy, second_order_accuracy | Module 6 |
| Semantic Fluency | Language | 8 min | total_words, cluster_size, num_switches | Language |
| Sentence Comprehension | Language | 10 min | complexity_effect, accuracy_simple, accuracy_complex | Language |
### Computational Modeling Utilities
All tasks with RT data support additional computational metrics via the `computational-modeling.ts` utility:
| Analysis | Metrics | Applicable Tasks |
|---|---|---|
| RT Distribution | mean, median, sd, cv, skewness, tau | All RT-based tasks |
| Post-Error Dynamics | post_error_slowing, post_error_accuracy | Go/No-Go, Flanker, Stroop |
| EZ-Diffusion DDM | drift_rate, boundary, non_decision_time | Choice RT, Flanker, Stroop |
| Speed-Accuracy Tradeoff | sat_index, efficiency | All accuracy + RT tasks |
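For reference, the EZ-diffusion row corresponds to a closed-form computation (Wagenmakers et al., 2007) that needs only accuracy, RT variance, and mean RT. A sketch (scaling s = 0.1; Pc must not be exactly 0, 0.5, or 1 — apply an edge correction such as Pc = 1 − 1/(2N) first):

```typescript
// EZ-diffusion: drift rate, boundary separation, and non-decision time
// from proportion correct (pc), correct-RT variance (vrt, s^2), and
// correct-RT mean (mrt, s).
function ezDiffusion(pc: number, vrt: number, mrt: number) {
  const s = 0.1;                                  // conventional scaling
  const L = Math.log(pc / (1 - pc));              // logit of accuracy
  const x = (L * (L * pc * pc - L * pc + pc - 0.5)) / vrt;
  const v = Math.sign(pc - 0.5) * s * Math.pow(x, 0.25); // drift rate
  const a = (s * s * L) / v;                      // boundary separation
  const y = (-v * a) / (s * s);
  const mdt = (a / (2 * v)) * ((1 - Math.exp(y)) / (1 + Math.exp(y)));
  return { driftRate: v, boundary: a, nonDecisionTime: mrt - mdt };
}
```

As a sanity check, Pc = .802, VRT = .112 s², MRT = .723 s recovers v ≈ .10, a ≈ .14, Ter ≈ .30 with this code.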
If you want, I can turn this into a domain-by-domain test menu with (i) specific open-source task implementations, (ii) recommended parameter settings (trial counts, timings) for 8–12 and 13–18, and (iii) a suggested data dictionary and QC rules suitable for an SAP-style appendix.