
Tiered design

Below is a tiered strategy that is practical for remote, web-based deployment, compatible with open-access / open-source tooling, and provides a clean pathway from broad screening to mechanistic deep phenotyping for ages 8–12 and 13–18.

I am assuming a research context where you want (i) scalable throughput, (ii) defensible data quality in unsupervised settings, and (iii) the option to “go deep” only when warranted.


Design principles

  1. Two-pass strategy
     • Tier 1 (Screening): short, robust, low-burden tests that yield stable composites and flag domain-specific concerns.
     • Tier 2 (Deep phenotyping): longer, more granular tasks that decompose cognition into subcomponents and enable trial-level modeling (RT distributions, error types, learning curves).
  2. Remote-first validity
     • Use tasks with **simple instructions**, **high signal-to-noise**, and **built-in data-quality checks**.
     • Standardize device class (ideally laptop/desktop + keyboard) for RT tasks; treat mobile/touch as a separate mode.
  3. Open-access stack
     • Implement tasks in jsPsych (browser-native) and/or adopt open batteries built on jsPsych (e.g., CAM) and other openly available web tasks (e.g., ANTI-style attention network tasks where available).
     • Store trial-level data to enable re-scoring, QC, and computational analyses.
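To make later re-scoring and computational analysis possible, it helps to fix a trial-level schema before data collection starts. A minimal sketch in TypeScript; the field names are illustrative assumptions, not a jsPsych standard:

```typescript
// Illustrative trial-level record; field names are assumptions,
// not a jsPsych or platform standard.
interface TrialEvent {
  participantId: string;
  task: string;            // e.g. "flanker"
  block: number;
  trial: number;
  condition: string;       // condition label, e.g. "incongruent"
  stimulus: string;
  response: string | null; // null = omission
  rtMs: number | null;     // reaction time in milliseconds
  correct: boolean | null;
  timestamp: number;       // Unix epoch ms; enables interruption detection
}

// Storing raw events (rather than summaries) lets you re-score,
// apply new QC rules, or fit computational models after the fact.
const example: TrialEvent = {
  participantId: "P001", task: "flanker", block: 1, trial: 12,
  condition: "incongruent", stimulus: "<<><<", response: "right",
  rtMs: 512, correct: true, timestamp: Date.now(),
};
```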

Tier 0: Enrollment and readiness (3–5 minutes)

Purpose: reduce missingness, improve data quality, and standardize context.

Components

  • Device/environment checklist: quiet room, seated, no multitasking, headphones if auditory tasks.
  • Automated checks: screen size, browser, keyboard detection, audio output test.
  • Brief training module: “how to respond quickly and accurately,” practice trials with feedback.
  • Minimal metadata: age, grade, handedness, language, sleep last night (optional), ADHD meds today (optional).

Outputs

  • “Test readiness” flag + device mode label (desktop vs tablet vs phone).
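The readiness flag and device mode label can be derived from the automated checks with a small pure function. A sketch; the pixel cutoffs and the desktop-only readiness rule for RT tasks are illustrative assumptions:

```typescript
type DeviceMode = "desktop" | "tablet" | "phone";

interface DeviceChecks {
  screenWidthPx: number;
  hasKeyboard: boolean;   // e.g. a keydown observed during the practice check
  touchPrimary: boolean;  // e.g. matchMedia("(pointer: coarse)").matches
  audioOk: boolean;       // passed the audio output test
}

// Classify device mode; the 1024/768 px cutoffs are illustrative.
function deviceMode(c: DeviceChecks): DeviceMode {
  if (c.hasKeyboard && !c.touchPrimary && c.screenWidthPx >= 1024) return "desktop";
  if (c.touchPrimary && c.screenWidthPx >= 768) return "tablet";
  return "phone";
}

// RT tasks are only flagged "ready" on desktop-class devices;
// other modes proceed but are labeled as a separate mode.
function testReady(c: DeviceChecks): boolean {
  return deviceMode(c) === "desktop" && c.audioOk;
}
```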

Tier 1: Screening battery (20–30 minutes total)

Goal: broad coverage across all major domains with minimal fatigue.

A. Global ability / reasoning (5–7 min)

Rationale: provides an anchor for general cognitive ability and helps interpret specific deficits.

  • Fluid reasoning: short matrix reasoning (ICAR-style items or an open matrix set)
  • Optional: number series / simple analogies (age-appropriate)

Metrics: accuracy, time per item, total score; optional IRT if item bank supports it.


B. Processing speed (3–4 min)

  • Pattern/symbol comparison task (same/different discrimination; timed)

Metrics: correct responses per unit time, median RT, lapses.


C. Attention & inhibition (5–7 min)

Pick one of:

  • Flanker (selective attention + inhibition) or
  • Go/No-Go (response inhibition) with enough trials to get stable estimates.

Metrics: commission errors, omission errors, RT median, RT variability.


D. Working memory (4–6 min)

Pick one:

  • Digit span (forward/backward) adapted for browser (audio or visual presentation) or
  • 1-back / 2-back (visual) with short blocks.

Metrics: accuracy, d′ (if applicable), RT.


E. Episodic memory (5–7 min)

  • Brief paired associates (picture–location, object–name, or image–image) or
  • Short recognition memory (encode 10–15 items → forced-choice recognition)

Metrics: hits/false alarms, d′, response bias.
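d′ recurs across the Go/No-Go, n-back, and recognition metrics above. A minimal sketch with the log-linear (Hautus) correction so perfect hit or false-alarm rates stay finite; the probit uses Winitzki's erfinv approximation, accurate to roughly three decimals:

```typescript
// Winitzki's approximation to the inverse error function (~1e-3 accuracy).
function erfinv(x: number): number {
  if (x === 0) return 0;
  const a = 0.147;
  const ln1 = Math.log(1 - x * x);
  const b = 2 / (Math.PI * a) + ln1 / 2;
  return Math.sign(x) * Math.sqrt(Math.sqrt(b * b - ln1 / a) - b);
}

// Inverse normal CDF (probit).
function probit(p: number): number {
  return Math.SQRT2 * erfinv(2 * p - 1);
}

// d' with the log-linear (Hautus) correction: add 0.5 to counts and
// 1 to trial totals, so extreme rates do not yield infinite scores.
function dPrime(hits: number, signalTrials: number,
                falseAlarms: number, noiseTrials: number): number {
  const hitRate = (hits + 0.5) / (signalTrials + 1);
  const faRate = (falseAlarms + 0.5) / (noiseTrials + 1);
  return probit(hitRate) - probit(faRate);
}
```

Response bias (criterion c) is the same two probits averaged and negated, so the helper covers both metrics.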


F. Language (optional, 5–8 min; consider adding only if relevant)

Language is the hardest domain to do well remotely without licensing issues. If you include it:

  • Receptive vocabulary using an open word–picture mapping set (must be age-appropriate and validated within your cohort)
  • Reading can be approximated with timed word recognition, but dialect/education effects are large.

Metrics: accuracy, response time; interpret cautiously.


Tier 1 outputs and decision rules

Deliver (i) domain z-scores, (ii) a global composite, and (iii) QC flags.

Suggested triggers to Tier 2

  • Any domain score ≤ ~10th percentile of your study sample (or a normative reference if you have one).
  • High RT variability, high lapse rate, or inconsistent responding (suggests attention/engagement issues).
  • Profile patterns of interest (e.g., “WM low, inhibition high”).
  • Clinical or exposure subgroup of interest (e.g., concussion, epilepsy, migraine, sleep disorder).

Tier 2: Deep phenotyping modules (60–150 minutes, modular; can be split across 2–3 sessions)

Goal: disentangle constructs, enable mechanistic inference, and improve sensitivity to change.

Structure Tier 2 as domain modules you can assign based on Tier 1 results. Each module below is ~20–40 minutes.

Module 1: Attention networks + vigilance (25–35 min)

  • Attention network task (alerting, orienting, executive control)
  • Sustained attention/vigilance (e.g., CPT-style) with sufficient duration for lapses

Why this matters: separates “can’t focus” into orienting vs executive control vs arousal/vigilance failures.

Key metrics: RT distributions, lapse rate, time-on-task effects, post-error slowing.


Module 2: Executive function decomposition (25–40 min)

  • Task-switching (set shifting; switch cost)
  • Stroop or interference task (inhibition under conflict)
  • Optional: planning (Tower-style) if you can implement it reliably in the browser

Key metrics: switch cost, interference cost, error types, speed–accuracy tradeoff.
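Switch cost and interference cost are both contrasts on correct-trial RTs, so one helper covers both. A sketch; the trial shape and condition labels are assumptions:

```typescript
interface Trial { rtMs: number; correct: boolean; condition: string }

// Mean RT over correct trials in one condition.
function meanRt(trials: Trial[], condition: string): number {
  const rts = trials
    .filter(t => t.correct && t.condition === condition)
    .map(t => t.rtMs);
  return rts.reduce((s, r) => s + r, 0) / rts.length;
}

// Switch cost: mean correct RT on switch trials minus repeat trials.
// The same contrast with "incongruent" vs "congruent" labels gives
// the Stroop/flanker interference cost.
function switchCost(trials: Trial[]): number {
  return meanRt(trials, "switch") - meanRt(trials, "repeat");
}
```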


Module 3: Working memory capacity and control (25–40 min)

  • Complex span (operation span / symmetry span, adapted carefully for unsupervised browser use)
  • N-back with lures (to probe interference control)
  • Optional: updating paradigms

Key metrics: capacity estimates, interference susceptibility, RT variability, learning curves across blocks.


Module 4: Episodic memory systems (25–45 min)

  • Encoding + immediate + delayed recognition
  • Source memory (item + context)
  • Optional: associative memory with lure discrimination (pattern separation proxy)

Key metrics: d′, false alarm pattern, retention over delay, lure discrimination index.


Module 5: Learning and adaptation (20–35 min)

  • Probabilistic learning or reversal learning (feedback-driven adaptation)
  • Optional: simple reinforcement learning task

Key metrics: learning rate, perseveration, win–stay/lose–shift, exploration/exploitation tendency.
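Win-stay/lose-shift rates fall out of simple transition counts over the choice sequence. A sketch assuming binary feedback; the trial shape is illustrative:

```typescript
interface ChoiceTrial { choice: number; rewarded: boolean }

// winStay  = P(repeat previous choice | previous trial rewarded)
// loseShift = P(change previous choice | previous trial unrewarded)
function winStayLoseShift(trials: ChoiceTrial[]) {
  let winStay = 0, wins = 0, loseShift = 0, losses = 0;
  for (let t = 1; t < trials.length; t++) {
    const stayed = trials[t].choice === trials[t - 1].choice;
    if (trials[t - 1].rewarded) { wins++; if (stayed) winStay++; }
    else { losses++; if (!stayed) loseShift++; }
  }
  return {
    winStay: wins > 0 ? winStay / wins : NaN,     // NaN = undefined (no wins)
    loseShift: losses > 0 ? loseShift / losses : NaN,
  };
}
```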


Module 6: Social cognition (optional; 20–40 min)

If your research questions require it (psychiatry, autism, adolescent socio-emotional development):

  • Emotion recognition (faces)
  • Theory of mind (short vignette inference)

Key metrics: accuracy by emotion type, response bias, RT.


Tier 3: Add-ons (as needed)

A. Ecological function and informant report (10–20 min)

Often essential for “cognitive status” interpretation:

  • Open questionnaires are limited; if you can’t use commercial scales, at minimum collect:
     • school function, attention concerns, sleep, mood, headaches, screen time, academic supports.

B. Longitudinal micro-assessment (5–10 min per timepoint)

For intervention studies or symptom fluctuations:

  • Use 1–2 short “digital biomarkers” with low practice effects:
     • processing speed + vigilance (brief) + simple WM
  • Repeat weekly/monthly; model within-subject change.

Core build

  • jsPsych for tasks + JATOS or a lightweight backend (or REDCap via external modules) for study flow and data capture.
  • Store trial-level events (stimulus, response, RT, correctness, condition labels).

Data quality safeguards (must-have)

  • Embedded attention checks (not trick questions; performance-consistency checks)
  • Minimum RT thresholds + RT outlier tagging (do not auto-delete; flag)
  • Practice blocks with criteria (e.g., must reach 70% before continuing)
  • Session interruption detection (tab-switch, idle time)
  • Post-session self-report: “Were you interrupted?”
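The "flag, do not auto-delete" rule can be implemented as tagging: anticipations below a floor, and outliers beyond median ± k·MAD. A sketch; the 150 ms floor and k = 3 are illustrative defaults, not platform settings:

```typescript
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const m = Math.floor(s.length / 2);
  return s.length % 2 ? s[m] : (s[m - 1] + s[m]) / 2;
}

// Tag each RT rather than deleting it, so later analyses can apply
// their own exclusion rules. Thresholds are illustrative.
function flagRts(rtsMs: number[], minRtMs = 150, k = 3): string[] {
  const med = median(rtsMs);
  // 1.4826 scales the MAD to the SD of a normal distribution.
  const mad = 1.4826 * median(rtsMs.map(rt => Math.abs(rt - med)));
  return rtsMs.map(rt => {
    if (rt < minRtMs) return "anticipation";
    if (Math.abs(rt - med) > k * mad) return "outlier";
    return "ok";
  });
}
```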

Scoring strategy

  • Tier 1: stable summary metrics + composites (speed, variability, accuracy)
  • Tier 2: trial-level + computational parameters (e.g., diffusion model where appropriate)
  • Norms: if open normative datasets are not available for your exact tasks, use internal age-banded norms (8–10, 11–12, 13–15, 16–18) and report standardized scores within band.
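Within-band standardization is a small computation once the bands are fixed. A sketch using the 8–10 / 11–12 / 13–15 / 16–18 bands above; the use of the sample SD (n − 1) is an illustrative choice:

```typescript
const BANDS: [number, number][] = [[8, 10], [11, 12], [13, 15], [16, 18]];

function band(age: number): [number, number] | undefined {
  return BANDS.find(([lo, hi]) => age >= lo && age <= hi);
}

// z-score a participant's score against same-band peers in the study sample.
function zWithinBand(
  sample: { age: number; score: number }[],
  age: number,
  score: number,
): number {
  const b = band(age);
  if (!b) return NaN;
  const peers = sample.filter(p => band(p.age) === b).map(p => p.score);
  const mean = peers.reduce((s, x) => s + x, 0) / peers.length;
  const sd = Math.sqrt(
    peers.reduce((s, x) => s + (x - mean) ** 2, 0) / (peers.length - 1),
  );
  return (score - mean) / sd;
}
```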

A concrete example workflow

  1. Tier 1 (25 min) for all participants.
  2. Automatically assign Tier 2 modules based on profile:
     • If attention/inhibition low or QC suggests lapses → Module 1 + Module 2
     • If WM low → Module 3
     • If memory low → Module 4
     • If adolescent mental health or social outcomes are central → Module 6
     • If mechanistic change/intervention → add Tier 3 longitudinal micro-assessments
  3. Split Tier 2 across two sessions (e.g., 45–60 min each) to reduce fatigue.
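The branching above can be written as one pure routing function, which keeps assignment auditable. A sketch assuming z ≤ −1.28 (≈10th percentile under normality) as the "low" trigger; the profile shape and module names are illustrative:

```typescript
interface Tier1Profile {
  z: { attention: number; inhibition: number; wm: number; memory: number };
  qcLapses: boolean;              // high RT variability / lapse flags
  socialOutcomesCentral: boolean; // study-level design decision
  interventionStudy: boolean;
}

// z <= -1.28 approximates the 10th percentile under normality.
const LOW = -1.28;

function assignTier2(p: Tier1Profile): string[] {
  const modules: string[] = [];
  if (p.z.attention <= LOW || p.z.inhibition <= LOW || p.qcLapses) {
    modules.push("Module 1", "Module 2");
  }
  if (p.z.wm <= LOW) modules.push("Module 3");
  if (p.z.memory <= LOW) modules.push("Module 4");
  if (p.socialOutcomesCentral) modules.push("Module 6");
  if (p.interventionStudy) modules.push("Tier 3 micro-assessments");
  return modules;
}
```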

Age-specific notes (8–12 vs 13–18)

  • 8–12: prioritize simpler instructions, shorter blocks, more practice with feedback, fewer condition switches; interpret language tasks cautiously.
  • 13–18: you can increase difficulty (more trials, more lures, probabilistic learning), and computational modeling becomes more stable.

Implemented Tasks Reference

The following tasks are currently implemented in the Metricis platform.

Tier 1 Tasks (25 tasks)

| Task | Domain | Duration | Key Metrics |
| --- | --- | --- | --- |
| Simple RT | Processing Speed | 2 min | mean_rt, rt_variability |
| Choice RT | Processing Speed | 3 min | mean_rt, accuracy |
| CPT (Go/No-Go) | Attention | 5 min | d_prime, commission_errors, omission_errors |
| Flanker | Executive Function | 4 min | congruency_effect, accuracy |
| Stroop | Executive Function | 4 min | interference_effect, accuracy |
| N-Back | Working Memory | 5 min | d_prime, accuracy_by_level |
| Digit Span | Working Memory | 4 min | forward_span, backward_span |
| Corsi Blocks | Working Memory | 4 min | spatial_span |
| Digit Symbol | Processing Speed | 3 min | symbols_correct, processing_speed |
| Trail Making | Executive Function | 4 min | completion_time_a, completion_time_b |
| Verbal PA | Memory | 5 min | immediate_recall, delayed_recall |
| Visual PA | Memory | 5 min | immediate_recall, delayed_recall |
| MOT | Attention | 4 min | tracking_accuracy |
| Letter-Number Switching | Executive Function | 4 min | switch_cost, mixing_cost |
| Emotion Recognition | Social Cognition | 4 min | accuracy_by_emotion |
| Matrix Reasoning | Reasoning | 6 min | total_correct, accuracy |
| Vocabulary | Reasoning | 5 min | total_correct |
| Delay Discounting | Decision Making | 4 min | k_value, auc |
| Divided Attention | Attention | 5 min | dual_task_cost |
| WCST | Executive Function | 6 min | categories_completed, perseverative_errors |
| Probabilistic Reversal | Decision Making | 5 min | learning_rate, reversal_accuracy |
| ANT | Attention | 6 min | alerting_effect, orienting_effect, conflict_effect |
| Picture Sequence | Memory | 5 min | sequence_accuracy |
| MFIS | Questionnaire | 3 min | total_score, subscale_scores |
| PROMIS | Questionnaire | 5 min | domain_t_scores |

Tier 2 Deep Phenotyping Tasks (8 tasks)

| Task | Domain | Duration | Key Metrics | Module |
| --- | --- | --- | --- | --- |
| Vigilance CPT | Attention | 18 min | vigilance_decrement, block_d_primes, lapse_rate | Module 1 |
| Operation Span | Working Memory | 15 min | partial_load_score, absolute_span, math_accuracy | Module 3 |
| Tower Task | Executive Function | 12 min | problems_solved, excess_moves, first_move_time | Module 2 |
| Source Memory | Memory | 15 min | item_d_prime, source_accuracy, context_bias | Module 4 |
| Pattern Separation | Memory | 18 min | pattern_separation_score, lure_discrimination_index | Module 4 |
| Theory of Mind | Social Cognition | 12 min | first_order_accuracy, second_order_accuracy | Module 6 |
| Semantic Fluency | Language | 8 min | total_words, cluster_size, num_switches | Language |
| Sentence Comprehension | Language | 10 min | complexity_effect, accuracy_simple, accuracy_complex | Language |

Computational Modeling Utilities

All tasks with RT data support additional computational metrics via the computational-modeling.ts utility:

| Analysis | Metrics | Applicable Tasks |
| --- | --- | --- |
| RT Distribution | mean, median, sd, cv, skewness, tau | All RT-based tasks |
| Post-Error Dynamics | post_error_slowing, post_error_accuracy | Go/No-Go, Flanker, Stroop |
| EZ-Diffusion DDM | drift_rate, boundary, non_decision_time | Choice RT, Flanker, Stroop |
| Speed-Accuracy Tradeoff | sat_index, efficiency | All accuracy + RT tasks |
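The EZ-diffusion entry above has a closed-form solution (Wagenmakers, van der Maas & Grasman, 2007) that maps proportion correct, the variance of correct RTs, and mean correct RT onto drift rate, boundary separation, and non-decision time. A sketch with the conventional scaling s = 0.1; the edge correction for accuracy at 0.5 or 1.0 is a common convention, not necessarily how `computational-modeling.ts` handles it:

```typescript
// EZ-diffusion (Wagenmakers et al., 2007): closed-form DDM estimates from
// proportion correct (pc), variance of correct RTs in s^2 (vrt), and
// mean correct RT in seconds (mrt). n = trial count, for the edge correction.
function ezDiffusion(pc: number, vrt: number, mrt: number, n = 100) {
  const s = 0.1; // conventional scaling parameter
  // Edge correction: logit is undefined at pc = 0.5 or 1, so clamp.
  pc = Math.min(Math.max(pc, 0.5 + 1 / (2 * n)), 1 - 1 / (2 * n));
  const L = Math.log(pc / (1 - pc)); // logit(pc)
  const x = (L * (L * pc * pc - L * pc + pc - 0.5)) / vrt;
  const drift = s * Math.pow(x, 0.25);  // drift rate v (pc > 0.5 after clamp)
  const boundary = (s * s * L) / drift; // boundary separation a
  const y = (-drift * boundary) / (s * s);
  const meanDecisionTime =
    (boundary / (2 * drift)) * ((1 - Math.exp(y)) / (1 + Math.exp(y)));
  return { drift, boundary, nonDecisionTime: mrt - meanDecisionTime };
}
```

With the worked example from the original paper (pc = 0.802, vrt = 0.112, mrt = 0.723), this recovers roughly v ≈ 0.10, a ≈ 0.14, Ter ≈ 0.30.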

If you want, I can turn this into a domain-by-domain test menu with (i) specific open-source task implementations, (ii) recommended parameter settings (trial counts, timings) for 8–12 and 13–18, and (iii) a suggested data dictionary and QC rules suitable for an SAP-style appendix.