# Tiered design
Below is a tiered strategy that is practical for remote, web-based deployment, is compatible with open-access / open-source tooling, and provides a clean pathway from broad screening to mechanistic deep phenotyping in ages 8–12 and 13–18.
I am assuming a research context where you want (i) scalable throughput, (ii) defensible data quality in unsupervised settings, and (iii) the option to “go deep” only when warranted.
## Design principles

- Two-pass strategy
  - Tier 1 (Screening): short, robust, low-burden tests that yield stable composites and flag domain-specific concerns.
  - Tier 2 (Deep phenotyping): longer, more granular tasks that decompose cognition into subcomponents and enable trial-level modeling (RT distributions, error types, learning curves).
- Remote-first validity
  - Use tasks with **simple instructions**, **high signal-to-noise**, and **built-in data-quality checks**.
  - Standardize device class (ideally laptop/desktop + keyboard) for RT tasks; treat mobile/touch as a separate mode.
- Open-access stack
  - Implement tasks in jsPsych (browser-native) and/or adopt open batteries built on jsPsych (e.g., CAM) and other openly available web tasks (e.g., ANTI-style attention network tasks where available).
  - Store trial-level data to enable re-scoring, QC, and computational analyses.
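The trial-level storage point can be made concrete as a record shape. This is a minimal sketch only; every field name below is an illustrative assumption, not a fixed platform schema:

```typescript
// Minimal trial-level record; field names are illustrative assumptions.
interface TrialRecord {
  participantId: string;
  taskId: string;          // e.g. "flanker", "nback"
  sessionId: string;
  trialIndex: number;
  condition: string;       // e.g. "congruent" / "incongruent"
  stimulus: string;
  response: string | null; // null = no response (omission)
  rt: number | null;       // ms from stimulus onset; null if no response
  correct: boolean;
  timestamp: string;       // ISO 8601; supports interruption/idle analysis
  deviceMode: "desktop" | "tablet" | "phone";
}

// Example record as it might be written to storage.
const example: TrialRecord = {
  participantId: "P001",
  taskId: "flanker",
  sessionId: "S1",
  trialIndex: 12,
  condition: "incongruent",
  stimulus: "<<><<",
  response: "arrowright",
  rt: 612,
  correct: false,
  timestamp: "2024-05-01T14:03:22.120Z",
  deviceMode: "desktop",
};
```

Storing at this granularity is what later enables re-scoring, QC audits, and trial-level modeling without re-running participants.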
## Tier 0: Enrollment and readiness (3–5 minutes)
Purpose: reduce missingness, improve data quality, and standardize context.
Components
- Device/environment checklist: quiet room, seated, no multitasking, headphones if auditory tasks.
- Automated checks: screen size, browser, keyboard detection, audio output test.
- Brief training module: “how to respond quickly and accurately,” practice trials with feedback.
- Minimal metadata: age, grade, handedness, language, sleep last night (optional), ADHD meds today (optional).
Outputs
- “Test readiness” flag + device mode label (desktop vs tablet vs phone).
## Tier 1: Screening battery (20–30 minutes total)
Goal: broad coverage across all major domains with minimal fatigue.
### A. Global ability / reasoning (5–7 min)
Rationale: provides an anchor for general cognitive ability and helps interpret specific deficits.
- Fluid reasoning: short matrix reasoning (ICAR-style items or an open matrix set)
- Optional: number series / simple analogies (age-appropriate)
Metrics: accuracy, time per item, total score; optional IRT if item bank supports it.
### B. Processing speed (3–4 min)
- Pattern/symbol comparison task (same/different discrimination; timed)
Metrics: correct per time, RT median, lapses.
### C. Attention & inhibition (5–7 min)
Pick one of:
- Flanker (selective attention + inhibition) or
- Go/No-Go (response inhibition) with enough trials to get stable estimates.
Metrics: commission errors, omission errors, RT median, RT variability.
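These metrics fall straight out of trial-level data. A minimal sketch for a Go/No-Go record (field names are assumptions, not a platform API):

```typescript
// Go/No-Go screening metrics from trial-level records.
interface GngTrial { isGo: boolean; responded: boolean; rt: number | null }

function gngMetrics(trials: GngTrial[]) {
  const go = trials.filter(t => t.isGo);
  const nogo = trials.filter(t => !t.isGo);
  const omissions = go.filter(t => !t.responded).length;    // missed Go trials
  const commissions = nogo.filter(t => t.responded).length; // responses on No-Go
  // RTs from responded Go trials, sorted for the median.
  const rts = go
    .filter(t => t.responded && t.rt !== null)
    .map(t => t.rt as number)
    .sort((a, b) => a - b);
  const median = rts[Math.floor(rts.length / 2)];
  const mean = rts.reduce((s, x) => s + x, 0) / rts.length;
  const sd = Math.sqrt(rts.reduce((s, x) => s + (x - mean) ** 2, 0) / (rts.length - 1));
  return {
    omissionRate: omissions / go.length,
    commissionRate: commissions / nogo.length,
    rtMedian: median,
    rtSd: sd, // RT variability
  };
}
```

The same pattern (filter by condition, summarize RT and error counts) covers the Flanker variant as well.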
### D. Working memory (4–6 min)
Pick one:
- Digit span (forward/backward) adapted for browser (audio or visual presentation) or
- 1-back / 2-back (visual) with short blocks.
Metrics: accuracy, d′ (if applicable), RT.
### E. Episodic memory (5–7 min)
- Brief paired associates (picture–location, object–name, or image–image) or
- Short recognition memory (encode 10–15 items → forced-choice recognition)
Metrics: hits/false alarms, d′, response bias.
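Hits and false alarms convert to d′ and response bias via z-transforms. A sketch using the log-linear (Hautus) correction to avoid infinite z-scores at rates of 0 or 1; since JavaScript has no built-in probit, this uses bisection on a normal CDF built from a standard erf approximation (Abramowitz & Stegun 7.1.26):

```typescript
// erf approximation (Abramowitz & Stegun 7.1.26, max error ~1.5e-7).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

const normCdf = (x: number) => 0.5 * (1 + erf(x / Math.SQRT2));

// Inverse normal CDF by bisection (accurate enough for scoring).
function probit(p: number): number {
  let lo = -8, hi = 8;
  for (let i = 0; i < 80; i++) {
    const mid = (lo + hi) / 2;
    if (normCdf(mid) < p) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// nHits of nSignal old items; nFa of nNoise new items.
function dPrime(nHits: number, nSignal: number, nFa: number, nNoise: number) {
  const h = (nHits + 0.5) / (nSignal + 1); // log-linear correction
  const f = (nFa + 0.5) / (nNoise + 1);
  const zH = probit(h), zF = probit(f);
  return { dPrime: zH - zF, criterion: -0.5 * (zH + zF) };
}
```

For example, 18/20 hits with 2/20 false alarms gives d′ ≈ 2.36 with a near-zero criterion.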
### F. Language (optional, 5–8 min; consider adding only if relevant)
Language is the hardest domain to do well remotely without licensing issues. If you include it:
- Receptive vocabulary using an open word–picture mapping set (must be age-appropriate and validated within your cohort)
- Reading can be approximated with timed word recognition, but dialect/education effects are large.
Metrics: accuracy, response time; interpret cautiously.
## Tier 1 outputs and decision rules
Deliver (i) domain z-scores, (ii) a global composite, and (iii) QC flags.
Suggested triggers to Tier 2
- Any domain score ≤ ~10th percentile of your study sample (or a normative reference if you have one).
- High RT variability, high lapse rate, or inconsistent responding (suggests attention/engagement issues).
- Profile patterns of interest (e.g., “WM low, inhibition high”).
- Clinical or exposure subgroup of interest (e.g., concussion, epilepsy, migraine, sleep disorder).
## Tier 2: Deep phenotyping modules (60–150 minutes, modular; can be split across 2–3 sessions)
Goal: disentangle constructs, enable mechanistic inference, and improve sensitivity to change.
Structure Tier 2 as domain modules you can assign based on Tier 1 results. Each module below is ~20–40 minutes.
### Module 1: Attention networks + vigilance (25–35 min)
- Attention network task (alerting, orienting, executive control)
- Sustained attention/vigilance (e.g., CPT-style) with sufficient duration for lapses
Why this matters: separates “can’t focus” into orienting vs executive control vs arousal/vigilance failures.
Key metrics: RT distributions, lapse rate, time-on-task effects, post-error slowing.
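Post-error slowing, for instance, is cheap to derive from stored trials. A minimal sketch of the simple variant (mean RT after errors minus mean RT after correct trials; robust pairwise variants exist and field names are assumptions):

```typescript
// Post-error slowing: mean RT on trials following an error minus
// mean RT on trials following a correct response (simple variant).
interface Trial { correct: boolean; rt: number | null }

function postErrorSlowing(trials: Trial[]): number {
  const afterError: number[] = [];
  const afterCorrect: number[] = [];
  for (let i = 1; i < trials.length; i++) {
    const rt = trials[i].rt;
    if (rt === null) continue; // only responded trials enter the means
    (trials[i - 1].correct ? afterCorrect : afterError).push(rt);
  }
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  return mean(afterError) - mean(afterCorrect);
}
```

A positive value indicates slowing after errors, the usual adaptive-control signature; near-zero or negative values in a long vigilance task are themselves informative.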
### Module 2: Executive function decomposition (25–40 min)
- Task-switching (set shifting; switch cost)
- Stroop or interference task (inhibition under conflict)
- Optional: planning (Tower-style) if you can implement it reliably web-based
Key metrics: switch cost, interference cost, error types, speed–accuracy tradeoff.
### Module 3: Working memory capacity and control (25–40 min)
- Complex span (operation span / symmetry span—adapted carefully)
- N-back with lures (to probe interference control)
- Optional: updating paradigms
Key metrics: capacity estimates, interference susceptibility, RT variability, learning curves across blocks.
### Module 4: Episodic memory systems (25–45 min)
- Encoding + immediate + delayed recognition
- Source memory (item + context)
- Optional: associative memory with lure discrimination (pattern separation proxy)
Key metrics: d′, false alarm pattern, retention over delay, lure discrimination index.
### Module 5: Learning and adaptation (20–35 min)
- Probabilistic learning or reversal learning (feedback-driven adaptation)
- Optional: simple reinforcement learning task
Key metrics: learning rate, perseveration, win–stay/lose–shift, exploration/exploitation tendency.
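Win-stay/lose-shift is the simplest of these metrics to score directly from choice/outcome sequences, before any reinforcement-learning model is fit. A sketch (array encoding is an assumption):

```typescript
// Win-stay / lose-shift proportions from aligned arrays of choices
// (option indices) and outcomes (1 = rewarded, 0 = not).
function winStayLoseShift(choices: number[], rewards: number[]) {
  let winStay = 0, wins = 0, loseShift = 0, losses = 0;
  for (let i = 1; i < choices.length; i++) {
    const stayed = choices[i] === choices[i - 1];
    if (rewards[i - 1] === 1) {        // previous trial was a win
      wins++;
      if (stayed) winStay++;
    } else {                            // previous trial was a loss
      losses++;
      if (!stayed) loseShift++;
    }
  }
  return { winStay: winStay / wins, loseShift: loseShift / losses };
}
```

High lose-shift with low win-stay, for example, suggests outcome-driven switching rather than stable value learning; these raw proportions also serve as sanity checks on fitted learning-rate parameters.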
### Module 6: Social cognition (optional; 20–40 min)
If your research questions require it (psychiatry, autism, adolescent socio-emotional development):
- Emotion recognition (faces)
- Theory of mind (short vignette inference)
Key metrics: accuracy by emotion type, response bias, RT.
## Tier 3: Add-ons (as needed)
### A. Ecological function and informant report (10–20 min)
Often essential for “cognitive status” interpretation:
- Open questionnaires are limited; if you can’t use commercial scales, at minimum collect:
  - school function, attention concerns, sleep, mood, headaches, screen time, academic supports.
### B. Longitudinal micro-assessment (5–10 min per timepoint)
For intervention studies or symptom fluctuations:
- Use 1–2 short “digital biomarkers” with low practice effects:
  - processing speed + vigilance (brief) + simple WM
- Repeat weekly/monthly; model within-subject change.
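For deciding whether a within-subject change is real rather than measurement noise, the classic Jacobson–Truax reliable change index is a reasonable default. A sketch (inputs — baseline SD and test–retest reliability — would come from your own internal norms):

```typescript
// Jacobson–Truax reliable change index:
//   RCI = (x2 - x1) / SEdiff,  SEdiff = sqrt(2 * SEM^2),
//   SEM = sd * sqrt(1 - rxx)
// |RCI| > 1.96 is conventionally taken as reliable change (p < .05).
function reliableChangeIndex(
  x1: number,              // baseline score
  x2: number,              // follow-up score
  baselineSd: number,      // SD of the measure at baseline
  retestReliability: number // test–retest reliability (rxx)
): number {
  const sem = baselineSd * Math.sqrt(1 - retestReliability);
  const seDiff = Math.sqrt(2 * sem * sem);
  return (x2 - x1) / seDiff;
}
```

For repeated weekly/monthly measurements, a mixed-effects model is preferable; the RCI is mainly useful for flagging single pre/post contrasts.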
## Recommended implementation architecture (remote web-based)
### Core build
- jsPsych for tasks + JATOS or a lightweight backend (or REDCap via external modules) for study flow and data capture.
- Store trial-level events (stimulus, response, RT, correctness, condition labels).
### Data quality safeguards (must-have)
- Embedded attention checks (not trick questions; performance-consistency checks)
- Minimum RT thresholds + RT outlier tagging (do not auto-delete; flag)
- Practice blocks with criteria (e.g., must reach 70% before continuing)
- Session interruption detection (tab-switch, idle time)
- Post-session self-report: “Were you interrupted?”
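The flag-don’t-delete rule can be enforced in code. A sketch using a minimum-RT anticipation threshold and a median-absolute-deviation outlier rule; the default cutoffs (150 ms, 3 MADs) are illustrative, not validated:

```typescript
// Flag (never delete) suspect trials: anticipations below a minimum RT
// and outliers beyond k median absolute deviations.
interface QcTrial { rt: number | null; flags: string[] }

function flagTrials(rts: (number | null)[], minRt = 150, k = 3): QcTrial[] {
  const valid = rts.filter((r): r is number => r !== null).sort((a, b) => a - b);
  const med = valid[Math.floor(valid.length / 2)];
  const devs = valid.map(r => Math.abs(r - med)).sort((a, b) => a - b);
  const mad = devs[Math.floor(devs.length / 2)] * 1.4826; // normal-consistent MAD
  return rts.map(rt => {
    const flags: string[] = [];
    if (rt === null) flags.push("omission");
    else {
      if (rt < minRt) flags.push("anticipation");
      if (mad > 0 && Math.abs(rt - med) / mad > k) flags.push("rt_outlier");
    }
    return { rt, flags };
  });
}
```

Keeping the flags alongside the raw values lets analysts apply (and report) their own exclusion rules at scoring time.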
### Scoring strategy
- Tier 1: stable summary metrics + composites (speed, variability, accuracy)
- Tier 2: trial-level + computational parameters (e.g., diffusion model where appropriate)
- Norms: if open normative datasets are not available for your exact tasks, use:
  - internal age-banded norms (8–10, 11–12, 13–15, 16–18) and report standardized scores within band.
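Within-band standardization is a one-liner once band statistics exist. A sketch — the band means and SDs below are placeholder values, to be replaced by your internal norms:

```typescript
// Standardize a raw score within its age band using internal norms.
// Band statistics here are illustrative placeholders only.
interface NormBand { min: number; max: number; mean: number; sd: number }

const bands: NormBand[] = [
  { min: 8, max: 10, mean: 42.0, sd: 6.0 },
  { min: 11, max: 12, mean: 48.0, sd: 6.5 },
  { min: 13, max: 15, mean: 53.0, sd: 7.0 },
  { min: 16, max: 18, mean: 56.0, sd: 7.0 },
];

function zWithinBand(raw: number, age: number): number {
  const band = bands.find(b => age >= b.min && age <= b.max);
  if (!band) throw new Error(`no norm band for age ${age}`);
  return (raw - band.mean) / band.sd;
}
```

Reporting the band alongside the z-score keeps cross-band comparisons honest, since the bands are standardized separately.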
## A concrete example workflow
- Tier 1 (25 min) for all participants.
- Automatically assign Tier 2 modules based on profile:
  - If attention/inhibition low or QC suggests lapses → Module 1 + Module 2
  - If WM low → Module 3
  - If memory low → Module 4
  - If adolescent mental health or social outcomes are central → Module 6
  - If mechanistic change/intervention → add Tier 3 longitudinal micro-assessments
- Split Tier 2 across two sessions (e.g., 45–60 min each) to reduce fatigue.
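The assignment logic above is easy to make explicit and auditable in code. A sketch — the z-score cutoff (−1.28, roughly the 10th percentile) and the profile field names are assumptions, not platform definitions:

```typescript
// Map a Tier 1 profile to Tier 2 modules per the workflow rules.
interface Tier1Profile {
  z: { attention: number; inhibition: number; wm: number; memory: number };
  qcLapseFlag: boolean;          // QC suggests lapses/inconsistent responding
  socialOutcomesCentral: boolean; // study-level flag, not a score
}

function assignModules(p: Tier1Profile, cutoff = -1.28): string[] {
  const modules = new Set<string>();
  if (p.z.attention <= cutoff || p.z.inhibition <= cutoff || p.qcLapseFlag) {
    modules.add("Module 1");
    modules.add("Module 2");
  }
  if (p.z.wm <= cutoff) modules.add("Module 3");
  if (p.z.memory <= cutoff) modules.add("Module 4");
  if (p.socialOutcomesCentral) modules.add("Module 6");
  return [...modules];
}
```

Encoding the rules this way also makes them easy to pre-register and to version alongside the battery.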
## Age-specific notes (8–12 vs 13–18)
- 8–12: prioritize simpler instructions, shorter blocks, more practice with feedback, fewer condition switches; interpret language tasks cautiously.
- 13–18: you can increase difficulty (more trials, more lures, probabilistic learning), and computational modeling becomes more stable.
## Implemented Tasks Reference
The following tasks are currently implemented in the Metricis platform.
### Tier 1 Tasks (25 tasks)
| Task | Domain | Duration | Key Metrics |
|---|---|---|---|
| Simple RT | Processing Speed | 2 min | mean_rt, rt_variability |
| Choice RT | Processing Speed | 3 min | mean_rt, accuracy |
| CPT (Go/No-Go) | Attention | 5 min | d_prime, commission_errors, omission_errors |
| Flanker | Executive Function | 4 min | congruency_effect, accuracy |
| Stroop | Executive Function | 4 min | interference_effect, accuracy |
| N-Back | Working Memory | 5 min | d_prime, accuracy_by_level |
| Digit Span | Working Memory | 4 min | forward_span, backward_span |
| Corsi Blocks | Working Memory | 4 min | spatial_span |
| Digit Symbol | Processing Speed | 3 min | symbols_correct, processing_speed |
| Trail Making | Executive Function | 4 min | completion_time_a, completion_time_b |
| Verbal PA | Memory | 5 min | immediate_recall, delayed_recall |
| Visual PA | Memory | 5 min | immediate_recall, delayed_recall |
| MOT | Attention | 4 min | tracking_accuracy |
| Letter-Number Switching | Executive Function | 4 min | switch_cost, mixing_cost |
| Emotion Recognition | Social Cognition | 4 min | accuracy_by_emotion |
| Matrix Reasoning | Reasoning | 6 min | total_correct, accuracy |
| Vocabulary | Reasoning | 5 min | total_correct |
| Delay Discounting | Decision Making | 4 min | k_value, auc |
| Divided Attention | Attention | 5 min | dual_task_cost |
| WCST | Executive Function | 6 min | categories_completed, perseverative_errors |
| Probabilistic Reversal | Decision Making | 5 min | learning_rate, reversal_accuracy |
| ANT | Attention | 6 min | alerting_effect, orienting_effect, conflict_effect |
| Picture Sequence | Memory | 5 min | sequence_accuracy |
| MFIS | Questionnaire | 3 min | total_score, subscale_scores |
| PROMIS | Questionnaire | 5 min | domain_t_scores |
### Tier 2 Deep Phenotyping Tasks (8 tasks)
| Task | Domain | Duration | Key Metrics | Module |
|---|---|---|---|---|
| Vigilance CPT | Attention | 18 min | vigilance_decrement, block_d_primes, lapse_rate | Module 1 |
| Operation Span | Working Memory | 15 min | partial_load_score, absolute_span, math_accuracy | Module 3 |
| Tower Task | Executive Function | 12 min | problems_solved, excess_moves, first_move_time | Module 2 |
| Source Memory | Memory | 15 min | item_d_prime, source_accuracy, context_bias | Module 4 |
| Pattern Separation | Memory | 18 min | pattern_separation_score, lure_discrimination_index | Module 4 |
| Theory of Mind | Social Cognition | 12 min | first_order_accuracy, second_order_accuracy | Module 6 |
| Semantic Fluency | Language | 8 min | total_words, cluster_size, num_switches | Language |
| Sentence Comprehension | Language | 10 min | complexity_effect, accuracy_simple, accuracy_complex | Language |
### Computational Modeling Utilities
All tasks with RT data support additional computational metrics via the `computational-modeling.ts` utility:
| Analysis | Metrics | Applicable Tasks |
|---|---|---|
| RT Distribution | mean, median, sd, cv, skewness, tau | All RT-based tasks |
| Post-Error Dynamics | post_error_slowing, post_error_accuracy | Go/No-Go, Flanker, Stroop |
| EZ-Diffusion DDM | drift_rate, boundary, non_decision_time | Choice RT, Flanker, Stroop |
| Speed-Accuracy Tradeoff | sat_index, efficiency | All accuracy + RT tasks |
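For reference, the EZ-diffusion row corresponds to a closed-form computation (Wagenmakers et al., 2007) that needs only accuracy, RT variance, and mean RT. A sketch (scaling s = 0.1; Pc must not be exactly 0, 0.5, or 1 — apply an edge correction such as Pc = 1 − 1/(2N) first):

```typescript
// EZ-diffusion: drift rate, boundary separation, and non-decision time
// from proportion correct (pc), correct-RT variance (vrt, s^2), and
// correct-RT mean (mrt, s).
function ezDiffusion(pc: number, vrt: number, mrt: number) {
  const s = 0.1;                                  // conventional scaling
  const L = Math.log(pc / (1 - pc));              // logit of accuracy
  const x = (L * (L * pc * pc - L * pc + pc - 0.5)) / vrt;
  const v = Math.sign(pc - 0.5) * s * Math.pow(x, 0.25); // drift rate
  const a = (s * s * L) / v;                      // boundary separation
  const y = (-v * a) / (s * s);
  const mdt = (a / (2 * v)) * ((1 - Math.exp(y)) / (1 + Math.exp(y)));
  return { driftRate: v, boundary: a, nonDecisionTime: mrt - mdt };
}
```

As a sanity check, Pc = .802, VRT = .112 s², MRT = .723 s recovers v ≈ .10, a ≈ .14, Ter ≈ .30 with this code.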
If you want, I can turn this into a domain-by-domain test menu with (i) specific open-source task implementations, (ii) recommended parameter settings (trial counts, timings) for 8–12 and 13–18, and (iii) a suggested data dictionary and QC rules suitable for an SAP-style appendix.