Design lessons learned from existing research
Cognitive decline is one of those conditions where the timing of detection matters. Catching it earlier means more time for treatment, planning, and lifestyle change. The standard approach involves the Mini-Mental State Examination or the Montreal Cognitive Assessment. Both are short clinician-administered tests, around 10 minutes long, that walk patients through tasks like recalling a list of words, drawing a clock, and naming objects. But many people who could benefit fail to get screened on time, either because of limited access or simply because they don’t want the trouble.
To address the gap, several self-administered assessment products have been developed. They can’t replace the standard test, but they can serve as supplementary data, or as an early warning sign that prompts someone to seek professional help. Some are gamified, because as Michael Hornberger from the University of East Anglia put it: “Very often, people don’t want to participate in scientific tests because they’re boring and dull. We realized that, if we could make something fun but still scientifically valid, that would be a fantastic way to engage lots of people.”
How does this work?
These tools each pick a cognitive domain that neuropsychology has shown to be sensitive to early disease (spatial navigation, processing speed, working memory, executive function), and design tasks related to it. As the user performs the task, the program captures behavioral data such as reaction times per trial, accuracy and error patterns, movement traces, hesitations, and how performance changes over time. Those measurements are then compared against models of typical performance, derived either from large normative samples or from machine-learning classifiers trained on data from healthy and clinical populations.
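To make the "compare against typical performance" step concrete, here is a minimal sketch of the simplest version: standardizing one measurement against a normative sample. The numbers, the `z_score` helper, and the 1.5 cutoff are all assumptions made up for illustration; none of the tools discussed here publishes its scoring in this form, and real products use far richer models.

```python
from statistics import mean, stdev


def z_score(value: float, normative_sample: list[float]) -> float:
    """Express one measurement in standard deviations from the normative mean."""
    mu = mean(normative_sample)
    sigma = stdev(normative_sample)  # sample standard deviation
    if sigma == 0:
        raise ValueError("normative sample has no variance")
    return (value - mu) / sigma


# Hypothetical median reaction times (ms) from a healthy reference group.
norm_rt_ms = [520.0, 540.0, 560.0, 580.0, 600.0]
user_rt_ms = 640.0

z = z_score(user_rt_ms, norm_rt_ms)
flagged = z > 1.5  # illustrative cutoff only, not a clinical threshold
print(f"z = {z:.2f}, flagged = {flagged}")
```

In practice the reference group would be matched on factors like age and education, which is one reason large normative samples matter.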
Design lessons from existing research
Some of these tools are backed by large-scale clinical validation and are already being integrated into care pathways, while others are at earlier validation stages, tested in smaller research samples. It’s still worth looking across the full range to see what these apps are doing well, where they fall short, and what design principles can be helpful for anyone building in this area.
Cross-cultural fairness
Good digital assessments should be culture-fair. They avoid words, idioms, currency symbols, and culturally specific imagery. Instead, they keep visual material to broadly familiar categories like animals, faces, and natural scenes.
ICA (Cognetivity’s Integrated Cognitive Assessment) is a good example. ICA is a five-minute iPad test in which images flash for a fraction of a second and the user classifies each one as containing an animal or not. It’s not really a memory test. It measures information processing speed (IPS), which is how quickly the brain’s visual system can take in a complex scene, extract meaning from it, and produce a motor response. Slowing of information processing speed is considered one of the earlier cognitive changes associated with Alzheimer’s disease.
Animal vs. non-animal is the strongest categorical division represented in the human higher-level visual cortex. Cognetivity calls this the “food or fear” factor: humans evolved to detect animals in cluttered natural scenes faster than almost any other category, because the consequences of missing one were severe. The evolutionary universality is what makes the test culturally fair, regardless of a user’s reading level, education, or familiarity with any particular culture.
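One generic way to fold speed and accuracy into a single number is the inverse efficiency score from experimental psychology: mean reaction time on correct trials divided by the proportion of correct trials. The sketch below is only an illustration of that idea. It is not the ICA's scoring method, which this article doesn't describe, and the function name and trial data are hypothetical.

```python
def inverse_efficiency(reaction_times_ms: list[float], correct: list[bool]) -> float:
    """Mean RT of correct trials divided by proportion correct (lower is better)."""
    if len(reaction_times_ms) != len(correct) or not correct:
        raise ValueError("need one correctness flag per reaction time")
    correct_rts = [rt for rt, ok in zip(reaction_times_ms, correct) if ok]
    if not correct_rts:
        raise ValueError("no correct trials to score")
    proportion_correct = len(correct_rts) / len(correct)
    mean_correct_rt = sum(correct_rts) / len(correct_rts)
    return mean_correct_rt / proportion_correct


# Hypothetical animal / non-animal trials: three correct, one wrong.
rts = [400.0, 500.0, 600.0, 700.0]
hits = [True, True, True, False]
ies = inverse_efficiency(rts, hits)
print(f"inverse efficiency: {ies:.1f} ms")
```

The score rises when a user slows down or makes more errors, so a fast but careless run and a slow but careful one can land in similar places.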

Design it to be approachable
Older adults often approach technology with some anxiety, stemming from unfamiliarity and a fear of making errors. There’s another layer here: test-related anxiety has been shown to predict decreased performance on measures of executive functioning in older adults, independent of any underlying cognitive impairment.
A score on a cognitive test should reflect cognition, not anything else. If someone gives up on a navigation test because they couldn’t figure out the controls, or scores poorly on an animal categorization test because they were anxious, the data is contaminated. Thus, a lot of design effort goes into removing these non-cognitive failure modes so the experience feels approachable, forgiving, and emotionally safe.
Some apps lower friction by integrating real-life tasks. The Virtual Supermarket tests memory and planning by having users shop from a list in a virtual store on a tablet. A gamified N-back app from Murata and colleagues tests working memory by having players run a restaurant where they cook curries with hidden ingredients to fulfill customer orders. While these designs can make the experience more engaging, performance can also be shaped by users’ real-world experience with the activity. Shopping might feel familiar and comfortable for someone who does grocery runs every day, but not for someone who hasn’t shopped for themselves in years.

Signal richness
The behavioral fingerprint is often more useful than the final score. Sea Hero Quest, a mobile game in which players steer a small boat through coastal mazes, is one of the best examples. The game is based on a specific neuroscience finding: the brain regions earliest affected by Alzheimer’s disease overlap heavily with the spatial navigation network. So how a player moves through the game’s mazes can serve as an early warning sign.
Sea Hero Quest doesn’t just record whether users reach the end; it also records the entire path geometry, capturing hesitations, wrong turns, and the angles of corrections. These details carry useful diagnostic information. Lim and colleagues showed in a 2023 preprint (not yet peer-reviewed) that specific geometric features of a player’s path can distinguish people at genetic risk for Alzheimer’s from healthy controls before any clinical symptoms appear. In other words, two players might both reach the goal in the same amount of time, but the shape of how they got there can tell them apart.
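To give a feel for what "path geometry" can mean in code, here is a small sketch that turns a recorded trace of (x, y) positions into a few shape descriptors. The features chosen (path length, tortuosity, total turning) and the `path_features` helper are assumptions for illustration; they are not the features used by Lim and colleagues or by Sea Hero Quest itself.

```python
import math

Point = tuple[float, float]


def path_features(trace: list[Point]) -> dict[str, float]:
    """Summarize the shape of a movement trace given as (x, y) samples."""
    if len(trace) < 2:
        raise ValueError("a trace needs at least two points")
    steps = list(zip(trace, trace[1:]))
    length = sum(math.dist(a, b) for a, b in steps)
    direct = math.dist(trace[0], trace[-1])
    # Heading of each step in radians, skipping samples where the player stood still.
    headings = [math.atan2(b[1] - a[1], b[0] - a[0]) for a, b in steps if a != b]
    turning = 0.0
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the heading change into [-pi, pi) before taking its size.
        delta = (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        turning += abs(delta)
    return {
        "path_length": length,
        "tortuosity": length / direct if direct > 0 else math.inf,
        "total_turning_deg": math.degrees(turning),
    }


# Two hypothetical players who both end at (2, 2).
direct_route = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
wandering_route = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 2), (1, 2), (2, 2)]

direct_f = path_features(direct_route)
wander_f = path_features(wandering_route)
print(direct_f)
print(wander_f)
```

Both routes count as a success, yet the second one is longer, more tortuous, and involves four times as much turning, which is exactly the kind of difference a pass/fail score throws away.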

Limitations & design challenges
Digital cognitive assessment apps can be helpful and convenient, but there are still concerns worth taking seriously.
Validation gaps
The tools mentioned in this article are backed by peer-reviewed studies on real clinical populations. Even validated apps, however, can produce false positives and false negatives, and many apps on the consumer market haven’t been validated at all. False negatives can reassure users enough that they ignore real symptoms or postpone a clinical visit. False positives are also problematic: a 2020 scoping review of self-administered cognitive apps warned that poorly validated tools risk false-positive results, which can cause harms like unnecessary anxiety, costly follow-ups, and financial strain. On top of that, many consumer apps deliver the score and leave the user to figure out next steps alone. One JMIR review flagged this as a design failure, recommending tighter integration between screening and the care pathway.
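A quick worked example shows why false positives deserve attention even when a tool looks accurate. The counts below are invented for illustration: 1,000 users, 10% of whom are truly impaired, screened by a hypothetical app with 85% sensitivity and 85% specificity. The `screening_metrics` helper is likewise just a sketch of the standard formulas.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard screening metrics from the four confusion-matrix counts."""
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("each metric needs a non-zero denominator")
    return {
        "sensitivity": tp / (tp + fn),  # impaired users correctly flagged
        "specificity": tn / (tn + fp),  # healthy users correctly cleared
        "ppv": tp / (tp + fp),          # chance a flagged user is truly impaired
        "npv": tn / (tn + fn),          # chance a cleared user is truly healthy
    }


# Hypothetical cohort: 100 impaired (85 flagged, 15 missed),
# 900 not impaired (765 cleared, 135 flagged by mistake).
m = screening_metrics(tp=85, fp=135, fn=15, tn=765)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, fewer than four in ten flagged users would actually be impaired, simply because healthy users far outnumber impaired ones. That is the arithmetic behind pairing any score with a clear next step.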
Privacy
Cognitive performance data is health information, and the boundary of what counts as “cognitive data” keeps expanding. Many ordinary, seemingly meaningless signals (typing rhythm, navigation patterns, swipe dynamics, and so on) can now be read as health information. Inferences drawn from them could have implications for insurance, employment, financial autonomy, and family dynamics, which is why the collection and use of this data needs stricter regulation.
Hardware variance
A new iPhone might produce different results from an 8-year-old one. Glare, lighting, screen size, and case type can all affect the data. Reviews of digital cognitive assessments flag hardware variability and uncontrolled test environments as unsolved problems. An analysis of “bring your own device” cognitive testing found that latency contributed by different phones could add up to roughly 100 ms of variation in response times.
Learning effect & prior experience
Many cognitive tests get easier with practice, which means that after several rounds a score can reflect familiarity with the test rather than cognition. The Virtual Supermarket flags this as one of its limitations. Other tools have designed around it: ICA’s animal categorization shows no significant learning effect over repeated sessions. Sea Hero Quest uses a large set of distinct, pre-designed levels of varying types (wayfinding, chase, path integration), which reduces the chance of players repeating the same maze and helps limit practice effects.
Because prior smartphone or tablet experience can also affect results, calibration trials and per-user baselines should be integrated rather than relying on population averages alone.
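One simple way to picture a calibration step: before the real task, the user does a few plain "tap when the screen changes" trials, and the median of those is subtracted from the task reaction times. This is a sketch under a strong assumption, namely that device latency and basic touch familiarity add a roughly constant offset. The `adjust_for_calibration` helper and all numbers are hypothetical.

```python
from statistics import median


def adjust_for_calibration(
    task_rts_ms: list[float], calibration_rts_ms: list[float]
) -> list[float]:
    """Subtract the user's median simple-tap latency from each task reaction time."""
    if not calibration_rts_ms:
        raise ValueError("need at least one calibration trial")
    offset = median(calibration_rts_ms)
    return [rt - offset for rt in task_rts_ms]


# Hypothetical user A on a recent phone.
a_adjusted = adjust_for_calibration(
    task_rts_ms=[820.0, 760.0, 900.0],
    calibration_rts_ms=[310.0, 290.0, 300.0, 305.0, 295.0],
)

# Hypothetical user B on an older, slower phone: raw times are 80 ms longer.
b_adjusted = adjust_for_calibration(
    task_rts_ms=[900.0, 840.0, 980.0],
    calibration_rts_ms=[390.0, 370.0, 380.0, 385.0, 375.0],
)

print(a_adjusted)
print(b_adjusted)
```

Here the two users look different on raw reaction times but identical once each is measured against their own calibration trials. Real latency is noisier than a single offset, so this is a starting point rather than a fix.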
Transparency
The way the information is presented also needs careful design. For users, this means knowing what data is being recorded and what conclusions might be drawn from it. This is important because the data captured can be far richer than users realize.
For clinicians, a bare probability score can be hard to act on and explain to a patient. What clinicians need is a model that exposes its reasoning so they can factor in their own judgment. There’s a growing argument that AI used for diagnostic decisions in healthcare should explain the factors driving its output, and that black-box models can’t responsibly be used in clinical practice.
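As a rough illustration of "exposing the reasoning", the sketch below uses a tiny logistic model and reports how much each input pushed the score up or down, alongside the probability. The feature names, weights, and bias are invented for the example and don't come from any of the tools discussed; real explainability methods for complex models are considerably more involved.

```python
import math

# Hypothetical weights of a logistic model over standardized features.
WEIGHTS = {"reaction_time_z": 0.9, "error_rate_z": 0.6, "path_tortuosity_z": 0.4}
BIAS = -2.0


def explain(features: dict[str, float]) -> tuple[float, list[tuple[str, float]]]:
    """Return the predicted probability and per-feature contributions, largest first."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1 / (1 + math.exp(-logit))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


# Hypothetical user: much slower than the norm, slightly more errors, winding paths.
probability, ranked = explain(
    {"reaction_time_z": 2.0, "error_rate_z": 0.5, "path_tortuosity_z": 1.0}
)
print(f"estimated probability: {probability:.2f}")
for name, contribution in ranked:
    print(f"  {name}: {contribution:+.2f}")
```

A clinician reading this output can see that slow responses drive most of the score, and can weigh that against what they know about the patient, such as a tremor or an unfamiliar device.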
So why bother developing all these new tools despite the concerns? Because the results so far point to two concrete benefits: a more efficient care pathway and research datasets far larger than traditional studies could collect.
In a 2023 study, Modarres and colleagues evaluated the ICA as a prescreening tool for GP referrals to UK memory clinics. They found that a portion of GP referrals turned out to be patients without cognitive impairment, and that the ICA was able to correctly flag most of these non-impaired cases while still reliably detecting those with true impairment. This suggests the ICA could meaningfully reduce inappropriate specialist referrals if adopted in primary care.
Sea Hero Quest has gathered navigation data from millions of players worldwide, which is equivalent to thousands of years of traditional lab-based research. With a dataset that size, researchers can control for age, gender, country, and even rural vs. urban upbringing. Notably, Coutrot and colleagues found that people who grew up in cities with more grid-like street layouts (such as Chicago) tended to perform worse on Sea Hero Quest’s organic, non-grid environments, while those raised in rural areas or in cities with more complex, winding street networks performed better overall. These kinds of conclusions can only be drawn from a vast amount of data.
This is an area with real and direct impact on users’ quality of life, which is why getting the design principles right matters.
References:
Alzheimer’s Research UK. (n.d.). Sea Hero Quest. https://www.alzheimersresearchuk.org/research/for-researchers/resources-and-information/sea-hero-quest/
Charalambous, A. P., Pye, A., Yeung, W. K., Leroi, I., Neil, M., Thodi, C., & Dawes, P. (2020). Tools for app- and web-based self-testing of cognitive impairment: Systematic search and evaluation. Journal of Medical Internet Research, 22(1), e14551. https://doi.org/10.2196/14551
Coutrot, A., Manley, E., Goodroe, S., Gahnstrom, C., Filomena, G., Yesiltepe, D., Dalton, R. C., Wiener, J. M., Hölscher, C., Hornberger, M., & Spiers, H. J. (2022). Entropy of city street networks linked to future spatial navigation ability. Nature, 604(7904), 104–110. https://doi.org/10.1038/s41586-022-04486-7
Dawood, S. (2019, May 14). Sea Hero Quest: How a video game is helping to diagnose dementia. Design Week. https://www.designweek.co.uk/issues/13-19-may-2019/sea-hero-quest-game-dementia/
Dorenkamp, M. A., Irrgang, M., & Vik, P. (2023). Assessment-related anxiety among older adults: Associations with neuropsychological test performance. Aging, Neuropsychology, and Cognition, 30(2), 256–271. https://doi.org/10.1080/13825585.2021.2016584
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). “Mini-mental state”: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189–198. https://doi.org/10.1016/0022-3956(75)90026-6
Kalafatis, C., Modarres, M. H., Apostolou, P., Marefat, H., Khanbagi, M., Karimi, H., Vahabi, Z., Aarsland, D., & Khaligh-Razavi, S.-M. (2021). Validity and cultural generalisability of a 5-minute AI-based, computerised cognitive assessment in mild cognitive impairment and Alzheimer’s dementia. Frontiers in Psychiatry, 12, 706695. https://doi.org/10.3389/fpsyt.2021.706695
Khaligh-Razavi, S.-M., Habibi, S., Sadeghi, M., Marefat, H., Khanbagi, M., Nabavi, S. M., Sadeghi, E., & Kalafatis, C. (2019). Integrated cognitive assessment: Speed and accuracy of visual processing as a reliable proxy to cognitive performance. Scientific Reports, 9, 1102. https://doi.org/10.1038/s41598-018-37709-x
Lim, U., Leal Cervantes, R., Coughlan, G., Lambiotte, R., Spiers, H. J., Hornberger, M., & Harrington, H. A. (2023). Geometry of navigation identifies genetic-risk and clinical Alzheimer’s disease [Preprint]. medRxiv. https://doi.org/10.1101/2023.10.01.23296035
Modarres, M. H., Kalafatis, C., Apostolou, P., Tabet, N., & Khaligh-Razavi, S.-M. (2023). The use of the Integrated Cognitive Assessment to improve the efficiency of primary care referrals to memory services in the Accelerating Dementia Pathway Technologies study. Frontiers in Aging Neuroscience, 15, 1243316. https://doi.org/10.3389/fnagi.2023.1243316
Murata, N., Nishii, S., Usuha, R., Kodaka, A., Fujimori, M., Sugawara, H., Kiriyama, T., Uchikado, H., Okumura, Y., & Takebe, T. (2025). A gamified N-back app for identifying mild-cognitive impairment in older adults. JMA Journal, 8(1), 174–182. https://doi.org/10.31662/jmaj.2024-0217
Nasreddine, Z. S., Phillips, N. A., Bédirian, V., Charbonneau, S., Whitehead, V., Collin, I., Cummings, J. L., & Chertkow, H. (2005). The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. Journal of the American Geriatrics Society, 53(4), 695–699. https://doi.org/10.1111/j.1532-5415.2005.53221.x
Nicosia, J., Wang, B., Aschenbrenner, A. J., Sliwinski, M. J., Yabiku, S. T., Roque, N. A., Germine, L. T., Bateman, R. J., Morris, J. C., & Hassenstab, J. (2023). To BYOD or not: Are device latencies important for bring-your-own-device (BYOD) smartphone cognitive testing? Behavior Research Methods, 55(6), 2800–2812. https://doi.org/10.3758/s13428-022-01925-1
Pellas, A. (2025, April 13). A Sea Hero Quest to understand our navigation skills. cs4fn. https://cs4fn.blog/2025/04/13/a-sea-hero-quest-to-understand-our-navigation-skills/
Taylor, P. (2021, October 22). FDA clears AI-powered digital test for early dementia. pharmaphorum. https://pharmaphorum.com/news/fda-clears-ai-powered-digital-test-for-early-dementia
Yan, M., Yin, H., Meng, Q., Wang, S., Ding, Y., Li, G., Wang, C., & Chen, L. (2021). A virtual supermarket program for the screening of mild cognitive impairment in older adults: Diagnostic accuracy study. JMIR Serious Games, 9(4), e30919. https://doi.org/10.2196/30919
How mobile apps are reshaping screening for cognitive decline was originally published in UX Collective on Medium.