Alzheimer's Dementia Recognition through Spontaneous Speech
The ADReSSo Challenge


  • NEW! The deadline for submission of predictions for the test set for scoring has been extended to 31st March 2021 (AOE)! (29-3-21)
  • NEW! Challenge description and Baseline paper now available! (24-3-21)
  • The results submission procedure has been amended: you are no longer required to submit your model. We will be distributing the test set instead. (3-3-21)
  • Correction: performance metrics for task 3 added. (26-2-21)
  • The training set is now available! Please email for instructions on how to download it. (20-2-21)
  • ADReSSo Challenge announced! (18-1-21)

Dementia is a category of neurodegenerative diseases that entails a long-term and usually gradual decrease of cognitive functioning. The main risk factor for dementia is age, and therefore its greatest incidence is amongst the elderly. Due to the severity of the situation worldwide, institutions and researchers are investing considerably in dementia prevention and early detection, focusing on disease progression. There is a need for cost-effective and scalable methods for detection of dementia from its most subtle forms, such as the preclinical stage of Subjective Memory Loss (SML), to more severe conditions like Mild Cognitive Impairment (MCI) and Alzheimer's Dementia (AD) itself.

The main features of the ADReSSo (ADReSS, speech only) Challenge are:

  • The Challenge targets a difficult automatic prediction problem of societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD). The challenge builds on the success of the ADReSS Challenge (Luz et al., 2020), the first such shared-task event focused on AD, which attracted 34 teams from across the world.
  • The ADReSSo Challenge will provide a forum for those different research groups to test their existing methods (or develop novel approaches) on a new shared standardized dataset. The approaches that performed best on the original ADReSS dataset employed features extracted from manual transcripts, which were provided. The ADReSSo challenge provides a more challenging and improved spontaneous speech dataset, and requires the creation of models straight from speech, without manual transcription, although automatic transcription is allowed and encouraged.
  • In keeping with the objectives of AD prediction evaluation, the ADReSSo challenge's dataset will be statistically balanced so as to mitigate common biases often overlooked in evaluations of AD detection methods, including repeated occurrences of speech from the same participant (common in longitudinal datasets), variations in audio quality, and imbalances of gender and age distribution.
  • This task focuses on AD recognition using spontaneous speech, which marks a departure from neuropsychological and clinical evaluation approaches. Spontaneous speech analysis has the potential to enable novel applications for speech technology in longitudinal, unobtrusive monitoring of cognitive health, in line with the theme of this year's INTERSPEECH, "Speech Everywhere!".
The organizers have recently conducted an extensive systematic review of the scientific literature on speech and language processing AI methods for detection of cognitive decline in AD (de la Fuente, Ritchie and Luz, 2020), which you may consult for an overview of approaches and results in this field.

How to participate

The ADReSSo challenge consists of the following tasks:

  1. an AD classification task, where you are required to produce a model to predict the label (AD or non-AD) for a speech session. You may use the speech signal directly (acoustic features), or attempt to convert the speech into text automatically (ASR) and extract linguistic features from this automatically generated transcript;
  2. an MMSE score regression task, where you create a model to infer the subject's Mini-Mental State Examination (MMSE) score based on speech data;
  3. and a cognitive decline (disease progression) inference task, where you create a model to predict changes in cognitive status over time, for a given speaker, based on speech data collected at baseline (beginning of a cohort study).

You may choose to do one or more of these tasks. You will be provided with access to a training set (see relevant section below), and two weeks prior to the paper submission deadline you will be provided with test sets on which you can test your models.

You may send up to five sets of results to us for scoring. You are required to submit all your attempts together, in separate files named: test_results_task1.txt, test_results_task2.txt, and test_results_task3.txt (or a subset, if you choose not to enter all tasks). These must contain the IDs of the test files and your model's predictions. If your approach employs automatic speech recognition (ASR), you will also need to submit the ASR output (transcription) for each audio file. These transcripts should be placed in separate folders (named 'asr_task1', 'asr_task2' and 'asr_task3') and named as the original files but with extension .txt. Please see the README files in the test set archives for further details.
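A results file can be generated with a few lines of code. This is only a sketch: the "<id>,<label>" line layout and the file IDs below are assumptions for illustration; the authoritative format is in the test set README.

```python
# Sketch of writing a results file for Task 1 (AD classification).
# The "<id>,<label>" layout and the IDs are assumptions; the actual
# format is specified in the test-set README.
predictions = {"sample01": "ad", "sample02": "cn"}  # hypothetical file IDs

with open("test_results_task1.txt", "w") as f:
    for file_id, label in sorted(predictions.items()):
        f.write(f"{file_id},{label}\n")
```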

Please note that due to numerous requests we have changed the results submission procedure. We no longer require you to submit your model. However, we strongly encourage you to share your model and code through a publicly accessible repository, and if possible use a literate programming "notebook" environment such as R Markdown or Jupyter Notebook.

Please do not use any other data from DementiaBank to train your models (e.g. for augmentation), as the task dataset may contain files from that repository.

You will also be expected to submit a paper to INTERSPEECH 2021, describing your approach and results. If your paper is accepted, it will be presented at the conference in the ADReSSo special session.

Access to the data set

In order to gain access to the dataset, you will need to become a member of DementiaBank (free of charge) by contacting Brian MacWhinney by email. You should include your contact information and affiliation, as well as a general statement on how you plan to use the data, with specific mention of the ADReSSo challenge. If you are a student, please ask your supervisor to join as a member to supervise your work. This membership will give you full access to the ADReSSo dataset.

Once you have become a member of DementiaBank, please email us for further instructions.

The data set

The DementiaBank directory to which you will gain access will contain only the training data for this Challenge. The data are organised in the following directory hierarchy:
   ├── diagnosis
   │   └── train
   │       └── audio
   │           ├── ad
   │           └── cn
   └── progression
       └── train
           └── audio
               ├── decline
               └── no_decline
where diagnosis/train/audio/ad/ contains speech from speakers with an Alzheimer's dementia diagnosis, diagnosis/train/audio/cn/ contains speech produced by controls, progression/train/audio/decline/ contains baseline speech of patients who exhibited cognitive decline between their baseline assessment and their year-two visit to the clinic, and progression/train/audio/no_decline/ contains speech from patients without cognitive decline in the same period. Decline is defined as a difference in MMSE score between baseline and year 2 greater than or equal to 5 points.
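The decline criterion above amounts to a simple threshold on MMSE change. A minimal sketch, assuming the difference is a decrease (MMSE falls as cognition declines):

```python
# Decline between baseline and the year-2 visit, per the challenge
# definition: an MMSE difference of at least 5 points. Interpreting
# the difference as a decrease is an assumption.
def declined(mmse_baseline: int, mmse_year2: int) -> bool:
    return (mmse_baseline - mmse_year2) >= 5

print(declined(28, 22))  # drop of 6 points: decline
print(declined(27, 25))  # drop of 2 points: no decline
```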

For the AD/CN classification task and the MMSE prediction task, each sub-directory contains compressed (ZIP) archives with recordings of a picture description task (the "Cookie Theft" picture from the Boston Diagnostic Aphasia Examination). These recordings have been acoustically enhanced (noise reduction through spectral subtraction) and normalised. The directory structure and files for the disease progression prediction task are similarly organised. They consist of recordings of a language fluency exam, also normalised and acoustically enhanced.

The diagnosis task dataset has been balanced with respect to age and gender in order to eliminate potential confounding and bias. We employed a propensity score approach to matching (Rosenbaum & Rubin, 1983; Rubin, 1973; Ho et al., 2007). The dataset was checked for matching according to propensity scores, defined as the probability of an instance being assigned to the AD group given the covariates age and gender, and matching instances were selected. All standardized mean differences for the covariates were well below 0.1, and all standardized mean differences for squares and two-way interactions between covariates were well below 0.15, indicating adequate balance. The propensity score was estimated using a probit regression of the treatment on the covariates age and gender (probit yielded better balance than logistic regression).
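The balance check described above relies on the standardized mean difference (SMD): the difference in group means over the pooled standard deviation, compared against the 0.1 threshold. A minimal sketch, with made-up ages for the two groups:

```python
import numpy as np

def standardized_mean_difference(treated, control):
    """SMD for one covariate: difference in means divided by the
    pooled standard deviation of the two groups."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

# Hypothetical ages for matched AD and control groups
ad_age = [68, 71, 74, 77, 80]
cn_age = [67, 72, 74, 78, 79]
smd = abs(standardized_mean_difference(ad_age, cn_age))
print(smd < 0.1)  # the balance threshold used for the covariates
```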

The figure below shows the respective (empirical) quantile-quantile (qq) plots for the original and balanced datasets. As usual, a qq plot showing instances near the diagonal indicates good balance.


Performance Metrics

Task 1 (AD classification) will be evaluated through the following metrics: \[ \operatorname{Accuracy} = \frac{TN + TP}{N}, \] \[ \operatorname{Specificity} = \frac{TN}{TN + FP}, \] and \[ F_1 = 2\,\frac{\pi \times \rho}{\pi + \rho}, \] where \[ \pi = \frac{TP}{TP + FP}, \qquad \rho = \frac{TP}{TP + FN}, \] $N$ is the number of patients, $TP$ is the number of true positives, $TN$ the number of true negatives, $FP$ the number of false positives and $FN$ the number of false negatives.
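These metrics follow directly from the four confusion-matrix counts. A minimal sketch (the counts are made up for illustration):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, specificity, and F1 from confusion-matrix counts,
    following the challenge's definitions."""
    n = tp + tn + fp + fn
    accuracy = (tn + tp) / n
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)  # pi
    recall = tp / (tp + fn)     # rho (sensitivity)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, specificity, f1

# Hypothetical counts for a 70-session test set
acc, spec, f1 = classification_metrics(tp=30, tn=28, fp=7, fn=5)
```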

Task 2 (MMSE prediction) will be evaluated using the root mean squared error: \[ \operatorname{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_{i} - y_{i})^{2}}{N}}, \] where $\hat{y}$ is the predicted MMSE score and $y$ is the patient's actual MMSE score.
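A minimal sketch of the RMSE computation, with hypothetical predicted and actual scores:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual scores."""
    assert len(predicted) == len(actual)
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

# Hypothetical MMSE predictions vs. ground truth
print(rmse([25, 18, 29], [27, 20, 29]))  # sqrt((4 + 4 + 0) / 3)
```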

Task 3 (prediction of decline) will be evaluated through $F_1$ (overall), with $\operatorname{Specificity}$ and $\rho$ (sensitivity) also reported.

When more than one attempt is submitted for scoring against the test set, all results will be considered (not only the best result overall), and should be reported in the paper.

ADReSSo Paper and Baseline Classification and Regression Results

A full description of the ADReSSo Challenge and its datasets, along with a basic set of baseline results, can be found in the paper below. Papers submitted to this Challenge using the ADReSSo dataset should cite it as follows:

  1. S. Luz, F. Haider, S. de la Fuente, D. Fromm, and B. MacWhinney. Detecting cognitive decline using speech only: The ADReSSo Challenge. 2021. medRxiv 2021.03.24.21254263; doi: 10.1101/2021.03.24.21254263

We also encourage you to share your approaches through e-prints and code repositories.

The ground truth for the test sets is now available through this link. Please contact us for the password.

Important Dates

  • January 18, 2021: ADReSSo Challenge announced.
  • March 26, 2021: Paper submission deadline.
  • NEW! March 31, 2021 (AOE): Results submission deadline (extended from March 29).
    (NB: this has been set for after the paper submission deadline, as INTERSPEECH allows papers to be updated until April 2.)
  • April 2, 2021: Paper update deadline.
  • June 2, 2021: Paper acceptance/rejection notification
  • August 31 - September 3, 2021: INTERSPEECH 2021.
See other important dates on the INTERSPEECH 2021 website.

Paper Submission

Please format your paper following the INTERSPEECH 2021 guidelines, and submit it indicating that it is meant for the ADReSSo Challenge.

Papers submitted to this Challenge should refer to the task results baseline paper (see reference above for citation).


Saturnino Luz is a Reader at the Usher Institute, University of Edinburgh's Medical School. He works in medical informatics, devising and applying machine learning, signal processing and natural language processing methods in the study of behaviour and communication in healthcare contexts. His main research interest is the computational modelling of behavioural and biological changes caused by neurodegenerative diseases, with a focus on the analysis of vocal and linguistic signals in Alzheimer's disease.
Fasih Haider is a Research Fellow at Usher Institute, University of Edinburgh's Medical School, UK. His areas of interest are Social Signal Processing and Artificial Intelligence. Before joining the Usher Institute, he was a Research Engineer at the ADAPT Centre where he worked on methods of Social Signal Processing for video intelligence. He holds a PhD in Computer Science from Trinity College Dublin, Ireland. Currently, he is investigating the use of social signal processing and machine learning for monitoring cognitive health.
Sofia de la Fuente is a Research Fellow at Usher Institute, University of Edinburgh’s Medical School and an Associate Fellow of the Higher Education Academy, UK. She graduated in Psychology (BSc Hons) at the Universidad Complutense de Madrid in 2015 and in Methodology for Behavioural and Health Sciences (MSc Hons) by the Universidad Autonoma de Madrid in 2017. She recently finished her PhD in Precision Medicine at the University of Edinburgh. Her research is an exploratory study of psycholinguistics, linguistics, paralinguistics and acoustic features that may help predict dementia onset later in life.
Davida Fromm is a Special Faculty member in the Psychology Department at Carnegie Mellon University. Her research interests have focused on aphasia, dementia, and apraxia of speech in adults. For the past 12 years, she has helped to develop a large shared database of multi-media discourse samples for a variety of neurogenic communication disorders. The database includes educational resources and research tools for an increasing number of automated language analyses.
Brian MacWhinney is Teresa Heinz Professor of Psychology, Computational Linguistics, and Modern Languages at Carnegie Mellon University. He received his Ph.D. in psycholinguistics in 1974 from the University of California at Berkeley. With Elizabeth Bates, he developed a model of first and second language processing and acquisition based on competition between item-based patterns. In 1984, he and Catherine Snow co-founded the CHILDES (Child Language Data Exchange System) Project for the computational study of child language transcript data. This system has been extended to 13 additional research areas, such as aphasiology, second language learning, TBI, Conversation Analysis, and developmental disfluency, in the shape of the TalkBank Project. MacWhinney's recent work includes studies of online learning of second language vocabulary and grammar, situationally embedded second language learning, neural network modeling of lexical development, fMRI studies of children with focal brain lesions, and ERP studies of between-language competition. He is also exploring the role of grammatical constructions in the marking of perspective shifting, the determination of linguistic forms across contrasting time frames, and the construction of mental models in scientific reasoning. Recent edited books include The Handbook of Language Emergence (Wiley) and Competing Motivations in Grammar and Usage (Oxford).


The ADReSSo Challenge acknowledges the support and sponsorship of the European Union's Horizon 2020 research programme, under grant agreement No 769661, towards the SAAM project.


  1. de la Fuente Garcia S, Ritchie C, Luz S. Artificial Intelligence, Speech, and Language Processing Approaches to Monitoring Alzheimer’s Disease: A Systematic Review. Journal of Alzheimer's Disease. 2020:1-27.
  2. Luz S, Haider F, de la Fuente S, Fromm D, MacWhinney B. Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge. Proceedings of INTERSPEECH 2020. Also available as arXiv preprint arXiv:2004.06833. 2020.
  3. Rosenbaum, Paul R., and Donald B. Rubin. 1983. "The Central Role of the Propensity Score in Observational Studies for Causal Effects." Biometrika 70 (1): 41–55.
  4. Rubin, Donald B. 1973. "Matching to Remove Bias in Observational Studies." Biometrics 29 (1): 159.
  5. Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart. 2007. “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference.” Political Analysis 15 (3): 199–236.
