## Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSSo Challenge

News:

• NEW! The deadline for submission of predictions for the test set for scoring has been extended to 31st March 2021 (AOE)! (29-3-21)
• NEW! Challenge description and Baseline paper now available! (24-3-21)
• The results submission procedure has been amended; you are no longer required to submit your model. We will be distributing the test set instead. (3-3-21)

Dementia is a category of neurodegenerative diseases that entails a long-term, usually gradual decrease in cognitive functioning. The main risk factor for dementia is age, and its greatest incidence is therefore among the elderly. Given the severity of the situation worldwide, institutions and researchers are investing considerably in dementia prevention and early detection, with a focus on disease progression. There is a need for cost-effective and scalable methods for the detection of dementia, from its most subtle forms, such as the preclinical stage of Subjective Memory Loss (SML), to more severe conditions like Mild Cognitive Impairment (MCI) and Alzheimer's Dementia (AD) itself.

• The Challenge targets a difficult automatic prediction problem of societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD). The challenge builds on the success of the ADReSS Challenge (Luz et al., 2020), the first such shared-task event focused on AD, which attracted 34 teams from across the world.
• The ADReSSo Challenge will provide a forum for those different research groups to test their existing methods (or develop novel approaches) on a new shared standardized dataset. The approaches that performed best on the original ADReSS dataset employed features extracted from manual transcripts, which were provided. The ADReSSo challenge provides a more challenging and improved spontaneous speech dataset, and requires the creation of models straight from speech, without manual transcription, although automatic transcription is allowed and encouraged.
• In keeping with the objectives of AD prediction evaluation, the ADReSSo challenge's dataset will be statistically balanced so as to mitigate common biases often overlooked in evaluations of AD detection methods, including repeated occurrences of speech from the same participant (common in longitudinal datasets), variations in audio quality, and imbalances of gender and age distribution.
• This task focuses on AD recognition using spontaneous speech, which marks a departure from neuropsychological and clinical evaluation approaches. Spontaneous speech analysis has the potential to enable novel applications for speech technology in longitudinal, unobtrusive monitoring of cognitive health, in line with the theme of this year's INTERSPEECH, "Speech Everywhere!".
The organizers have recently conducted an extensive systematic review of the scientific literature on speech and language processing AI methods for detection of cognitive decline in AD (de la Fuente, Ritchie and Luz, 2020), which you may consult for an overview of approaches and results in this field.

## How to participate

The Challenge comprises three tasks:

1. an AD classification task, where you are required to produce a model to predict the label (AD or non-AD) for a speech session. You may use the speech signal directly (acoustic features), or attempt to convert the speech into text automatically (ASR) and extract linguistic features from this automatically generated transcript;
2. an MMSE score regression task, where you create a model to infer the subject's Mini-Mental State Examination (MMSE) score based on speech data;
3. and a cognitive decline (disease progression) inference task, where you create a model to predict changes in cognitive status over time, for a given speaker, based on speech data collected at baseline (beginning of a cohort study).

You may choose to do one or more of these tasks. You will be provided with access to a training set (see relevant section below), and two weeks prior to the paper submission deadline you will be provided with test sets on which you can test your models.
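As a minimal illustration of the acoustic route for these tasks, a toy front end might extract frame-level features directly from the waveform. The feature choices (log-energy and zero-crossing rate), frame sizes and function names below are ours for illustration, not prescribed by the Challenge:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D waveform into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def acoustic_features(x, frame_len=400, hop=160):
    """Per-frame log-energy and zero-crossing rate: a toy acoustic front end."""
    frames = frame_signal(x, frame_len, hop)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([log_energy, zcr], axis=1)  # shape: (n_frames, 2)

# Synthetic 1-second "recording" at 16 kHz, standing in for a Challenge audio file.
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000) + 0.01 * rng.standard_normal(16000)
feats = acoustic_features(x)
```

Features of this kind can then be pooled per session and fed to a classifier (Task 1) or regressor (Tasks 2 and 3); participants are of course free to use far richer representations, such as standard paralinguistic feature sets or ASR-derived linguistic features.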

Please note that due to numerous requests we have changed the results submission procedure. We no longer require you to submit your model. However, we strongly encourage you to share your model and code through a publicly accessible repository, and if possible use a literate programming "notebook" environment such as R Markdown or Jupyter Notebook.

Please do not use any other data from DementiaBank to train your models (e.g. for augmentation), as the task dataset may contain files from that repository.

You will also be expected to submit a paper to INTERSPEECH 2021, describing your approach and results. If your paper is accepted, it will be presented at the conference in the ADReSSo special session.

Once you have become a member of DementiaBank, please email us at Fasih.Haider@ed.ac.uk for further instructions.

## The data set

The DementiaBank directory to which you will gain access will contain only the training data for this Challenge. The data are organised in the following directory hierarchy:
├── diagnosis
│   └── train
│       └── audio
│           ├── ad
│           └── cn
└── progression
    └── train
        └── audio
            ├── decline
            └── no_decline

where diagnosis/train/audio/ad/ contains speech from speakers with an Alzheimer's dementia diagnosis, diagnosis/train/audio/cn/ contains speech produced by controls, progression/train/audio/decline/ contains baseline speech of patients who exhibited cognitive decline between their baseline assessment and their year-two visit to the clinic, and progression/train/audio/no_decline/ contains speech from patients without cognitive decline in the same period. Decline is defined as a difference in MMSE score between baseline and year two greater than or equal to 5 points.
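For clarity, the decline label for the progression task can be stated as a one-line rule (the function name is ours):

```python
def decline_label(mmse_baseline, mmse_year2, threshold=5):
    """Label a speaker as 'decline' when the MMSE score drops by at least
    `threshold` points between baseline and the year-two visit, per the
    Challenge's definition of cognitive decline."""
    return "decline" if (mmse_baseline - mmse_year2) >= threshold else "no_decline"

print(decline_label(28, 22))  # drop of 6 points -> "decline"
print(decline_label(27, 25))  # drop of 2 points -> "no_decline"
```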

For the AD/CN classification task and the MMSE prediction task, each sub-directory contains compressed (ZIP) archives with recordings of a picture description task (the "Cookie Theft" picture from the Boston Diagnostic Aphasia Examination). These recordings have been acoustically enhanced (noise reduction through spectral subtraction) and normalised. The directory structure and files for the disease progression prediction task are similarly organised; they consist of recordings of a language fluency exam, also normalised and acoustically enhanced.
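As an illustration of the kind of enhancement mentioned, here is a toy magnitude spectral subtraction sketch in NumPy. The frame sizes and the noise-estimation heuristic (averaging the first few frames) are assumptions for illustration only; this is not the pipeline used to prepare the dataset, and a real implementation would also reconstruct the time-domain signal via overlap-add:

```python
import numpy as np

def spectral_subtract(x, frame_len=512, hop=256, noise_frames=5):
    """Toy magnitude spectral subtraction: estimate the noise magnitude
    spectrum from the first `noise_frames` frames, subtract it from every
    frame, and floor the result at zero. Returns enhanced magnitudes."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    noise = mag[:noise_frames].mean(axis=0)  # noise spectrum estimate
    return np.maximum(mag - noise, 0.0)      # floored subtraction

rng = np.random.default_rng(1)
x = 0.1 * rng.standard_normal(4096)          # noise-only toy signal
enhanced = spectral_subtract(x)
```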

The diagnosis task dataset has been balanced with respect to age and gender in order to eliminate potential confounding and bias, using a propensity score approach to matching (Rosenbaum & Rubin, 1983; Rubin, 1973; Ho et al., 2007). The dataset was checked for matching according to scores defined in terms of the probability of an instance being assigned to the AD group given the covariates age and gender, and matching instances were selected. All standardized mean differences for the covariates were well below 0.1, and all standardized mean differences for squares and two-way interactions between covariates were well below 0.15, indicating adequate balance. The propensity score was estimated using a probit regression of the treatment on the covariates age and gender (probit produced better balance than logistic regression).
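The matching and balance check described above can be sketched as greedy 1:1 nearest-neighbour matching on precomputed propensity scores, together with the standardized mean difference used for the < 0.1 criterion. The exact procedure used to prepare the dataset may differ, and the scores below are made-up values:

```python
import numpy as np

def smd(a, b):
    """Standardized mean difference of a covariate between two groups."""
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled

def greedy_match(ps_treated, ps_control):
    """Greedy 1:1 nearest-neighbour matching on propensity scores; returns
    the index of the matched control for each treated unit."""
    available = list(range(len(ps_control)))
    matches = []
    for p in ps_treated:
        j = min(available, key=lambda k: abs(ps_control[k] - p))
        matches.append(j)
        available.remove(j)
    return matches

ps_ad = np.array([0.50, 0.70])              # hypothetical AD-group scores
ps_cn = np.array([0.10, 0.52, 0.69, 0.90])  # hypothetical control pool
idx = greedy_match(ps_ad, ps_cn)            # -> [1, 2]
print(smd(ps_ad, ps_cn[idx]))               # near zero after matching
```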

The figure below shows the respective empirical quantile-quantile (QQ) plots for the original and balanced datasets. Points lying near the diagonal indicate good balance.
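The QQ comparison can also be made numerically: compute matching empirical quantiles for the two groups and check how far they stray from the diagonal. The age values below are made up for illustration:

```python
import numpy as np

def empirical_qq(a, b, n_quantiles=9):
    """Paired empirical quantiles of two samples; pairs near the line y = x
    indicate similar distributions, i.e. good covariate balance."""
    q = np.linspace(0.1, 0.9, n_quantiles)
    return np.quantile(a, q), np.quantile(b, q)

ages_ad = np.array([62, 65, 66, 68, 70, 71, 73, 75, 78])  # hypothetical ages
ages_cn = np.array([61, 64, 67, 68, 69, 72, 73, 76, 77])
qa, qc = empirical_qq(ages_ad, ages_cn)
max_gap = np.max(np.abs(qa - qc))  # small gap -> well balanced
```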

## Performance Metrics

Task 1 (AD classification) will be evaluated through the following metrics: $\displaystyle \operatorname{Accuracy} = \frac{TN + TP}{N}$, $\displaystyle \operatorname{Specificity} = \frac{TN}{TN + FP}$, and $\displaystyle F_1 = \frac{2\,\pi \rho}{\pi + \rho}$, where precision $\displaystyle \pi = \frac{TP}{TP + FP}$ and recall $\displaystyle \rho = \frac{TP}{TP + FN}$; $N$ is the number of patients, $TP$ the number of true positives, $TN$ the number of true negatives, $FP$ the number of false positives and $FN$ the number of false negatives.
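The Task 1 metrics translate directly into code (a straightforward transcription of the formulas above, with an invented count example):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, specificity and F1 as defined for Task 1."""
    n = tp + tn + fp + fn
    accuracy = (tn + tp) / n
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)  # pi
    recall = tp / (tp + fn)     # rho (sensitivity)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, specificity, f1

acc, spec, f1 = classification_metrics(tp=30, tn=28, fp=7, fn=5)
print(f"accuracy={acc:.3f} specificity={spec:.3f} F1={f1:.3f}")
# accuracy=0.829 specificity=0.800 F1=0.833
```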

Task 2 (MMSE prediction) will be evaluated using the root mean squared error: $\displaystyle \operatorname{RMSE} = \sqrt{\frac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{N}},$ where $\hat{y}_i$ is the predicted MMSE score and $y_i$ is the patient's actual MMSE score.
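The RMSE metric as code, a direct transcription of the formula (the example scores are invented):

```python
import numpy as np

def rmse(y_hat, y):
    """Root mean squared error between predicted and true MMSE scores."""
    y_hat, y = np.asarray(y_hat, float), np.asarray(y, float)
    return np.sqrt(np.mean((y_hat - y) ** 2))

print(rmse([25, 20, 28], [27, 19, 24]))  # sqrt((4 + 1 + 16) / 3)
```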

Task 3 (prediction of decline) will be evaluated through the overall $F_1$ score, with $\operatorname{Specificity}$ and sensitivity ($\rho$) also reported.

When more than one attempt is submitted for scoring against the test set, all results will be considered (not only the best result overall), and should be reported in the paper.

## ADReSSo Paper and Baseline Classification and Regression Results

A full description of the ADReSSo Challenge and its datasets, along with a basic set of baseline results, can be found in the paper below. Papers submitted to this Challenge using the ADReSSo dataset should cite it as follows:

1. S. Luz, F. Haider, S. de la Fuente, D. Fromm, and B. MacWhinney. Detecting cognitive decline using speech only: The ADReSSo Challenge. 2021. medRxiv 2021.03.24.21254263; doi: 10.1101/2021.03.24.21254263

We also encourage you to share your approaches through e-prints and code repositories.

## Important Dates

• January 18, 2021: ADReSSo Challenge announced.
• March 26, 2021: Paper submission deadline.
• NEW! March 31, 2021 (AOE): Results submission deadline (extended from March 29).
(NB: this falls after the paper submission deadline, as INTERSPEECH allows papers to be updated until April 2.)
• April 2, 2021 Paper update deadline
• June 2, 2021: Paper acceptance/rejection notification
• August 31 - September 3, 2021: INTERSPEECH 2021.
See other important dates on the INTERSPEECH 2021 website.

## Paper Submission

Please format your paper following the INTERSPEECH 2021 guidelines, and submit it indicating that it is meant for the ADReSSo Challenge.

Papers submitted to this Challenge should refer to the task results baseline paper (see the reference above for the citation).