Alzheimer's Dementia Recognition through Spontaneous Speech
The ADReSSo Challenge
News:
- NEW! The deadline for submission of
predictions for the test set for scoring has been extended to 31st March 2021 (AOE)! (29-3-21)
- NEW! Challenge description and Baseline
paper now available! (24-3-21)
- The results submission procedure has been
amended: you are no longer required to submit your model. We will be
distributing the test set instead.
(3-3-21)
- Correction: performance metrics
for task 3 added. (26-2-21)
- The training set is now available! Please
email ADReSS_is2020@ed.ac.uk
for instructions on how to download it. (20-2-21)
- ADReSSo Challenge
announced! (18-1-21)
Dementia is a category of neurodegenerative diseases that entails
a long-term and usually gradual decrease of cognitive functioning.
The main risk factor for dementia is age, and therefore its
greatest incidence is amongst the elderly. Due to the severity of
the situation worldwide, institutions and researchers are
investing considerably in dementia prevention and early detection,
focusing on disease progression. There is a need for
cost-effective and scalable methods for detection of dementia from
its most subtle forms, such as the preclinical stage of Subjective
Memory Loss (SML), to more severe conditions like Mild Cognitive
Impairment (MCI) and Alzheimer's Dementia (AD) itself.
The main features of the ADReSSo (ADReSS, speech only) Challenge are:
- The Challenge targets a difficult automatic prediction
problem of societal and medical relevance, namely, the
detection of Alzheimer's Dementia (AD). The challenge builds
on the success of the ADReSS Challenge
(Luz et al., 2020), the first such shared-task event focused on
AD, which attracted 34 teams from across the world.
- The ADReSSo Challenge will provide a forum for those
research groups to test their existing methods (or develop novel
approaches) on a new shared standardized dataset. The approaches
that performed best on the original ADReSS dataset employed
features extracted from manual transcripts, which were provided.
The ADReSSo challenge provides a more challenging and improved
spontaneous speech dataset, and requires the creation of models
straight from speech, without manual transcription, although
automatic transcription is allowed and encouraged.
- In keeping with the objectives of AD prediction evaluation,
the ADReSSo challenge's dataset will be statistically balanced so as to
mitigate common biases often overlooked in evaluations of AD
detection methods, including repeated occurrences of speech from
the same participant (common in longitudinal datasets),
variations in audio quality, and imbalances of gender and age
distribution.
- This task focuses on AD recognition using spontaneous speech,
which marks a departure from neuropsychological and clinical
evaluation approaches. Spontaneous speech analysis has the
potential to enable novel applications for speech technology in
longitudinal, unobtrusive monitoring of cognitive health, in line
with the theme of this year's INTERSPEECH, "Speech Everywhere!".
The organizers have recently conducted an extensive systematic
review of the scientific literature on speech and language
processing AI methods for detection of cognitive decline in AD
(de la Fuente, Ritchie and Luz, 2020), which you may consult for an
overview of approaches and results in this field.
How to participate
The ADReSSo challenge consists of the following tasks:
- an AD classification task, where you are required to produce a
model to predict the label (AD or non-AD) for a speech session. You
may use the speech signal directly (acoustic features), or attempt
to convert the speech into text automatically (ASR) and extract
linguistic features from this automatically generated transcript;
- an MMSE score regression task, where you create a model to
infer the subject's Mini-Mental State Examination (MMSE) score
based on speech data;
- and a cognitive decline (disease progression) inference task,
where you create a model to predict changes in cognitive status over
time, for a given speaker, based on speech data collected at
baseline (beginning of a cohort study).
You may choose to do one or more of these tasks. You will be provided
with access to a training set (see relevant section below), and
two weeks prior to the paper submission deadline you will be provided
with test sets on which you can test your models.
You may send up to five sets of results to us for scoring. You are
required to submit all your attempts together, in
separate files named: test_results_task1.txt, test_results_task2.txt,
and test_results_task3.txt (or a subset, if you choose not to enter
all tasks). These must contain the IDs of the test files and your
model's predictions. If your approach employs automatic speech
recognition (ASR), you will also need to submit the ASR output
(transcription) for each audio file. These transcripts should be
placed in separate folders (named 'asr_task1', 'asr_task2' and
'asr_task3') and given the same names as the original audio files, but
with the extension .txt. Please see the README files in the test set
archives for further details.
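The exact file format is deferred to the README files in the test set archives, so the following is only a minimal Python sketch of assembling a submission under an assumed layout: a header line followed by one `ID;Prediction` line per test file. The delimiter, the header and the helper name `write_submission` are hypothetical; only the file names `test_results_taskN.txt` and the `asr_taskN` folders come from the instructions above.

```python
from pathlib import Path


def write_submission(out_dir, task, predictions, transcripts=None):
    """Write test_results_task{task}.txt and, optionally, ASR transcripts.

    predictions: mapping from test-file ID to predicted value
                 (label for Task 1, MMSE score for Task 2, ...).
    transcripts: optional mapping from original audio file name to ASR text.
    ASSUMPTION: the "ID;Prediction" layout and header are hypothetical;
    follow the README shipped with the test set for the official format.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = out_dir / f"test_results_task{task}.txt"
    with results.open("w", encoding="utf-8") as f:
        f.write("ID;Prediction\n")  # hypothetical header line
        for file_id, pred in sorted(predictions.items()):
            f.write(f"{file_id};{pred}\n")
    if transcripts:
        asr_dir = out_dir / f"asr_task{task}"
        asr_dir.mkdir(exist_ok=True)
        for audio_name, text in transcripts.items():
            # Same name as the original audio file, but with a .txt extension.
            txt_name = Path(audio_name).with_suffix(".txt").name
            (asr_dir / txt_name).write_text(text, encoding="utf-8")
    return results
```

Calling the function once per task entered yields the subset of `test_results_task*.txt` files and `asr_task*` folders described above.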
Please note that due to numerous requests we have changed the
results submission procedure. We no longer require you to
submit your model. However, we strongly encourage you
to share your model and code through a publicly accessible
repository, and if possible
use a literate programming "notebook" environment
such as R Markdown
or Jupyter Notebook.
Please do not use any other data from DementiaBank to train your
models (e.g. for augmentation), as the task dataset may contain
files from that repository.
You will also be expected to
submit a paper to INTERSPEECH 2021, describing your approach and
results. If your paper is accepted, it will be presented at the
conference in the ADReSSo special session.
Access to the data set
In order to gain access to the dataset, you will need
to become a member of
DementiaBank (free of charge) by contacting Brian MacWhinney
by email. You should include
your contact information and affiliation, as well as a general
statement on how you plan to use the data, with specific mention of
the ADReSSo Challenge. If you are a student, please ask your
supervisor to join as a member to supervise your work. This
membership will give you full access to the ADReSSo dataset.
Once you have become a member of DementiaBank, please email us
at Fasih.Haider@ed.ac.uk for further
instructions.
The data set
The DementiaBank directory to which you will gain access will contain
only the
training data for this Challenge. The data
are organised in the following directory hierarchy:
├── diagnosis
│   └── train
│       └── audio
│           ├── ad
│           └── cn
└── progression
    └── train
        └── audio
            ├── decline
            └── no_decline
where
- diagnosis/train/audio/ad/ contains speech from speakers with an
Alzheimer's dementia diagnosis,
- diagnosis/train/audio/cn/ contains speech produced by controls,
- progression/train/audio/decline/ contains baseline speech of
patients who exhibited cognitive decline between their baseline
assessment and their year-two visit to the clinic, and
- progression/train/audio/no_decline/ contains speech from patients
without cognitive decline in the same period.
Decline is defined as a difference in MMSE score between baseline and
year 2 greater than or equal to 5 points.
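A minimal Python sketch of how these directory names and the decline rule translate into training labels. The helper names and label strings are hypothetical, and reading the MMSE "difference" as a drop from baseline to year 2 is an assumption.

```python
from pathlib import Path

# Hypothetical label strings for each leaf directory of the training tree.
DIR_LABELS = {"ad": "AD", "cn": "CN", "decline": "decline", "no_decline": "no_decline"}


def label_from_path(path):
    """Return the label encoded by the directory that contains a training file."""
    parent = Path(path).parent.name
    if parent not in DIR_LABELS:
        raise ValueError(f"unexpected directory: {parent!r}")
    return DIR_LABELS[parent]


def has_declined(mmse_baseline, mmse_year2, threshold=5):
    """Decline rule: MMSE drops by at least `threshold` points.

    ASSUMPTION: "difference" is read as baseline minus year-2 score,
    so an improvement never counts as decline.
    """
    return (mmse_baseline - mmse_year2) >= threshold
```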
For the AD/CN classification task and the MMSE prediction task, each
sub-directory contains compressed (ZIP) archives with recordings of a
picture description task ("Cookie Theft" picture from the Boston
Diagnostic Aphasia Examination). These recordings have been
acoustically enhanced (noise reduction through spectral subtraction)
and normalised. The directory structure and files for the disease
progression prediction task are organised similarly. They consist of
recordings of a language fluency exam, also normalised and
acoustically enhanced.
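The exact enhancement pipeline is not described here. For readers unfamiliar with the technique, the following Python/NumPy sketch shows a basic form of magnitude spectral subtraction; it is not the organisers' implementation. It assumes that the first `noise_len` samples contain background noise only, and the frame length, the 50% overlap and the spectral `floor` are illustrative choices.

```python
import numpy as np


def _frame(x, frame_len, hop):
    """Slice x into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_subtraction(x, noise_len, frame_len=512, floor=0.02):
    """Basic magnitude spectral subtraction (illustrative sketch).

    ASSUMPTION: x[:noise_len] is noise only, and noise_len >= frame_len.
    A periodic Hann window at 50% overlap sums to 1, so plain overlap-add
    reconstructs the input exactly when nothing is subtracted.
    """
    x = np.asarray(x, dtype=float)
    hop = frame_len // 2
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)  # periodic Hann

    # Average noise magnitude spectrum estimated from the noise-only lead-in.
    noise_frames = _frame(x[:noise_len], frame_len, hop) * window
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    # Zero-pad so every original sample is covered by two overlapping frames.
    padded = np.concatenate([np.zeros(frame_len), x, np.zeros(frame_len)])
    spec = np.fft.rfft(_frame(padded, frame_len, hop) * window, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Subtract the noise estimate, keeping a small fraction of the original
    # magnitude as a spectral floor, and reuse the noisy phase.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)

    # Overlap-add the frames back into a signal and drop the padding.
    out = np.zeros(len(padded))
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out[frame_len : frame_len + len(x)]
```

Setting `floor=1.0` disables the subtraction, which is a convenient check that the framing and overlap-add reconstruct the input unchanged.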
The diagnosis task dataset has been balanced with respect to age and
gender in order to eliminate potential confounding and bias. We
employed a propensity score approach to matching (Rosenbaum & Rubin,
1983; Rubin, 1973; Ho et al., 2007). The dataset was checked for
matching according to scores defined in terms of the probability of an
instance being treated as AD given the covariates age and gender,
estimated through logistic regression, and matching instances were
selected. All standardized mean differences for the covariates were
well below 0.1, and all standardized mean differences for squares and
two-way interactions between covariates were well below 0.15,
indicating adequate balance for the covariates. The propensity score
was estimated using a probit regression of the treatment on the
covariates `age` and `gender` (probit generated better balance than
logistic regression).
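The balance check described above can be sketched in Python as follows. This assumes gender is coded 0/1 and uses the pooled standard deviation as the SMD denominator, which is one common convention (some tools standardise by the treated-group standard deviation instead); the helper names are hypothetical. With a 0/1 code, the square of gender equals gender itself, so only the square of age is added.

```python
import numpy as np


def smd(x_treated, x_control):
    """Standardized mean difference using the pooled standard deviation."""
    x_t = np.asarray(x_treated, dtype=float)
    x_c = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled_sd


def balance_table(age, gender, is_ad):
    """Absolute SMDs for the covariates, age squared and the age-gender interaction.

    ASSUMPTION: gender is coded 0/1 and is_ad is a boolean array.
    """
    age, gender, is_ad = (np.asarray(v) for v in (age, gender, is_ad))
    is_ad = is_ad.astype(bool)
    terms = {
        "age": age,
        "gender": gender,
        "age^2": age.astype(float) ** 2,
        "age:gender": age * gender,
    }
    return {name: abs(smd(v[is_ad], v[~is_ad])) for name, v in terms.items()}


def is_balanced(table, main_limit=0.1, extra_limit=0.15):
    """Apply the 0.1 (covariates) and 0.15 (squares/interactions) thresholds."""
    main = ("age", "gender")
    return all(v < (main_limit if k in main else extra_limit) for k, v in table.items())
```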
The figure below shows the respective (empirical) quantile-quantile
(qq) plots for the original and balanced datasets. In a qq plot,
points lying close to the diagonal indicate good balance.
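An empirical qq plot compares the quantiles of a covariate in the AD group with the same quantiles in the control group. The Python sketch below computes those points, plus a one-number summary (the largest distance from the diagonal); plotting is left out, and the function names are illustrative.

```python
import numpy as np


def eqq_points(x_treated, x_control, n_points=50):
    """Empirical quantiles of a covariate in each group at shared probabilities.

    Plotting the treated quantiles (y) against the control quantiles (x)
    gives an empirical qq plot; points near y = x suggest good balance.
    """
    probs = np.linspace(0, 1, n_points)
    q_control = np.quantile(np.asarray(x_control, dtype=float), probs)
    q_treated = np.quantile(np.asarray(x_treated, dtype=float), probs)
    return q_control, q_treated


def max_eqq_gap(x_treated, x_control, n_points=50):
    """Largest vertical distance of the qq points from the diagonal."""
    q_control, q_treated = eqq_points(x_treated, x_control, n_points)
    return float(np.max(np.abs(q_treated - q_control)))
```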
Performance Metrics
Task 1 (AD classification) will be evaluated through the following metrics:
\[ \displaystyle \operatorname {Accuracy} = {\frac { TN + TP }{N} } \],
\[ \displaystyle \operatorname{Specificity} = { \frac { TN }{TN + FP} }, \]
and
\[ \displaystyle \operatorname {F_1} = { 2 \frac { \pi \times \rho }{\pi + \rho} } \]
where π denotes precision and ρ denotes recall (sensitivity),
\[ \displaystyle \operatorname {\pi} = { \frac { TP }{TP + FP} }, \]
\[ \displaystyle \operatorname {\rho} = { \frac { TP }{TP + FN} }, \]
N is the number of patients, TP is the number of true
positives, TN is the number of true negatives, FP is the number of
false positives and FN the number of false negatives.
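The Task 1 definitions can be computed directly from the confusion counts. A minimal Python sketch, assuming labels are strings such as "AD" and "CN" with AD as the positive class; returning 0.0 when a denominator is zero is a convention chosen here, not one stated by the organisers.

```python
def classification_metrics(y_true, y_pred, positive="AD"):
    """Accuracy, specificity, precision (pi), recall (rho) and F1 as defined above."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    n = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / n,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

The same function covers the Task 3 figures (F1, specificity and sensitivity, i.e. recall) if `positive` is set to whatever label is used for decline.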
Task 2 (MMSE prediction) will be evaluated using the root mean squared error:
\[ \displaystyle \operatorname {RMSE} ={\sqrt {\frac {\sum _{i=1}^{N}({\hat {y}}_{i}-y_{i})^{2}}{N}}}. \]
where $\hat{y}_i$ is the predicted MMSE score and $y_i$ is the actual
MMSE score of patient $i$.
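A direct Python transcription of the RMSE formula (the function name is illustrative):

```python
import math


def rmse(y_pred, y_true):
    """Root mean squared error between predicted and actual MMSE scores."""
    if len(y_pred) != len(y_true) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - t) ** 2 for p, t in zip(y_pred, y_true)]
    return math.sqrt(sum(squared) / len(squared))
```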
Task 3 (prediction of decline) will be evaluated through the overall
${F_1}$ score, with $\operatorname{Specificity}$ and
$\operatorname{\rho}$ (sensitivity) also reported.
When more than one attempt is submitted for scoring against the test
set, all results will be considered (not only the best result
overall), and should be reported in the paper.
ADReSSo Paper and Baseline Classification and Regression Results
A full description of the ADReSSo Challenge and its datasets, along
with a basic set of baseline results can be found in the paper
below. Papers submitted to this Challenge using the ADReSSo dataset
should cite it as follows:
- S. Luz, F. Haider, S. de la Fuente, D. Fromm, and B. MacWhinney.
Detecting cognitive decline using speech only: The ADReSSo Challenge.
2021. medRxiv 2021.03.24.21254263; doi: 10.1101/2021.03.24.21254263
We also encourage you to share your approaches through e-prints and
code repositories.
The ground truth for the test sets is now available through
this
link.
Important Dates
- January 18, 2021: ADReSSo Challenge announced.
- March 26, 2021: Paper submission deadline.
- March 31, 2021 (AOE), extended from March 29 (NEW!): Results submission deadline.
(NB: this has been set for after the paper submission deadline, as
INTERSPEECH allows the paper submission to be updated until April 2.)
- April 2, 2021: Paper update deadline.
- June 2, 2021: Paper acceptance/rejection notification
- August 31 - September 3, 2021: INTERSPEECH 2021.
See
other
important dates on the INTERSPEECH 2021 website.
Paper Submission
Please format your paper following
the INTERSPEECH
2021 guidelines, and submit it indicating that it is meant for the ADReSSo Challenge.
Papers submitted to this Challenge should refer
to the Challenge baseline paper (see the reference above
for the citation).
Organizers
Saturnino Luz is a Reader at the Usher Institute,
University of Edinburgh's Medical School. He works in medical
informatics, devising and applying machine learning, signal
processing and natural language processing methods in the study of
behaviour and communication in healthcare contexts. His main
research interest is the computational modelling of behavioural and
biological changes caused by neurodegenerative diseases, with focus
on the analysis of vocal and linguistic signals in Alzheimer's
disease.
Fasih Haider is a Research Fellow at the
Usher Institute, University of Edinburgh's Medical School, UK. His areas of interest
are Social Signal Processing and Artificial Intelligence.
Before joining the Usher Institute, he was a Research Engineer at the ADAPT
Centre where he worked on methods of Social Signal Processing for video
intelligence. He holds a PhD in Computer Science from Trinity College
Dublin, Ireland. Currently, he is investigating the use of
social signal processing and machine learning for monitoring cognitive
health.
Sofia de la Fuente is a Research Fellow at the Usher Institute,
University of Edinburgh’s Medical School and an Associate Fellow
of the Higher Education Academy, UK. She graduated in Psychology
(BSc Hons) from the Universidad Complutense de Madrid in 2015 and
in Methodology for Behavioural and Health Sciences (MSc Hons) from
the Universidad Autónoma de Madrid in 2017. She recently
finished her PhD in Precision Medicine at the University of
Edinburgh. Her research is an exploratory study of
psycholinguistics, linguistics, paralinguistics and acoustic
features that may help predict dementia onset later in life.
Davida Fromm is a Special Faculty member in the Psychology
Department at Carnegie Mellon University. Her research
interests have focused on aphasia, dementia, and apraxia of
speech in adults. For the past 12 years, she has helped to
develop a large shared database of multi-media discourse samples
for a variety of neurogenic communication disorders. The
database includes educational resources and research tools for
an increasing number of automated language analyses.
Brian MacWhinney is Teresa Heinz Professor
of Psychology, Computational Linguistics, and Modern Languages
at Carnegie Mellon University. He received his Ph.D. in
psycholinguistics in 1974 from the University of California at
Berkeley. With Elizabeth Bates, he developed a model of first
and second language processing and acquisition based on
competition between item-based patterns. In 1984, he and
Catherine Snow co-founded the CHILDES (Child Language Data
Exchange System) Project for the computational study of child
language transcript data. This system has been extended to 13
additional research areas, such as aphasiology, second language
learning, TBI, Conversation Analysis and developmental disfluency,
in the form of the TalkBank Project. MacWhinney's
recent work includes studies of online learning of second
language vocabulary and grammar, situationally embedded second
language learning, neural network modeling of lexical
development, fMRI studies of children with focal brain lesions,
and ERP studies of between-language competition. He is also
exploring the role of grammatical constructions in the marking
of perspective shifting, the determination of linguistic forms
across contrasting time frames, and the construction of mental
models in scientific reasoning. Recent edited books include The
Handbook of Language Emergence (Wiley) and Competing Motivations
in Grammar and Usage (Oxford).
Acknowledgements
The ADReSSo Challenge acknowledges the support and sponsorship of the European Union's Horizon 2020 research
programme, under grant agreement No 769661, towards
the SAAM project.
References
- de la Fuente Garcia S, Ritchie C, Luz S. Artificial
Intelligence, Speech, and Language Processing Approaches to
Monitoring Alzheimer’s Disease: A Systematic Review. Journal of
Alzheimer's Disease. 2020:1-27.
- Luz S, Haider F, de la Fuente S, Fromm D, MacWhinney
B. Alzheimer's Dementia Recognition through Spontaneous Speech:
The ADReSS Challenge. Proceedings of INTERSPEECH 2020. Also
available as arXiv preprint arXiv:2004.06833. 2020.
- Rosenbaum, Paul R., and Donald B. Rubin. 1983. “The Central Role of
the Propensity Score in Observational Studies for Causal Effects.”
Biometrika 70 (1): 41–55.
- Rubin, Donald B. 1973. “Matching to Remove Bias in Observational
Studies.” Biometrics 29 (1): 159.
- Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth
A. Stuart. 2007. “Matching as Nonparametric Preprocessing for Reducing
Model Dependence in Parametric Causal Inference.” Political Analysis
15 (3): 199–236. https://doi.org/10.1093/pan/mpl013.