ICASSP 2023 SPGC Challenge: Multilingual Alzheimer's
Dementia Recognition through Spontaneous Speech
News:
- NEW! Ground truth for the test set released. (29-3-23)
- Ranking tables now available. (12-2-23)
- Instructions for preparing your paper now available. (9-2-23)
- Important: you can request a sample test format file. (1-2-23)
- Deadline for submission of results extended to 9 February. (1-2-23)
- ADReSS-M test metadata (age, gender and education) released. Click here to download. (1-2-23)
- ADReSS-M test set released. Click here to download. (24-1-23)
- Challenge description and baseline paper now available! (16-1-23)
- Registration deadline extended to 13 January. (6-1-23)
- Important: test set change. We will now use a test set of Greek spontaneous speech picture descriptions. A sample set will be provided. If you have already received the Spanish sample set, please ignore it and request the replacement Greek sample. (6-1-23)
- Name change: this SPGC is now named ADReSS-M. (6-1-23)
- The Multilingual Alzheimer's Dementia Recognition through Spontaneous Speech challenge, an ICASSP'23 Signal Processing Grand Challenge (SPGC), is announced! (27-11-22)
The SPGC on Multilingual Alzheimer's Dementia Recognition through
Spontaneous Speech, at ICASSP 2023
The ADReSS-M Signal Processing Grand Challenge targets a difficult automatic
prediction problem of societal and medical relevance, namely, the
detection of Alzheimer's Dementia (AD). Dementia is a category of
neurodegenerative diseases that entails a long-term and usually
gradual decrease of cognitive functioning.
While there has been much interest in automated methods for
cognitive impairment detection by the signal processing and machine
learning communities (de la Fuente Garcia, Ritchie and Luz, 2020), most
of the proposed approaches have not investigated which speech
features can be generalised and transferred across languages for
AD prediction, and to the best of our knowledge no work has
investigated acoustic features of the speech signal in multilingual AD
detection. The ADReSS-M Challenge targets this issue by
defining a prediction task whereby participants train their
models based on English speech data and assess their models'
performance on spoken Greek data. It is expected that the models
submitted to the challenge will focus on acoustic features of the
speech signal and discover features whose predictive power is
preserved across languages, but other approaches can be considered.
In keeping with the objectives of AD prediction evaluation, the
ADReSS-M challenge's dataset is statistically matched so as to mitigate
common biases often overlooked in evaluations of AD detection methods,
including repeated occurrences of speech from the same participant
(common in longitudinal datasets), variations in audio quality, and
imbalances of gender, age and educational level. By focusing on AD
recognition using spontaneous speech, we depart from
neuropsychological and clinical evaluation approaches, as spontaneous
speech analysis has the potential to enable novel applications for
speech technology in longitudinal, unobtrusive monitoring of cognitive
health.
This challenge aims to provide a platform for contributions and
discussions on applying signal processing and machine learning methods
for Multilingual Alzheimer's Dementia Recognition through Spontaneous
Speech. We invite the submission of papers describing system ideas
(machine learning architectures), novel signal processing features,
feature selection, and feature extraction methods in the context of
ADReSS-M Challenge.
The organizers have recently conducted an extensive systematic review
of the scientific literature on speech and language processing AI
methods for detection of cognitive decline in AD
(de la Fuente Garcia, Ritchie and Luz, 2020), which may offer
participants a useful overview of approaches and results in this field.
How to participate
The ADReSS-M challenge consists of the following tasks:
- a classification task, where the model will aim to distinguish
healthy control speech from AD/MCI speech, and
- an MMSE score prediction (regression) task, where you create a model to
infer the subject's Mini-Mental State Examination (MMSE) score
based on speech data.
You may choose to do one or both of these tasks. You will be provided
with access to a training set (see relevant section below), and
two weeks prior to the paper submission deadline you will be provided
with test sets on which you can test your models.
You may send up to five sets of results to us for scoring for each
task. You are required to submit all your attempts together, in
separate files named madress_results_task1_attempt1.txt,
madress_results_task2_attempt1.txt, and so on (or only one of these
sets, should you choose not to enter both tasks). These files must
contain the IDs of the test files and your model's predictions. The
test set archives will include README.md files with further details.
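As a sketch of what one attempt file might look like, the snippet below writes a simple CSV of IDs and predictions. The exact column layout and the ID format here are assumptions; the README files in the test set archives are authoritative.

```python
import csv

def write_results(filename, predictions):
    """Write one attempt file: one row per test file, with the file ID and
    the model's prediction (a class label for task 1, an MMSE score for
    task 2). This layout is hypothetical; follow the official README."""
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "Prediction"])
        writer.writerows(predictions)

# Hypothetical task 1 predictions (test-file ID -> predicted class)
write_results("madress_results_task1_attempt1.txt",
              [("test-0001", "ProbableAD"), ("test-0002", "Control")])
```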
As the broader scientific goal of ADReSS-M is to gain insight into
the nature of the relationship between speech and cognitive function
across different languages, we encourage you to upload a paper describing your approaches
and results to a pre-print repository such
as arXiv or medRxiv
regardless of your ranking in
the Challenge, and to share your code through a publicly accessible
repository, if possible
using a literate programming "notebook" environment
such as R Markdown
or Jupyter Notebook.
The top 5 ranked teams will be invited to submit a 2-page paper
describing their approach and to present it at ICASSP 2023. Accepted
papers will appear in the ICASSP proceedings. The teams that present
their work at ICASSP are also invited to submit a full paper about
their work to the IEEE Open Journal of Signal Processing (OJ-SP).
Access to the data set
In order to gain access to the ADReSS-M dataset, please email
madress2023@ed.ac.uk with your contact information and affiliation, as
well as a general statement on how you plan to use the data, with a
specific mention of the ADReSS-M challenge. If you are a student,
please ask your supervisor to join as well. This membership will give
you full access to the dataset through DementiaBank, where the ADReSS-M
dataset will be available.
Note: Please do not use any other data from DementiaBank to train your
models (e.g. for augmentation), as the task dataset may contain
files from that repository.
The data set
The training dataset consists of spontaneous speech samples corresponding to
audio recordings of picture descriptions produced by cognitively normal
subjects and patients with an AD diagnosis, who were asked to
describe the Cookie Theft picture from the Boston Diagnostic Aphasia
Examination test (Becker et al., 1994). The
participants were speakers of English. The test set consists of
spontaneous speech descriptions of a different picture, recorded
in Greek.
You will initially be allowed access only to the training
data (English) and some sample Greek data for this SPGC. The data
are organised in the following directory hierarchy:
MADReSS
├── sample-gr
├── sample-gr-groundtruth.csv
├── train
└── training-groundtruth.csv
Important: note that the sample and test data are now (as of
6-1-23) in Greek,
rather than Spanish. If you already downloaded the Spanish sample
data, please replace it with the sample-gr data provided. The test data
will be similar to these samples.
The groundtruth.csv
files contain each participant's
cognitive status (Control or ProbableAD, for cognitively normal
participants and patients diagnosed with probable Alzheimer's
dementia) and the results of their Mini-Mental State Examination
(MMSE) test, which is widely used for cognitive assessment. The
train directory contains a training set of 237 picture description
recordings in English. The sample-gr directory contains a small
sample of 8 picture descriptions in Greek. The test set will consist
of 46 recordings, all in Greek.
The ADReSS-M training dataset has been balanced with respect to age
and gender in order to mitigate potential confounding and bias. We
employed a propensity score approach to matching
(Rosenbaum & Rubin, 1983; Rubin, 1973; Ho et al., 2007). The
propensity score was defined as the probability of an instance being
assigned to the AD group given the covariates age and gender, and was
estimated using a probit regression of the treatment on these
covariates (probit produced better balance than logistic regression);
matching instances were then selected according to these scores. All
standardized mean differences for the covariates were well below 0.1,
and all standardized mean differences for squares and two-way
interactions between covariates were well below 0.15, indicating
adequate balance for the covariates.
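As an illustration of the balance check described above, here is a minimal sketch of the standardized mean difference (SMD) computation; the ages for the two matched groups are made up for illustration, and the 0.1 threshold is the one quoted in the text.

```python
from statistics import mean, pvariance

def standardized_mean_difference(group_a, group_b):
    """Absolute difference in group means divided by the pooled standard
    deviation; values well below 0.1 indicate adequate covariate balance."""
    pooled_sd = ((pvariance(group_a) + pvariance(group_b)) / 2) ** 0.5
    return abs(mean(group_a) - mean(group_b)) / pooled_sd

# Made-up ages for matched AD and control groups (illustrative only)
ad_ages = [68, 72, 75, 70, 74]
control_ages = [69, 72, 75, 70, 74]
smd = standardized_mean_difference(ad_ages, control_ages)
print(f"SMD for age: {smd:.3f}")
```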
Evaluation and ranking
Task 1 (AD classification) will be evaluated through the accuracy
metric:

  $\mathrm{Accuracy} = \frac{TP + TN}{N}$

Specificity, sensitivity (recall) and $F_1$ scores for the AD
class will also be reported on the ranked list to be published on
this web site. These metrics will be computed as follows:

  $\mathrm{Specificity} = \frac{TN}{TN + FP}$, $\mathrm{Sensitivity} = \frac{TP}{TP + FN}$

and

  $F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$, with $\mathrm{Precision} = \frac{TP}{TP + FP}$,

where
N is the number of patients, TP is the number of true
positives, TN the number of true negatives, FP the number of
false positives and FN the number of false negatives. You will also
be asked to submit prediction probabilities, so that area under the
ROC curve (AUC) scores can also be published.
Task 2 (MMSE prediction) will be evaluated using the
coefficient of determination:

  $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$

and the root mean squared error:

  $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_i (\hat{y}_i - y_i)^2}$,

where $\hat{y}_i$ is the predicted MMSE score, $y_i$ is the patient's
actual MMSE score, and $\bar{y}$ is the mean score.
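The evaluation metrics above can be computed in a few lines of plain Python; the function and variable names here are ours, and precision enters only through the F1 score.

```python
from math import sqrt

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall), specificity and F1 for the AD class,
    from the counts of true/false positives and negatives."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f1

def regression_metrics(predicted, actual):
    """Coefficient of determination (R^2) and RMSE for MMSE prediction."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot, sqrt(ss_res / len(actual))
```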
When more than one attempt is submitted for scoring against the test
set, all results should be considered (not only the best result
overall) and reported in the paper.
The ranking of submissions will be done based on accuracy scores
for the classification task (task 1), and on RMSE scores for the
MMSE score regression task (task 2). The top 5 models will consist of:
- The two top performing (most accurate) teams for the
classification task
- The two top performing (least RMSE) teams for the
MMSE regression task
- The team that performed best on average for the two tasks,
chosen according to the formula $S_t = (N - R_{t,1}) + (N - R_{t,2})$,
where $S_t$ is the total score of team $t$, $R_{t,k}$ is the rank of
team $t$ on task $k$, and $N$ is the total number of teams in the
challenge. If a team chooses not to submit results for a task, its
score for that task will be set to 0.
Ties will be broken by averaging performance over all attempts. The
criteria above will be applied so that the ranking yields 5 different
teams: if a team is selected as a top team under one criterion, it
will not be selected again under another; in such a case, the next
top-performing team will be selected instead.
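Under one plausible reading of the averaging rule above, a team earns N minus its rank as points per task, with 0 for a task it did not enter; the sketch below encodes that interpretation. This is our assumption, not the official scoring formula.

```python
def combined_score(rank_task1, rank_task2, n_teams):
    """Hypothetical combined score: (n_teams - rank) points per task,
    0 for a task the team did not enter (rank is None). Higher is better."""
    points = lambda rank: 0 if rank is None else n_teams - rank
    return points(rank_task1) + points(rank_task2)

# With 10 teams: ranked 1st on task 1 and 2nd on task 2 -> 9 + 8 points
print(combined_score(1, 2, 10))
```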
ADReSS-M SPGC Description Paper and Baseline Results
A paper describing this Signal Processing Grand Challenge and
its dataset more fully, along with a set of baseline results, is
available on arXiv and linked to this web page. Papers submitted to
this Challenge using the ADReSS-M dataset should cite this paper as
follows:
- Luz S, Haider F, Fromm D, Lazarou I, Kompatsiaris I, MacWhinney
B. Multilingual Alzheimer’s Dementia Recognition through Spontaneous
Speech: a Signal Processing Grand Challenge. arXiv; 2023. DOI:
10.48550/arXiv.2301.05562,
available
from: https://arxiv.org/abs/2301.05562.
We encourage you to submit papers describing your approaches to the
tasks set here to https://arxiv.org/,
regardless of your ranking in the Challenge, and to share your code
through open-source repositories. Please note that the intellectual
property (IP) related to your submission is not transferred to the
challenge organizers, i.e., if code is shared/submitted, the
participants remain the owners of their code. When the code is made
publicly available, an appropriate license should be added.
- 27th November: ADReSS-M Challenge announced and Call for Participation published
- 13th January: registration deadline; please email
madress2023@ed.ac.uk to
register for the challenge and receive the training and sample sets.
- 6th February: deadline for submission of results
- 7th February: top-scoring models invited to submit 2-page paper
- 27th February: Grand Challenge 2-page Papers due.
- 4th-9th June: ICASSP 2023.
See other important dates on the ICASSP 2023 website.
Paper Submission
Papers invited for submission must be submitted through
the
ICASSP'23 submission platform. After logging into the system,
please create a new submission for "Grand Challenges", and then
tick the box labelled "Multilingual Alzheimer’s Dementia Recognition
through Spontaneous Speech (MADReSS)" under "SUBJECT AREAS".
Please follow these instructions when preparing your paper:
- Use the same template for regular ICASSP papers, but with a
page length of max 2 pages (instead of 4+1). This includes everything
(title, abstract, intro, references, and possible figures and tables).
- The abstract and introduction must clearly mention that this
work is done in the context of an “ICASSP Signal Processing Grand
Challenge”, and include (a) the official challenge name, (b) the year
of the challenge and, if applicable, (c) the edition number if this
is not the first edition of the challenge.
- The paper's format is free, but it must not exceed 2 pages
(including references), and should contain an abstract and (brief)
introduction.
- The introduction should at least contain:
- a brief description of the scope of the challenge (+ challenge name, see above)
- a brief description of your proposed solution (its main ingredients)
- the quantitative results you obtained on the challenge’s evaluation metrics.
It is not necessary to exhaustively describe prior art in the
introduction. However, standard citation rules remain applicable:
for example, if your solution is inspired by (or uses) existing work,
cite it properly. Papers submitted to this Challenge should refer
to the ADReSS-M Challenge description paper
(see reference above for citation).
- In the main text, focus on the conceptual implementation/innovation, and a high-level description of the
proposed solution.
- In general: make the paper as self-contained as possible. Yet, keep in mind that you still have the opportunity
to submit a full journal paper to OJSP if you feel your methodology is sufficiently innovative to be published as a
journal paper.
Organizers
Saturnino Luz
is a Reader at the Usher Institute, at the University of
Edinburgh's Medical School. He works on digital biomarkers and
precision medicine, devising and applying machine learning, signal
processing and natural language processing methods to the study of
behaviour and communication in healthcare contexts. His main
research interest is the computational modelling of behavioural and
biological changes caused by neurodegenerative diseases, with focus
on the analysis of vocal and linguistic signals in Alzheimer's
disease.
Fasih Haider is a Research Fellow at the
Centre for Medical Informatics, Usher Institute, University of Edinburgh, UK. His areas of interest
are Social Signal Processing and Artificial Intelligence.
Before joining the Usher Institute, he was a Research Engineer at the ADAPT
Centre where he worked on methods of Social Signal Processing for video
intelligence. He holds a PhD in Computer Science from Trinity College
Dublin, Ireland. Currently, he is investigating the use of
social signal processing and machine learning for monitoring cognitive
health.
Davida
Fromm is a Special Faculty member in the Psychology
Department at Carnegie Mellon University. Her research
interests have focused on aphasia, dementia, and apraxia of
speech in adults. For the past 12 years, she has helped to
develop a large shared database of multi-media discourse samples
for a variety of neurogenic communication disorders. The
database includes educational resources and research tools for
an increasing number of automated language analyses.
Brian MacWhinney is Teresa Heinz Professor
of Psychology, Computational Linguistics, and Modern Languages
at Carnegie Mellon University. He received his Ph.D. in
psycholinguistics in 1974 from the University of California at
Berkeley. With Elizabeth Bates, he developed a model of first
and second language processing and acquisition based on
competition between item-based patterns. In 1984, he and
Catherine Snow co-founded the CHILDES (Child Language Data
Exchange System) Project for the computational study of child
language transcript data. This system has been extended to 13
additional research areas, such as aphasiology, second language
learning, TBI, Conversation Analysis, developmental disfluency
and others, in the form of the TalkBank Project. MacWhinney's
recent work includes studies of online learning of second
language vocabulary and grammar, situationally embedded second
language learning, neural network modeling of lexical
development, fMRI studies of children with focal brain lesions,
and ERP studies of between-language competition. He is also
exploring the role of grammatical constructions in the marking
of perspective shifting, the determination of linguistic forms
across contrasting time frames, and the construction of mental
models in scientific reasoning. Recent edited books include The
Handbook of Language Emergence (Wiley) and Competing Motivations
in Grammar and Usage (Oxford).
References
- Becker J, Boller F, Lopez O, Saxton J, McGonigle K. The natural history of Alzheimer’s disease:
Description of study cohort and accuracy of diagnosis. Archives of
Neurology, 51(6):585–594, 1994. DOI:10.1001/archneur.1994.00540180063015
- de la Fuente Garcia S, Ritchie C, Luz S. Artificial
Intelligence, Speech, and Language Processing Approaches to
Monitoring Alzheimer’s Disease: A Systematic Review. Journal of
Alzheimer's Disease. 2020:1-27. DOI: 10.3233/JAD-200888
- Luz S, Haider F, de la Fuente S, Fromm D, MacWhinney
B. Alzheimer's Dementia Recognition through Spontaneous Speech:
The ADReSS Challenge. Proceedings of INTERSPEECH 2020. Also
available as arXiv preprint arXiv:2004.06833. 2020.
-
Luz S, Haider F, Fromm D, MacWhinney B, (eds.). Alzheimer’s Dementia
Recognition Through Spontaneous Speech. Lausanne, Switzerland:
Frontiers Media S.A., 2021. 258 p. DOI: 10.3389/978-2-88971-854-2
-
Rosenbaum PR, Rubin DB. 1983. The Central Role of
the Propensity Score in Observational Studies for Causal Effects.
Biometrika 70 (1): 41–55. DOI: 10.1093/biomet/70.1.41
- Rubin DB. 1973. Matching to Remove Bias in Observational
Studies. Biometrics 29 (1): 159. DOI: 10.2307/2529684.
- Ho DE, Kosuke I, King G, Stuart EA. 2007. Matching as Nonparametric Preprocessing for Reducing
Model Dependence in Parametric Causal Inference. Political Analysis
15 (3): 199–236.