## ICASSP 2023 SPGC Challenge: Multilingual Alzheimer's Dementia Recognition through Spontaneous Speech

News:

• NEW! Challenge description and Baseline paper now available! (16-1-23)
• Registration deadline extended to 13 January (6-1-23).
• Important: Test set change. We will now use a test set of Greek spontaneous speech picture descriptions. A sample set will be provided. If you have already received the Spanish sample set, please ignore it and request the replacement Greek sample. (6-1-23)
• Name Change: this SPGC is now named ADReSS-M (6-1-23)
• The Multilingual Alzheimer's Dementia Recognition on Spontaneous Speech, an ICASSP'23 Signal Processing Grand Challenge (SPGC) is announced! (27-11-22)

## The SPGC on Multilingual Alzheimer's Dementia Recognition through Spontaneous Speech, at ICASSP 2023

The ADReSS-M Signal Processing Grand Challenge targets a difficult automatic prediction problem of societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD). Dementia is a category of neurodegenerative diseases that entails a long-term and usually gradual decrease of cognitive functioning. While there has been much interest in automated methods for cognitive impairment detection by the signal processing and machine learning communities (de la Fuente Garcia, Ritchie and Luz, 2020), most of the proposed approaches have not investigated which speech features can be generalised and transferred across languages for AD prediction, and to the best of our knowledge no work has investigated acoustic features of the speech signal in multilingual AD detection. The ADReSS-M Challenge targets this issue by defining a prediction task whereby participants train their models on English speech data and assess their models' performance on spoken Greek data. It is expected that the models submitted to the challenge will focus on acoustic features of the speech signal and discover features whose predictive power is preserved across languages, but other approaches can be considered.

In keeping with the objectives of AD prediction evaluation, the ADReSS-M challenge's dataset is statistically matched so as to mitigate common biases often overlooked in evaluations of AD detection methods, including repeated occurrences of speech from the same participant (common in longitudinal datasets), variations in audio quality, and imbalances of gender, age and educational level. By focusing on AD recognition using spontaneous speech, we depart from neuropsychological and clinical evaluation approaches, as spontaneous speech analysis has the potential to enable novel applications for speech technology in longitudinal, unobtrusive monitoring of cognitive health.

This challenge aims to provide a platform for contributions and discussions on applying signal processing and machine learning methods for Multilingual Alzheimer's Dementia Recognition through Spontaneous Speech. We invite the submission of papers describing system ideas (machine learning architectures), novel signal processing features, feature selection, and feature extraction methods in the context of ADReSS-M Challenge.

The organizers have recently conducted an extensive systematic review of the scientific literature on speech and language processing AI methods for the detection of cognitive decline in AD (de la Fuente Garcia, Ritchie and Luz, 2020), which may offer participants a useful overview of approaches and results in this field.

## How to participate

The ADReSS-M Challenge comprises two tasks:

1. a classification task, where the model aims to distinguish healthy control speech from AD/MCI speech, and
2. an MMSE score prediction (regression) task, where you create a model to infer the subject's Mini-Mental State Examination (MMSE) score from speech data.

You may choose to do one or both of these tasks. You will be provided with access to a training set (see relevant section below), and two weeks prior to the paper submission deadline you will be provided with test sets on which you can test your models.

You may send up to five sets of results to us for scoring for each task. You are required to submit all your attempts together, in separate files named madress_results_task1.txt and madress_results_task2.txt (or only one of these, should you choose not to enter both tasks). These files must contain the IDs of the test files and your model's predictions. The test set archives will include README.md files with further details.
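As a rough illustration of the submission layout described above, the sketch below writes one test-file ID and one prediction per line. The file-ID values and the comma-separated layout are assumptions for illustration only; the authoritative format is described in the README.md files shipped with the test sets.

```python
def write_results(path, predictions):
    """Write (test_file_id, prediction) pairs, one per line.
    The comma-separated layout here is an assumption; check the
    README.md shipped with the test sets for the required format."""
    with open(path, "w") as f:
        for file_id, pred in predictions:
            f.write(f"{file_id},{pred}\n")

# Hypothetical file IDs and class predictions
write_results("madress_results_task1.txt",
              [("test-001", "ProbableAD"), ("test-002", "Control")])
```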

As the broader scientific goal of ADReSS-M is to gain insight into the nature of the relationship between speech and cognitive function across different languages, we encourage you to upload a paper describing your approaches and results to a pre-print repository such as arXiv or medRxiv regardless of your ranking in the Challenge, and to share your code through a publicly accessible repository, if possible using a literate programming "notebook" environment such as R Markdown or Jupyter Notebook.

The top 5 ranked teams will be invited to submit a 2-page paper describing their approach and to present it at ICASSP 2023. Accepted papers will appear in the ICASSP proceedings. The teams that present their work at ICASSP are also invited to submit a full paper about their work to the IEEE Open Journal of Signal Processing (OJ-SP).

Note: Please do not use any other data from DementiaBank to train your models (e.g. for augmentation), as the task dataset may contain files from that repository.

## The data set

The training dataset consists of spontaneous speech samples corresponding to audio recordings of picture descriptions produced by cognitively normal subjects and patients with an AD diagnosis, who were asked to describe the Cookie Theft picture from the Boston Diagnostic Aphasia Examination test (Becker et al., 1994). The participants were speakers of English. The test set consists of spontaneous speech descriptions of a different picture, produced by speakers of Greek. You will initially be allowed access only to the training data (English) and some sample Greek data for this SPGC. The data are organised in the following directory hierarchy:
```
MADReSS
├── sample-gr
├── sample-gr-groundtruth.csv
├── train
└── training-groundtruth.csv
```

Important: note that the sample and test data are now (as of 6-1-23) in Greek. If you have already downloaded the earlier (Spanish) sample data, please replace it with the sample-gr data provided. The test data will be similar to these samples.


The groundtruth.csv files contain the participant's cognitive status (Control or ProbableAD, for cognitively normal participants and patients diagnosed with probable Alzheimer's dementia) and the results of their mini-mental state examination (MMSE) test, which is widely used for cognitive assessment. The train directory contains a training set of 237 picture description recordings in English. The sample-gr directory contains a small sample of 8 picture descriptions in Greek. The test set will consist of 48 recordings, all in Greek.
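A minimal sketch of reading a groundtruth file of the kind described above, using Python's standard csv module. The column names in the inline sample are hypothetical; the actual headers are documented in the files distributed with the data.

```python
import csv
import io

def load_groundtruth(path):
    """Read a groundtruth CSV into a list of dicts, one per recording."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# The real files ship with the data archives; this tiny inline sample
# (with hypothetical column names) only illustrates the expected shape.
sample = ("id,dx,mmse\n"
          "madress-001,Control,29\n"
          "madress-002,ProbableAD,20\n")
rows = list(csv.DictReader(io.StringIO(sample)))
ad_rows = [r for r in rows if r["dx"] == "ProbableAD"]
```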

The ADReSS-M training dataset has been balanced with respect to age and gender in order to eliminate potential confounding and bias. We employed a propensity score approach to matching (Rosenbaum & Rubin, 1983; Rubin, 1973; Ho et al., 2007). Propensity scores were defined as the probability of an instance belonging to the AD group given the covariates age and gender, and matching instances were selected on these scores. The scores were estimated using a probit regression of group membership on the covariates, as probit produced better balance than logistic regression. All standardized mean differences for the covariates were well below 0.1, and all standardized mean differences for squares and two-way interactions between covariates were well below 0.15, indicating adequate balance for the covariates.
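The standardized mean difference used in the balance check above can be sketched as follows, here with the common pooled-standard-deviation definition (the exact variant used for the dataset is not specified here, and the age values are hypothetical):

```python
import math

def standardized_mean_difference(group_a, group_b):
    """Absolute standardized mean difference for one covariate,
    using the pooled standard deviation of the two groups.
    Values well below 0.1 are taken to indicate adequate balance."""
    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):  # sample variance
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    pooled_sd = math.sqrt((var(group_a) + var(group_b)) / 2)
    return abs(mean(group_a) - mean(group_b)) / pooled_sd

# Hypothetical ages for matched AD and control groups
smd_age = standardized_mean_difference([66, 70, 72, 68], [67, 71, 69, 70])
```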

## Evaluation and ranking

Task 1: AD classification will be evaluated through the accuracy metric: $\displaystyle \operatorname {A} = {\frac { TN + TP }{N} }$ Specificity, sensitivity ($$\rho$$) and $$F_1$$ scores for the AD class will also be reported on the ranked list to be published on this web site. These metrics will be computed as follows: $\displaystyle \operatorname{Sp} = { \frac { TN }{TN + FP} },$ and $\displaystyle \operatorname {F_1} = { 2 \frac { \pi \times \rho }{\pi + \rho} }$ where $\displaystyle \operatorname {\pi} = { \frac { TP }{TP + FP} },$ $\displaystyle \operatorname {\rho} = { \frac { TP }{TP + FN} },$ N is the number of patients, TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives and FN the number of false negatives. You will also be asked to submit prediction probabilities, so that area under the ROC curve scores can also be published.
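The Task 1 metrics above can be sketched directly from the confusion-matrix counts. The example labels below are hypothetical, with 1 denoting the AD (positive) class:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, specificity, sensitivity (rho), precision (pi) and F1
    from 0/1 ground truth and predictions (1 = AD, the positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = len(y_true)
    precision = tp / (tp + fp)            # pi
    sensitivity = tp / (tp + fn)          # rho
    return {
        "accuracy": (tp + tn) / n,
        "specificity": tn / (tn + fp),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical predictions for six test subjects
m = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```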

Task 2 (MMSE prediction) will be evaluated using the coefficient of determination: $\displaystyle \operatorname {R^2} =1 - \frac {\sum_{i=1}^N(\hat{y}_{i} - y_{i})^2} {\sum_{i=1}^N(y_{i} - \bar{y})^2}$ and the root mean squared error: $\displaystyle \operatorname {RMSE} ={\sqrt {\frac {\sum _{i=1}^{N}({\hat {y}}_{i}-y_{i})^{2}}{N}}}$ where $$\hat{y}$$ is the predicted MMSE score, $$y$$ is the patient's actual MMSE score, and $$\bar{y}$$ is the mean of the actual scores.
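The two Task 2 metrics can be computed as follows (the MMSE values in the example are hypothetical):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over N subjects."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination, with the total sum of squares
    taken around the mean of the actual scores."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical MMSE scores for three subjects
actual = [20, 25, 30]
predicted = [22, 24, 29]
```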

When more than one attempt is submitted for scoring against the test set, all results should be considered (not only the best result overall) and reported in the paper.

The ranking of submissions will be done based on accuracy scores for the classification task (task 1), and on RMSE scores for the MMSE score regression task (task 2). The top 5 models will consist of:
• The two top performing (most accurate) teams for the classification task
• The two top performing (least RMSE) teams for the MMSE regression task
• The team that performed best on average over the two tasks, chosen according to the following formula: $$T_i = \frac{A_i}{\sum_{j=1}^{T} A_j} + 1 - \frac{\operatorname{RMSE}_i}{\sum_{j=1}^{T} \operatorname{RMSE}_j}$$, where $$T_i$$ is the total score of team $$i$$ and $$T$$ is the total number of teams in the challenge. If a team chooses not to submit results for a task, its score for that task will be set to 0.
Ties will be broken by averaging performance over all attempts. The criteria above will be applied so that the ranking yields 5 different teams: if a team is selected as a top team under one criterion, it will not be selected again under another; in such a case, the next top-performing team will be selected.
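The combined ranking formula above can be sketched as follows (the per-team accuracy and RMSE values are hypothetical):

```python
def combined_score(acc_i, rmse_i, all_acc, all_rmse):
    """T_i = A_i / sum_j A_j + 1 - RMSE_i / sum_j RMSE_j.
    Per the challenge rules, a team that skips a task has its
    score for that task set to 0."""
    acc_term = acc_i / sum(all_acc) if sum(all_acc) > 0 else 0.0
    rmse_term = rmse_i / sum(all_rmse) if sum(all_rmse) > 0 else 0.0
    return acc_term + 1 - rmse_term

# Hypothetical scores for three teams (accuracy higher is better,
# RMSE lower is better)
accs = [0.80, 0.70, 0.60]
rmses = [4.0, 5.0, 6.0]
scores = [combined_score(a, r, accs, rmses) for a, r in zip(accs, rmses)]
```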

## ADReSS-M SPGC Description Paper and Baseline Results

A paper describing this Signal Processing Grand Challenge and its dataset more fully, along with a basic set of baseline results, is available on arXiv. Papers submitted to this Challenge using the ADReSS-M dataset should cite it as follows:

1. Luz S, Haider F, Fromm D, Lazarou I, Kompatsiaris I, MacWhinney B. Multilingual Alzheimer’s Dementia Recognition through Spontaneous Speech: a Signal Processing Grand Challenge. arXiv; 2023. DOI: 10.48550/arXiv.2301.05562, available from: https://arxiv.org/abs/2301.05562.

We encourage you to submit papers describing your approaches to the tasks set here to https://arxiv.org/, regardless of your ranking in the Challenge, and to share your code through open-source repositories. Please note that the intellectual property (IP) related to your submission is not transferred to the challenge organizers, i.e., if code is shared/submitted, the participants remain the owners of their code. When the code is made publicly available, an appropriate license should be added.

## Important Dates

• 27th November: ADReSS-M Challenge announced and Call for Participation Published
• 6th February: deadline for submission of results
• 7th February: top-scoring models invited to submit 2-page paper
• 20th February: Grand Challenge 2-page Papers due.
• 4th-9th June: ICASSP 2023.
See other important dates on the ICASSP 2023 website.

## Paper Submission

Please format your paper following the ICASSP 2023 guidelines, except for the page limit, which for the SPGC is 2 pages. Further instructions will be given here in due time.

Papers submitted to this Challenge should refer to the ADReSS-M Challenge description paper (see reference above for citation).

## Organizers

Saturnino Luz is a Reader at the Usher Institute, at the University of Edinburgh's Medical School. He works on digital biomarkers and precision medicine, devising and applying machine learning, signal processing and natural language processing methods to the study of behaviour and communication in healthcare contexts. His main research interest is the computational modelling of behavioural and biological changes caused by neurodegenerative diseases, with a focus on the analysis of vocal and linguistic signals in Alzheimer's disease.

Fasih Haider is a Research Fellow at the Centre for Medical Informatics, Usher Institute, University of Edinburgh, UK. His areas of interest are Social Signal Processing and Artificial Intelligence. Before joining the Usher Institute, he was a Research Engineer at the ADAPT Centre, where he worked on methods of Social Signal Processing for video intelligence. He holds a PhD in Computer Science from Trinity College Dublin, Ireland. Currently, he is investigating the use of social signal processing and machine learning for monitoring cognitive health.

Davida Fromm is a Special Faculty member in the Psychology Department at Carnegie Mellon University. Her research interests have focused on aphasia, dementia, and apraxia of speech in adults. For the past 12 years, she has helped to develop a large shared database of multi-media discourse samples for a variety of neurogenic communication disorders. The database includes educational resources and research tools for an increasing number of automated language analyses.

Brian MacWhinney is Teresa Heinz Professor of Psychology, Computational Linguistics, and Modern Languages at Carnegie Mellon University. He received his Ph.D. in psycholinguistics in 1974 from the University of California at Berkeley. With Elizabeth Bates, he developed a model of first and second language processing and acquisition based on competition between item-based patterns. In 1984, he and Catherine Snow co-founded the CHILDES (Child Language Data Exchange System) Project for the computational study of child language transcript data. This system has been extended, within the TalkBank Project, to 13 additional research areas, such as aphasiology, second language learning, TBI, Conversation Analysis, and developmental disfluency. MacWhinney's recent work includes studies of online learning of second language vocabulary and grammar, situationally embedded second language learning, neural network modeling of lexical development, fMRI studies of children with focal brain lesions, and ERP studies of between-language competition. He is also exploring the role of grammatical constructions in the marking of perspective shifting, the determination of linguistic forms across contrasting time frames, and the construction of mental models in scientific reasoning. Recent edited books include The Handbook of Language Emergence (Wiley) and Competing Motivations in Grammar and Usage (Oxford).

## References

1. Becker J, Boller F, Lopez O, Saxton J, McGonigle K. The natural history of Alzheimer’s disease: Description of study cohort and accuracy of diagnosis. Archives of Neurology, 51(6):585–594, 1994. DOI:10.1001/archneur.1994.00540180063015
2. de la Fuente Garcia S, Ritchie C, Luz S. Artificial Intelligence, Speech, and Language Processing Approaches to Monitoring Alzheimer’s Disease: A Systematic Review. Journal of Alzheimer's Disease. 2020:1-27. DOI: 10.3233/JAD-200888
3. Luz S, Haider F, de la Fuente S, Fromm D, MacWhinney B. Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge. Proceedings of INTERSPEECH 2020. Also available as arXiv preprint arXiv:2004.06833. 2020.
4. Luz S, Haider F, Fromm D, MacWhinney B, (eds.). Alzheimer’s Dementia Recognition Through Spontaneous Speech. Lausanne, Switzerland: Frontiers Media S.A., 2021. 258 p. DOI: 10.3389/978-2-88971-854-2
5. Rosenbaum PR, Rubin DB. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70 (1): 41–55. DOI: 10.1093/biomet/70.1.41
6. Rubin DB 1973. Matching to Remove Bias in Observational Studies. Biometrics 29 (1): 159. DOI: 10.2307/2529684.
7. Ho DE, Imai K, King G, Stuart EA. 2007. Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis 15 (3): 199–236.