From the book: Biomedical Sensing and Analysis published by Springer
Abstract: Depression is a costly and underdiagnosed global health concern, and improved patient screening is greatly needed. Speech technology offers promise for remote screening, but it must perform robustly across patient and environmental variables. This chapter describes two deep learning models that perform well in this regard. An acoustic model uses transfer learning from an automatic speech recognition (ASR) task, and a natural language processing (NLP) model uses transfer learning from a language modeling task. Both models are studied using data from over 10,000 unique users who interacted with human-machine applications using conversational speech. For binary classification on a large test set, the acoustic and NLP models achieve AUC of 0.79 and 0.83, respectively. For a regression task, the root-mean-square error (RMSE) is 4.70 for the acoustic model and 4.27 for the NLP model. Further analysis of performance as a function of test subset characteristics indicates that the models are generally robust over speaker and session variables. The chapter concludes that both acoustic and NLP-based models have potential for use in generalized automated depression screening.
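
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how AUC and RMSE can be computed. It assumes that AUC refers to the area under the ROC curve, which the abstract does not state explicitly. The function names and the toy values are hypothetical, chosen only for illustration; they are not the chapter's code or data.

```python
import math


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(targets, predictions):
    """Root-mean-square error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up toy values (assumption for illustration, not study data).
toy_labels = [0, 0, 1, 1]               # 1 = positive class
toy_scores = [0.1, 0.4, 0.35, 0.8]      # model outputs for classification
toy_targets = [3.0, 10.0, 15.0]         # true values for regression
toy_predictions = [5.0, 8.0, 15.0]      # model outputs for regression

auc_value = roc_auc(toy_labels, toy_scores)
rmse_value = rmse(toy_targets, toy_predictions)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, while RMSE is expressed in the same units as the regression target, so lower values indicate closer predictions.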