
Industry Leading Research

We are committed to leading the industry in clinical and speech technology research, establishing best practices, and advancing voice as a biomarker for mental health and wellbeing.

Research to advance mental health care

Through our continued work and research, we believe the unique power of voice, machine learning, and AI will scale human capacity to advance quality mental health care, connecting the dots to a happier and healthier future.

Clinical Validation

Published Papers and Institutional Review Board (IRB) Studies

Published Papers

APRIL 8, 2022

Feasibility of a Machine-Learning Based Smartphone Application in Detecting Depression and Anxiety in a Generally Senior Population 

Frontiers in Psychology

Abstract: Depression and anxiety create a large health burden and increase the risk of premature mortality. Mental health screening is vital, but more sophisticated screening and monitoring methods are needed. The Ellipsis Health App addresses this need by using semantic information from recorded speech to screen for depression and anxiety.


FEBRUARY 12, 2022 (Study Protocol)

Evaluating the Feasibility and Acceptability of an Artificial-Intelligence-Enabled and Speech-Based Distress Screening Mobile App for Adolescents and Young Adults Diagnosed with Cancer

MDPI

Abstract: Adolescent and young adult (AYA) patients diagnosed with cancer are at a higher risk of psychological distress, which requires regular monitoring throughout their cancer journeys. Paper-and-pencil or digital surveys for psychological stress are often cumbersome to complete during a patient’s visit, and many patients find completing the same survey multiple times repetitive and boring. Recent advances in mobile technology and speech science have enabled flexible and engaging ways of monitoring psychological distress. This paper describes the scientific process we will use to evaluate an artificial intelligence (AI)-enabled mobile app to monitor depression and anxiety among AYAs diagnosed with cancer.

Institutional Review Board (IRB) Studies

Desert Oasis Healthcare

Clinical Validation in Senior Population

Ellipsis Health conducted a study of a majority-senior population at Desert Oasis Healthcare (DOHC) in Palm Springs, CA. Ellipsis recruited 250+ patients with a previous history of depression plus a control group without depression. Each subject was asked to perform six voice recording sessions at least one week apart, where each session consisted of three minutes of speech elicited through open-ended questions designed to reveal their internal mental state. The Ellipsis Health App demonstrated feasibility in using voice recordings to screen for depression and anxiety across age groups, and almost 30% of participants spoke longer or completed more sessions than required. Findings have been published in Frontiers in Psychology.

Vanderbilt University Medical Center

Monitoring Pre- and Post-Operative Patients

The Vanderbilt University Medical Center and Ellipsis Health study involves 250+ spine surgery patients who will be monitored for severity of depression and anxiety throughout the surgical journey (pre-operatively, then weekly in the postoperative period). The study will also explore the relationship between depression/anxiety and other pain-related measures.

Mayo Clinic

Supporting Employee & Caregiver Wellbeing

This Mayo Clinic and Ellipsis Health study involves 50 adult employees who are part of the Stress Management and Resilience Training (SMART) program at Mayo Clinic. Study participants will engage with the Ellipsis Health App weekly for 3 months to assess the employees' severity of depression and anxiety.

University of Denver

Supporting & Evaluating the Mental Wellbeing of Adolescents

In partnership, the University of Denver Graduate School of Social Work, the University of Michigan School of Social Work, and Ellipsis Health aim to validate Ellipsis Health’s screening tool for anxiety and depression in 700 adolescents aged 11-17. Additionally, 60 adolescents enrolled in the validation study will use the Ellipsis Health App to evaluate whether it is effective in improving students’ mental wellbeing and in improving the screening and monitoring of their depression and anxiety. Thirty students will also be recruited to provide focus group feedback on the acceptability of the Ellipsis Health App, as well as the acceptability of a mental health resource page built into the Ellipsis Health App for school mental health clinicians.

University of Michigan

Supporting the Mental Wellbeing of Adolescent & Young Adult Cancer Patients

This University of Michigan Medical Center and Ellipsis Health study will evaluate the feasibility and acceptability of the Ellipsis Health App to assess psychological distress among adolescent and young adult (AYA) patients who have been diagnosed with cancer. In this study, 60 AYAs will be monitored using the Ellipsis Health tool once a month over a 6-month period.

Mind Springs Health

Averting Crisis Events

Ellipsis Health conducted a study with the Mind Springs Health Depression Clinic. The study asked 100+ newly enrolled patients in the Depression Clinic program to record weekly voice samples using Ellipsis Health’s App. During the study period, clinicians’ face-to-face assessments of participants’ depression and anxiety were also collected. Eighty crisis events were averted through the use of Ellipsis Health. We will compare the gold-standard clinicians’ assessments with the PHQ-9/GAD-7 scores and our technology’s outputs for the severity of depression and anxiety.

Mayo Clinic

Supporting Long-Haul Covid Patients

This Mayo Clinic and Ellipsis Health study involves 200+ adults with long-term Covid symptoms (“long haulers”) and examines the influence of social isolation related to Covid-19. Participants will use Ellipsis Health’s App every other week for 24 weeks to assess their severity of depression and anxiety.

Hartford HealthCare

Comparing the HAMD-6, PHQ-9 and Ellipsis Health in Inpatient and Outpatient Programs

The Hartford HealthCare and Ellipsis Health study will recruit 300 adult patients in the Partial Hospitalization and Outpatient programs. Patients will complete a weekly voice journal with Ellipsis Health. The HAMD-6 and PHQ-9 will also be collected weekly for up to 12 weeks. The aim of the study is to compare Ellipsis Health scores for depression with HAMD-6 and PHQ-9 scores, and to assess the utility of the Ellipsis Health scores in assisting in the treatment of patients.

Penn State

Evaluating Ellipsis Health + Comprehensive Set of Mental Health Assessments in College Students

The Penn State and Ellipsis Health study will compare Ellipsis Health App scores to a comprehensive set of screening and diagnostic assessments for behavioral health conditions, including anxiety and depression, in 300+ participants, mostly college students. The assessments include the Beck Depression Inventory-II, Generalized Anxiety Disorder Questionnaire, Social Phobia Diagnostic Questionnaire, Marlowe-Crowne Social Desirability Scale, Positive Impression Management Scale, Negative Impression Management Scale, Patient Health Questionnaire-8, Generalized Anxiety Disorder-7, Panic Disorder Severity Rating, Dimensional Obsessive-Compulsive Scale, Posttraumatic Stress Disorder Checklist, Snaith-Hamilton Pleasure Scale, and a MINI Version 7.0 structured interview.

University of Houston and MD Anderson Cancer Center

Supporting the Mental Wellbeing of Caregivers for Adolescent & Young Adult Cancer Patients

This University of Texas MD Anderson Cancer Center and Ellipsis Health study will evaluate the feasibility and acceptability of the Ellipsis Health App to assess psychological distress among caregivers of adolescent and young adult (AYA) patients who have been diagnosed with cancer. In this study, 60 caregivers will be monitored using the Ellipsis Health tool once a month over a 6-month period.

Peer-Reviewed Speech Technology Publications

DECEMBER 5, 2023

Probabilistic Performance Bounds for Evaluating Depression Models Given Noisy Self-Report Labels
 

IEEE

Abstract: Advances in AI for health applications rely on evaluating performance against labeled test data. In the area of mental health, self-report labels from surveys such as the Patient Health Questionnaire (PHQ) for depression, are useful but noisy. This "fuzzy label" problem is not currently reflected in reporting model performance, adding to the challenge of comparing results across diverse corpora, data sizes, metrics, and test label distributions. To address this issue, we develop an approach inspired by Bayes Error to estimate a model’s upper and lower performance bounds. Unlike past work, our approach can be used for both regression and classification. The method starts with a perfect match between target and prediction vectors, then applies label noise to degrade performance. To obtain confidence intervals, we use test-set bootstrapping to produce prediction and target vectors. We present results using voice-based deep learning models that predict depression risk from a conversational speech sample. Models capture both language and acoustic information. For label noise, we introduce results from a corpus in which 5625 unique subjects completed the PHQ-8 twice, separated by a short distraction task. Speech test data come from three real-world corpora encompassing over 3500 total datapoints. The test sets differ in speech elicitation, speech length, and speaker demographics among other factors. Results illustrate how probabilistic performance bounds based on PHQ-8 label noise affect the interpretation and comparison of models over corpora and metrics. Implications for science, technology, and future directions are discussed.
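
For readers who want to see the mechanics, the sketch below illustrates test-set bootstrapping of the kind described in the abstract, used here to put a confidence interval around AUC. The function name, synthetic data, and settings are illustrative assumptions, not the authors' code or data.

```python
# A minimal, illustrative sketch of test-set bootstrapping for a metric
# confidence interval (here AUC). Names, data, and settings are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_ci(y_true, y_pred, metric=roc_auc_score,
                 n_boot=1000, alpha=0.05, seed=0):
    """Resample (target, prediction) pairs with replacement and return
    the (lower, upper) percentile interval for the chosen metric."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)            # one bootstrap replicate
        if len(np.unique(y_true[idx])) < 2:         # AUC needs both classes
            continue
        scores.append(metric(y_true[idx], y_pred[idx]))
    return tuple(np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)]))

# Synthetic example: binary depression-risk labels vs. model scores
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)
y_pred = np.clip(0.3 * y_true + rng.random(500), 0.0, 1.0)
print(bootstrap_ci(y_true, y_pred))                 # e.g., an interval around the point AUC
```

The same resampling loop works for regression metrics such as RMSE by swapping the metric function, which mirrors the abstract's point that the approach applies to both classification and regression.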

SEPTEMBER 19, 2022

Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language
 

ISCA

Abstract: Mental health risk prediction is a growing field in the speech community, but many studies are based on small corpora. This study illustrates how variations in test and train set sizes impact performance in a controlled study. Using a corpus of over 65K labeled data points, results from a fully crossed design of different train/test size combinations are provided. Two model types are included: one based on language and the other on speech acoustics. Both use methods current in this domain. An age-mismatched test set was also included. Results show that (1) test sizes below 1K samples gave noisy results, even for larger training set sizes; (2) training set sizes of at least 2K were needed for stable results; (3) NLP and acoustic models behaved similarly with train/test size variations, and (4) the mismatched test set showed the same patterns as the matched test set. Additional factors are discussed, including label priors, model strength and pre-training, unique speakers, and data lengths. While no single study can specify exact size requirements, results demonstrate the need for appropriately sized train and test sets for future studies of mental health risk prediction from speech and language.
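
As a rough illustration of the fully crossed design the abstract describes, the sketch below subsamples a large labeled pool into every combination of train and test sizes and records a metric per cell. The features, labels, model, and size grids are placeholders, not the study's data or models.

```python
# Hedged sketch of a fully crossed train/test size experiment. Everything
# below (data, model, grids) is a stand-in used only to show the design.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(70_000, 32))                         # stand-in features
y = (X[:, 0] + rng.normal(size=70_000) > 0).astype(int)   # stand-in binary labels

train_sizes = [500, 1_000, 2_000, 8_000, 32_000]
test_sizes = [250, 1_000, 4_000]

for n_test in test_sizes:
    test_idx = rng.choice(len(X), size=n_test, replace=False)
    pool = np.setdiff1d(np.arange(len(X)), test_idx)      # keep train/test disjoint
    for n_train in train_sizes:
        train_idx = rng.choice(pool, size=n_train, replace=False)
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        auc = roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1])
        # Repeating each cell with different seeds would expose the variance
        # (noisiness) that the study reports for small test sets.
        print(f"train={n_train:>6}  test={n_test:>5}  AUC={auc:.3f}")
```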

JULY 20, 2022

Generalization of Deep Acoustic and NLP Models for Large-Scale Depression Screening (Chapter 3 from the book Biomedical Sensing and Analysis)

Springer

Abstract: Depression is a costly and underdiagnosed global health concern, and there is a great need for improved patient screening. Speech technology offers promise for remote screening, but must perform robustly across patient and environmental variables. This chapter describes two deep learning models that achieve excellent performance in this regard. An acoustic model uses transfer learning from an automatic speech recognition (ASR) task. A natural language processing (NLP) model uses transfer learning from a language modeling task. Both models are studied using data from over 10,000 unique users who interacted with human-machine applications using conversational speech. Results for binary classification on a large test set show AUC performance of 0.79 and 0.83 for the acoustic and NLP models, respectively. RMSE for a regression task is 4.70 for the acoustic model and 4.27 for the NLP model. Further analysis of performance as a function of test subset characteristics indicates that the models are generally robust over speaker and session variables. It is concluded that both acoustic and NLP-based models have potential for use in generalized automated depression screening.

JUNE 6, 2021

Speech-Based Depression Prediction using Encoder-Weight-Only Transfer Learning and a Large Corpus

IEEE

Abstract: Speech-based algorithms have gained interest for the management of behavioral health conditions such as depression. We explore a speech-based transfer learning approach that uses a lightweight encoder and that transfers only the encoder weights, enabling a simplified run-time model. Our study uses a large data set containing roughly two orders of magnitude more speakers and sessions than used in prior work. The large data set enables reliable estimation of improvement from transfer learning. Results for the prediction of PHQ-8 labels show up to 27% relative performance gains for binary classification; these gains are statistically significant with a p-value close to zero. Improvements were also found for regression. Additionally, the gain from transfer learning does not appear to require strong source task performance. Results suggest that this approach is flexible and offers promise for efficient implementation.
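
The sketch below illustrates the general idea of encoder-weight-only transfer in PyTorch: only the encoder parameters move from a source-task model into the downstream depression model, and the prediction head is trained from scratch. The architecture, dimensions, and training step are hypothetical, not the paper's implementation.

```python
# Illustrative PyTorch sketch of encoder-weight-only transfer (assumed shapes).
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Lightweight encoder over per-frame acoustic features (assumed sizes)."""
    def __init__(self, feat_dim=40, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)

    def forward(self, x):                  # x: (batch, frames, feat_dim)
        _, h = self.rnn(x)                 # final hidden state summarizes the clip
        return h[-1]                       # (batch, hidden)

class DepressionModel(nn.Module):
    """Encoder plus a small head predicting a PHQ-8-style score."""
    def __init__(self):
        super().__init__()
        self.encoder = SpeechEncoder()
        self.head = nn.Linear(128, 1)

    def forward(self, x):
        return self.head(self.encoder(x)).squeeze(-1)

# 1) A source-task encoder (e.g., trained on an ASR-like objective)
source_encoder = SpeechEncoder()           # source-task training omitted here

# 2) Transfer only the encoder weights into the run-time depression model
model = DepressionModel()
model.encoder.load_state_dict(source_encoder.state_dict())

# 3) Fine-tune on depression labels as usual (dummy batch shown)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(8, 300, 40)                # 8 clips, 300 frames, 40 features
target = torch.rand(8) * 24                # PHQ-8 totals range 0-24
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
opt.step()
```

Because only the encoder is carried over, the run-time model stays small; the source-task decoder or classifier head never ships with it, which is the simplification the abstract highlights.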

JANUARY 19, 2021

Cross-Demographic Portability of Deep NLP-Based Depression Models

IEEE

Abstract: Deep learning models are rapidly gaining interest for real-world applications in behavioral health. An important gap in current literature is how well such models generalize over different populations. We study Natural Language Processing (NLP) based models to explore portability over two different corpora highly mismatched in age. The first and larger corpus contains younger speakers. It is used to train an NLP model to predict depression. When testing on unseen speakers from the same age distribution, this model performs at AUC=0.82. We then test this model on the second corpus, which comprises seniors from a retirement community. Despite the large demographic differences in the two corpora, we saw only modest degradation in performance for the senior-corpus data, achieving AUC=0.76. Interestingly, in the senior population, we find AUC=0.81 for the subset of patients whose health state is consistent over time. Implications for demographic portability of speech-based applications are discussed.

DECEMBER 5, 2020

Robust Speech and Natural Language Processing Models for Depression Screening

IEEE

Abstract: Depression is a global health concern with a critical need for increased patient screening. Speech technology offers advantages for remote screening but must perform robustly across patients. We have described two deep learning models developed for this purpose. One model is based on acoustics; the other is based on natural language processing. Both models employ transfer learning. Data from a depression-labeled corpus in which 11,000 unique users interacted with a human-machine application using conversational speech is used. Results on binary depression classification have shown that both models perform at or above AUC=0.80 on unseen data with no speaker overlap. Performance is further analyzed as a function of test subset characteristics, finding that the models are generally robust over speaker and session variables. We conclude that models based on these approaches offer promise for generalized automated depression screening.

NOVEMBER 5, 2020

Depression and Anxiety Prediction Using Deep Language Models and Transfer Learning

IEEE

Abstract: Digital screening and monitoring applications can aid providers in the management of behavioral health conditions. We explore deep language models for detecting depression, anxiety, and their comorbidity using input from conversational speech. Speech data comprise 16k spoken interactions labeled for both depression and anxiety. We find that results for binary classification range from 0.86 to 0.79 AUC, depending on condition and comorbidity. Best performance occurs for comorbid cases. We show that this result is not attributable to data skew. Finally, we find evidence suggesting that underlying word sequence cues may be more salient for depression than for anxiety.

SEPTEMBER 10, 2020

Comparing Speech Recognition Services for HCI Applications in Behavioral Health

ACM Digital Library

Abstract: Behavioral health conditions such as depression and anxiety are a global concern, and there is growing interest in employing speech technology to screen and monitor patients remotely. Language modeling approaches require automatic speech recognition (ASR) and multiple privacy-compliant ASR services are commercially available. We use a corpus of over 60 hours of speech from a behavioral health task, and compare ASR performance for four commercial vendors. We expected similar performance, but found large differences between the top and next-best performer, for both mobile (48% relative WER increase) and laptop (67% relative WER increase) data. Results suggest the importance of benchmarking ASR systems in this domain. Additionally we find that WER is not systematically related to depression itself. Performance is however affected by diverse audio quality from users' personal devices, and possibly from the overall style of speech in this domain.
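
For context on the numbers quoted above, the snippet below shows how word error rate (WER) and a relative WER increase between two vendors can be computed from word-level edit distance. The transcripts and "vendors" are invented examples, not study data.

```python
# Illustrative WER and relative-WER-increase computation (invented transcripts).
def wer(ref: str, hyp: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with standard word-level edit distance."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(r)][len(h)] / max(len(r), 1)

reference = "i have been feeling tired and anxious most days"
vendor_a  = "i have been feeling tired and anxious most day"   # 1 word error
vendor_b  = "i been feeling tired an anxious most day"         # 3 word errors

wer_a, wer_b = wer(reference, vendor_a), wer(reference, vendor_b)
relative_increase = (wer_b - wer_a) / wer_a * 100
print(f"WER A={wer_a:.2%}  WER B={wer_b:.2%}  relative increase={relative_increase:.0f}%")
```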

SEPTEMBER 15, 2019

Optimizing Speech-Input Length for Speaker-Independent Depression Classification

International Speech Communication Association

Abstract: Machine learning models for speech-based depression classification offer promise for health care applications. Despite growing work on depression classification, little is understood about how the length of speech-input impacts model performance. We analyze results for speaker-independent depression classification using a corpus of over 1400 hours of speech from a human-machine health screening application. We examine performance as a function of response input length for two NLP systems that differ in overall performance. Results for both systems show that performance depends on natural length, elapsed length, and ordering of the response within a session. Systems share a minimum length threshold, but differ in a response saturation threshold, with the latter higher for the better system. At saturation it is better to pose a new question to the speaker than to continue the current response. These and additional reported results suggest how applications can be better designed to both elicit and process optimal input lengths for depression classification.


White Papers



Transforming Care Management Through AI-Driven Analysis of Calls for Depression

The healthcare landscape is shifting towards personalized and data-driven care management. Within this care transformation, Ceras Health and Ellipsis Health began a partnership to better understand and support the mental health of chronically ill patients by using voice and artificial intelligence (AI).
