Abstract: Behavioral health conditions such as depression and anxiety are a global concern, and there is growing interest in employing speech technology to screen and monitor patients remotely. Language modeling approaches require automatic speech recognition (ASR) and multiple privacy-compliant ASR services are commercially available. We use a corpus of over 60 hours of speech from a behavioral health task, and compare ASR performance for four commercial vendors. We expected similar performance, but found large differences between the top and next-best performer, for both mobile (48% relative WER increase) and laptop (67% relative WER increase) data. Results suggest the importance of benchmarking ASR systems in this domain. Additionally we find that WER is not systematically related to depression itself. Performance is however affected by diverse audio quality from users' personal devices, and possibly from the overall style of speech in this domain.