Developing a Vocal Biomarker


According to the Lancet Commission on Diagnostics, 32 – 65% of the population in developing countries don’t have access to the right diagnostics. Diagnostics is the single largest gap in the healthcare system. Vocal biomarkers are a very promising technology that can possibly revolutionize the diagnostics industry. Client wanted to create a vocal biomarker for identifying incidence of a condition affecting the vocal cords. 

Project brief

Develop an algorithm for screening patients for a voice affecting condition, based on a vocal biomarker. Multiple studies documenting the impact of the target condition on vocal cords established a causal relationship, we were to investigate the possibility of developing a vocal biomarker for a condition screening algorithm. 


  • More than 60% patients of the target condition go undetected for a prolonged period of time causing metabolic complications and triggering heart related conditions as well
  • Condition goes undetected because it requires a blood test and a lab report. A non-invasive, cheap, and remotely administered device will ensure higher screening numbers.
  • Timely detection and intervention would increase the quality of life of all affected patients

Our approach (healthark’s role)

Step 1: Defined Hypothesis

Explored the correlation between the condition and change in voice quality as documented by the medical literature. Examined and confirmed relevant first principles on voice characterization to enable algorithm design.

Step2: Validated Hypothesis 

Gathered a statistically viable dataset of voice samples and blood test results. Created an annotated dataset for training and validation of the algorithm.

Step 3: Conducted Real World Validation

Partnered with a large pharma company to conduct a clinical study to validate the algorithm on real world data and situations.

Project outcomes 

  • The algorithm demonstrated an initial sensitivity of 74% and specificity of 78%. The algorithms accuracy continues to grow with the sample size 
  • The voice-based screening enabled by the algorithm allowed us to identify more than 250 patients that would have gone undiagnosed otherwise. The algorithm will continue to screen for more patients as it is used on larger population sets.