VOICE (Claude Code Synthesized)

Datasheet for Dataset - Human Readable Format

🎯

Motivation

Why was the dataset created?

  • Response
    Enable ethically sourced, large-scale research on voice as a biomarker of health by linking derived voice representations to demographic, clinical, and questionnaire data.
Funding
  • Agency
    National Institutes of Health
  • Award Number
    3OT2OD032720-01S1
  • Project Title
    Bridge2AI: Voice as a Biomarker of Health - Building an ethically sourced, bioacoustic database to understand disease like never before
Acknowledgements
We acknowledge the contribution of study participants and the NIH for continued support of the project.
Platform Support
PhysioNet infrastructure is supported by the National Institute of Biomedical Imaging and Bioengineering under NIH grant number R01EB030362.
📊

Composition

What do the instances represent?

  • Representation
    Adult participants with voice, neurological, mood, and respiratory disorders
  • Instance Type
    Participants and their voice-derived features with linked clinical phenotype data
  • Data Type
    Spectrograms derived from audio; mel-frequency cepstral coefficients; acoustic feature sets (openSMILE); phonetic and prosodic features (Parselmouth and Praat); transcriptions generated by OpenAI Whisper Large (free speech transcripts removed); phenotype and questionnaire data.
  • File Formats
    Parquet files (spectrograms, MFCC); TSV files (phenotype, static features); JSON files (data dictionaries)
  • Version
    v1.1, released 2025-01-17
🔍

Collection Process

How was the data acquired?

Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information
The human voice contains complex acoustic markers that have been linked to important health conditions including dementia, mood disorders, and cancer. Viewed as a biomarker, voice is a promising characteristic to measure: it is simple to collect, cost-effective, and broadly useful in clinical settings. Recent advances in artificial intelligence have provided techniques to extract previously inaccessible, prognostically useful information from dense data elements such as images. The Bridge2AI-Voice project seeks to create an ethically sourced flagship dataset to enable future research in artificial intelligence and to support critical insights into the use of voice as a biomarker of health. Here we present Bridge2AI-Voice, a comprehensive collection of data derived from voice recordings with corresponding clinical information. Bridge2AI-Voice v1.0, the initial release, provides 12,523 recordings from 306 participants collected across five sites in North America. Participants were selected based on known conditions that manifest within the voice waveform, including voice disorders, neurological disorders, mood disorders, and respiratory disorders. The initial release contains data considered low risk, including derivations such as spectrograms but not the original voice recordings. Detailed demographic, clinical, and validated questionnaire data are also made available.
Language
en
RRID
SCR_007345
Publisher
PhysioNet
Release Date
2025-01-17
Keywords
  • VOICE
  • voice
  • bridge2ai
  • biomarker
  • dementia
  • mood disorders
  • cancer
  • voice disorders
  • neurological disorders
  • respiratory disorders
  • spectrograms
  • acoustic features
  • Health Data Nexus
  • PhysioNet
  • ethical data
  • AI
  • machine learning
  • Purpose
    Create an ethically sourced flagship dataset to enable AI research on voice as a biomarker, supporting critical insights into voice-health relationships not previously available in standardized datasets.
Overview
Derived audio representations and associated phenotype data from adult participants recruited at specialty clinics.
Population
Cohort Scope
Adult cohort only as of v1.1
Recruitment Region
Five sites in North America
Participants
306
Recordings
12,523
Condition Groups
  • Voice disorders
  • Neurological and neurodegenerative disorders
  • Mood and psychiatric disorders
  • Respiratory disorders
  • Pediatric voice and speech disorders (planned; not included in v1.1)
Modalities
  • Spectrograms derived from audio
  • Mel-frequency cepstral coefficients
  • Acoustic feature sets (openSMILE)
  • Phonetic and prosodic features (Parselmouth and Praat)
  • Transcriptions generated by OpenAI Whisper Large (free speech transcripts removed)
  • Phenotype and questionnaire data
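
Raw audio is not distributed, but for orientation, the following minimal sketch shows how the Parselmouth and openSMILE features named above are typically computed from a local WAV file. The file name is hypothetical, and the eGeMAPSv02 feature set is an assumption rather than a confirmed project choice.

import parselmouth  # Python interface to Praat
import opensmile    # Python wrapper for openSMILE

# Hypothetical local recording; the dataset itself ships only derived features.
snd = parselmouth.Sound("recording.wav")

# Prosodic measures via Praat algorithms: fundamental frequency per frame
# (0 Hz marks unvoiced frames) and formants via Burg's method.
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
formants = snd.to_formant_burg()
f1_mid = formants.get_value_at_time(1, snd.duration / 2)  # F1 at the midpoint

# Acoustic functionals via openSMILE; eGeMAPSv02 is an assumed feature set.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
functionals = smile.process_file("recording.wav")  # one-row pandas DataFrame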
Data Formats
  • Parquet
  • TSV
  • JSON
Identifiers In Files
  • participant_id
  • session_id
  • task_name
Sampling And Dimensions
Audio resampled to 16 kHz; spectrograms are 513 x N; MFCC arrays are 60 x N, where N is proportional to recording length.
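
As a consistency check on these dimensions: at 16 kHz a 10 ms hop is 160 samples, so a recording of s seconds yields roughly N ≈ 100·s frames, and 513 frequency bins equal n_fft/2 + 1 for a 1024-point FFT (a literal 512-point FFT would yield 257 bins). A small sketch, assuming centered STFT framing as in common toolkits:

# Expected shapes for a hypothetical 5-second recording, assuming centered
# framing (N = 1 + floor(samples / hop)), as librosa and torchaudio use.
sr = 16_000                    # sampling rate after resampling (Hz)
hop = int(0.010 * sr)          # 10 ms hop -> 160 samples
samples = int(5.0 * sr)        # 5 s of audio -> 80,000 samples
n_frames = 1 + samples // hop  # -> 501 frames

spectrogram_shape = (513, n_frames)   # 513 bins = 1024 // 2 + 1
mfcc_shape = (60, n_frames)
print(spectrogram_shape, mfcc_shape)  # (513, 501) (60, 501)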
Setting
Specialty clinics and institutions
Participant Selection
Participants were screened against inclusion and exclusion criteria within five predetermined condition groups.
Consent
Participants provided consent for data collection and sharing of de-identified research data.
Procedure
Standardized protocol collecting demographics, health questionnaires, targeted confounders for voice, disease-specific information, and voice tasks such as sustained vowel phonation.
Data Capture
Custom tablet application used for collection; headset used when possible.
Sessions
Most participants completed one session; a subset required multiple sessions.
Data Export And Merge
Exported from REDCap and converted using the open-source b2aiprep library.
  1. Description
    Standardized protocol collecting demographics, health questionnaires, targeted confounders for voice, disease-specific information, and voice tasks such as sustained vowel phonation.
    Was Directly Observed
    Yes (voice recordings via tablet application)
    Was Reported By Subjects
    Yes (questionnaires)
    Was Validated / Verified
    Standardized data collection protocol; validated questionnaires; REDCap data capture
  • Instrumentation
    Custom tablet application used for collection; headset used when possible; REDCap for phenotype data
  • Collection Sites
    Five data collection sites in North America (specialty clinics)
  • Release Timeline
    Initial release (v1.0) in 2024; v1.1 released 2025-01-17; latest version 2.0.1 released 2025-08-18
  • Processing (see the sketch after this list)
    Raw audio: converted to mono and resampled to 16 kHz with a Butterworth anti-aliasing filter.
    Spectrograms: short-time FFT with 25 ms window, 10 ms hop, 512-point FFT; stored in power representation.
    MFCC: 60 coefficients computed from the spectrograms.
    Acoustic features: extracted using openSMILE, capturing temporal dynamics and acoustic characteristics.
    Phonetic/prosodic features: computed using Parselmouth and Praat; includes measures of fundamental frequency, formants, and voice quality.
    Transcription: generated using OpenAI Whisper Large; transcripts of free speech audio were removed prior to release.
    Code: the open-source b2aiprep library was used to preprocess waveforms and merge phenotype data.
  • De-identification
    HIPAA Safe Harbor approach; removal of identifiers including names, geographic locators, dates at finer than year resolution, phone/fax numbers, email addresses, IP addresses, Social Security Numbers, medical record numbers, health plan beneficiary numbers, device identifiers, license numbers, account numbers, vehicle identifiers, website URLs, full face photos, biometric identifiers, and any other unique identifiers. State and province removed; country of data collection retained. Transcripts of free speech audio removed. Raw audio waveforms omitted in v1.1; only spectrograms and other derived features are released.
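
A minimal sketch of this pipeline using torchaudio (one of the referenced tools). Parameter values follow the description above, except that n_fft=1024 is inferred from the stated 513-bin spectrogram dimension, and torchaudio's resampler uses windowed-sinc filtering rather than the Butterworth filter described, so treat this as an approximation rather than the exact b2aiprep implementation.

import torchaudio

# Load a hypothetical local WAV, mix to mono, and resample to 16 kHz.
wav, sr = torchaudio.load("recording.wav")
wav = wav.mean(dim=0, keepdim=True)
wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16_000)

# Power spectrogram: 25 ms window (400 samples), 10 ms hop (160 samples).
# n_fft=1024 gives the 513 frequency bins stated in the datasheet.
spec = torchaudio.transforms.Spectrogram(
    n_fft=1024, win_length=400, hop_length=160, power=2.0
)(wav)                                  # shape: (1, 513, N)

# 60 MFCCs over the same framing.
mfcc = torchaudio.transforms.MFCC(
    sample_rate=16_000,
    n_mfcc=60,
    melkwargs={"n_fft": 1024, "win_length": 400, "hop_length": 160},
)(wav)                                  # shape: (1, 60, N)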
De-identification Summary
  • HIPAA Safe Harbor de-identification applied
  • No raw audio waveforms in v1.1; only derived representations released
  • Free speech transcripts removed to reduce re-identification risk
  • Sensitive Data
    Health condition information (voice disorders, neurological disorders, mood disorders, respiratory disorders) under restricted access with a data use agreement
Version Notice
Files for version 1.1 are no longer available; the latest version of this project is 2.0.1.
Listing
  • spectrograms.parquet (Parquet): Dense time-frequency representations derived from voice waveforms; includes participant_id, session_id, and task_name columns.
  • mfcc.parquet (Parquet): Mel-frequency cepstral coefficients derived from spectrograms; arrays of size 60 x N per recording.
  • phenotype.tsv (TSV): One row per participant; demographics, acoustic confounders, and responses to validated questionnaires.
  • phenotype.json (JSON): Data dictionary for phenotype.tsv with one-sentence descriptions per column.
  • static_features.tsv (TSV): One row per audio recording; features derived using openSMILE, Praat, Parselmouth, and torchaudio.
  • static_features.json (JSON): Data dictionary for static_features.tsv with feature descriptions.
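
A minimal loading sketch for these files, assuming they have been downloaded to the working directory and that pandas with a Parquet engine (pyarrow or fastparquet) is available; only the identifier columns documented above are relied on.

import json
import pandas as pd

# Tabular files: one row per participant vs. one row per recording.
phenotype = pd.read_csv("phenotype.tsv", sep="\t")
static_features = pd.read_csv("static_features.tsv", sep="\t")

# Data dictionaries describing each column.
with open("phenotype.json") as f:
    phenotype_dictionary = json.load(f)

# Attach participant-level phenotype data to each recording's features
# using the shared participant_id identifier.
merged = static_features.merge(phenotype, on="participant_id", how="left")

# Derived audio representations (requires pyarrow or fastparquet).
spectrograms = pd.read_parquet("spectrograms.parquet")
mfcc = pd.read_parquet("mfcc.parquet")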
Limitations
  • Adult cohort only in v1.1; pediatric data not included.
  • No raw audio is released in v1.1; analyses are limited to derived representations.
  • Participants were selected based on conditions known to manifest in voice, which may affect generalizability.
  • Curators
    Bridge2AI-Voice project team; hosted on PhysioNet
Version History
  • 1.0 (2024): Initial release of the dataset.
  • 1.1 (2025-01-17): Added Mel-frequency cepstral coefficients.
  • 2.0.0 (2025-04-16): Major update (details not provided in source).
  • 2.0.1 (2025-08-18): Latest version (details not provided in source).
Contributors
  • Alistair Johnson
  • Jean-Christophe Bélisle-Pipon
  • David Dorr
  • Satrajit Ghosh
  • Philip Payne
  • Maria Powell
  • Anais Rameau
  • Vardit Ravitsky
  • Alexandros Sigaras
  • Olivier Elemento
  • Yael Bensoussan
Contact
Not publicly listed; contact information requires login.
Preprocessing Code
Name
b2aiprep
URL
https://github.com/sensein/b2aiprep
Description
Open source library used to preprocess raw audio and merge phenotype data.
Referenced Tools
  • openSMILE
  • Praat
  • Parselmouth
  • torchaudio
  • OpenAI Whisper Large
  • librosa (example usage for visualization)
Dataset Citation
Johnson, A., Bélisle-Pipon, J., Dorr, D., Ghosh, S., Payne, P., Powell, M., Rameau, A., Ravitsky, V., Sigaras, A., Elemento, O., & Bensoussan, Y. (2025). Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information (version 1.1). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/249v-w155
Platform Citation
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
  • External Resources
    • PhysioNet platform (https://physionet.org/)
    • Health Data Nexus (https://healthdatanexus.ai/content/b2ai-voice/1.0/)
    • Project documentation (https://docs.b2ai-voice.org)
    • b2aiprep GitHub repository (https://github.com/sensein/b2aiprep)
    • Bridge2AI Voice REDCap on Zenodo (https://doi.org/10.5281/zenodo.14148755)
  • Rameau, A., Ghosh, S., Sigaras, A., Elemento, O., Bélisle-Pipon, J.-C., Ravitsky, V., Powell, M., Johnson, A., Dorr, D., Payne, P., Boyer, M., Watts, S., Bahr, R., Rudzicz, F., Lerner-Ellis, J., Awan, S., Bolser, D., Bensoussan, Y. (2024). Developing Multi-Disorder Voice Protocols: A team science approach involving clinical expertise, bioethics, standards, and DEI. Proc. Interspeech 2024, 1445-1449. doi: 10.21437/Interspeech.2024-1926
  • Bensoussan, Y., Ghosh, S. S., Rameau, A., Boyer, M., Bahr, R., Watts, S., Rudzicz, F., Bolser, D., Lerner-Ellis, J., Awan, S., Powell, M. E., Belisle-Pipon, J.-C., Ravitsky, V., Johnson, A., Zisimopoulos, P., Tang, J., Sigaras, A., Elemento, O., Dorr, D., … Bridge2AI-Voice. (2024). Bridge2AI Voice REDCap (v3.20.0). Zenodo. https://doi.org/10.5281/zenodo.14148755
  • Florian Eyben, Martin Wöllmer, Björn Schuller: openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor, Proc. ACM Multimedia (MM), ACM, Florence, Italy, ISBN 978-1-60558-933-6, pp. 1459-1462, 25.-29.10.2010.
  • Boersma P, Van Heuven V. Speak and unSpeak with PRAAT. Glot International. 2001 Nov;5(9/10):341-7.
  • Jadoul Y, Thompson B, De Boer B. Introducing parselmouth: A python interface to praat. Journal of Phonetics. 2018 Nov 1;71:1-5.
  • Hwang, J., Hira, M., Chen, C., Zhang, X., Ni, Z., Sun, G., Ma, P., Huang, R., Pratap, V., Zhang, Y., Kumar, A., Yu, C.-Y., Zhu, C., Liu, C., Kahn, J., Ravanelli, M., Sun, P., Watanabe, S., Shi, Y., Tao, T., Scheibler, R., Cornell, S., Kim, S., & Petridis, S. (2023). TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch. arXiv preprint arXiv:2310.17864
  • Yang, Y.-Y., Hira, M., Ni, Z., Chourdia, A., Astafurov, A., Chen, C., Yeh, C.-F., Puhrsch, C., Pollack, D., Genzel, D., Greenberg, D., Yang, E. Z., Lian, J., Mahadeokar, J., Hwang, J., Chen, J., Goldsborough, P., Roy, P., Narenthiran, S., Watanabe, S., Chintala, S., Quenneville-Bélair, V, & Shi, Y. (2021). TorchAudio: Building Blocks for Audio and Speech Processing. arXiv preprint arXiv:2110.15018.
  • Bevers, I., Ghosh, S., Johnson, A., Brito, R., Bedrick, S., Catania, F., & Ng, E. b2aiprep (Version 0.21.0) [Computer software]. https://github.com/sensein/b2aiprep
  • Johnson, A., Bélisle-Pipon, J., Dorr, D., Ghosh, S., Payne, P., Powell, M., Rameau, A., Ravitsky, V., Sigaras, A., Elemento, O., & Bensoussan, Y. (2024). Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information (version 1.0). Health Data Nexus. https://doi.org/10.57764/qb6h-em84
🚀

Uses

What (other) tasks could the dataset be used for?

Primary
Artificial intelligence and clinical research on voice as a biomarker of health.
Examples
  • Development and benchmarking of models to associate voice-derived features with health conditions.
  • Exploration of acoustic, phonetic, and prosodic correlates of disease using de-identified derived data.
Usage Notes
Data are provided as derived representations without raw audio to reduce re-identification risk.
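
For orientation, a sketch of how a released power spectrogram could be rendered with librosa (listed under referenced tools). The random array below is a stand-in for one 513 x N spectrogram, since the exact layout of the arrays inside the Parquet files is not specified here.

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for one released 513 x N power spectrogram (~5 s at a 10 ms hop);
# in practice this array would be reconstructed from spectrograms.parquet.
S = np.random.rand(513, 501)

S_db = librosa.power_to_db(S, ref=np.max)  # power -> decibels for display
librosa.display.specshow(
    S_db, sr=16_000, hop_length=160, x_axis="time", y_axis="linear"
)
plt.colorbar(format="%+2.0f dB")
plt.title("Spectrogram (dB)")
plt.tight_layout()
plt.show()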
  • Limitations
    Adult cohort only in v1.1; pediatric data are not included. Participants were selected based on conditions known to manifest in voice. Both factors may limit generalizability, and users should account for these sampling characteristics.
Access
  • Access Policy: Restricted access; only registered users who sign the specified data use agreement can access the files.
  • License: Bridge2AI Voice Registered Access License
  • Data Use Agreement: Bridge2AI Voice Registered Access Agreement
Hosting
  • PhysioNet restricted access repository
  • Health Data Nexus
📤

Distribution

How will the dataset be distributed?

DOI (version 1.1)
10.13026/249v-w155
DOI (version not specified in source)
10.13026/37yb-1t42
Platform
PhysioNet
Access Policy
Restricted Access
Access Conditions
Only registered users who sign the specified data use agreement can access the files.
License
Bridge2AI Voice Registered Access License
Data Use Agreement
Bridge2AI Voice Registered Access Agreement
Availability
  • Multiple versions available on the PhysioNet platform (v1.1, v2.0.0, v2.0.1)
  • Also available on Health Data Nexus (b2ai-voice version 1.0)
🔄

Maintenance

How will the dataset be maintained?

Version
1.1
Maintenance Plan
  • Periodic updates planned; v1.1 released 2025-01-17 adding MFCC; v2.0.0 released 2025-04-16; v2.0.1 released 2025-08-18
👥

Human Subjects

Does the dataset relate to people?

IRB Approval
Data collection and sharing approved by the University of South Florida Institutional Review Board.
Ethical Position
Dataset is ethically sourced with privacy protections; derived data released for low risk.
Conflicts Of Interest
None to declare.
Generated on 2025-11-16 17:37:50 using Bridge2AI Data Sheets Schema