Flagship Dataset of Type 2 Diabetes from the AI-READI Project
Datasheet for Dataset - Human Readable Format
🎯
Motivation
Why was the dataset created?
Dataset Resource
Grantor | Grant Name | Grant Number
--- | --- | ---
National Institutes of Health | - | 1OT2OD032644
Dataset Resource
Response
The Artificial Intelligence Ready and Exploratory Atlas for Diabetes Insights (AI-READI) project seeks to create a flagship ethically-sourced dataset to enable future generations of artificial intelligence/machine learning (AI/ML) research to provide critical insights into type 2 diabetes mellitus (T2DM), including salutogenic pathways to return to health.
📊
Composition
What do the instances represent?
Dataset Resource
Counts
1,067
Instance Type
participants
Data Topic
Data Substrate
Label
Label Description
Sampling Strategies
Missing Information
Dataset Resource
Subpopulation Elements Present
True
Identification
Race/ethnicity
Sex
Diabetes status
Distribution
Split Type | Race/Ethnicity | Sex | Diabetes Status
--- | --- | --- | ---
Train Split | Hispanic: 144, Asian: 167, Black: 211, White: 225 | Male: 302, Female: 445 | No DM: 292, Lifestyle: 162, Oral: 235, Insulin: 58
Val Split | Hispanic: 40, Asian: 40, Black: 40, White: 40 | Male: 80, Female: 80 | No DM: 47, Lifestyle: 33, Oral: 40, Insulin: 40
Test Split | Hispanic: 40, Asian: 40, Black: 40, White: 40 | Male: 80, Female: 80 | No DM: 41, Lifestyle: 39, Oral: 36, Insulin: 42
Total | Hispanic: 224, Asian: 247, Black: 291, White: 305 | Male: 462, Female: 605 | No DM: 380, Lifestyle: 234, Oral: 311, Insulin: 140
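The split counts above lend themselves to a quick arithmetic cross-check. The following is a minimal sketch in Python, assuming only the numbers printed in this datasheet; the names `SPLITS`, `TOTAL`, `attribute_sums`, and `column_totals` are illustrative and not part of any AI-READI tooling.

```python
# Counts transcribed from the distribution table in this datasheet.
SPLITS = {
    "train": {
        "race_ethnicity": {"Hispanic": 144, "Asian": 167, "Black": 211, "White": 225},
        "sex": {"Male": 302, "Female": 445},
        "diabetes_status": {"No DM": 292, "Lifestyle": 162, "Oral": 235, "Insulin": 58},
    },
    "val": {
        "race_ethnicity": {"Hispanic": 40, "Asian": 40, "Black": 40, "White": 40},
        "sex": {"Male": 80, "Female": 80},
        "diabetes_status": {"No DM": 47, "Lifestyle": 33, "Oral": 40, "Insulin": 40},
    },
    "test": {
        "race_ethnicity": {"Hispanic": 40, "Asian": 40, "Black": 40, "White": 40},
        "sex": {"Male": 80, "Female": 80},
        "diabetes_status": {"No DM": 41, "Lifestyle": 39, "Oral": 36, "Insulin": 42},
    },
}

# The "Total" row as printed in the datasheet.
TOTAL = {
    "race_ethnicity": {"Hispanic": 224, "Asian": 247, "Black": 291, "White": 305},
    "sex": {"Male": 462, "Female": 605},
    "diabetes_status": {"No DM": 380, "Lifestyle": 234, "Oral": 311, "Insulin": 140},
}


def attribute_sums(split):
    """Sum the category counts of each attribute within one split."""
    return {attr: sum(counts.values()) for attr, counts in split.items()}


def column_totals(splits):
    """Add up each category across all splits."""
    totals = {}
    for split in splits.values():
        for attr, counts in split.items():
            for label, n in counts.items():
                bucket = totals.setdefault(attr, {})
                bucket[label] = bucket.get(label, 0) + n
    return totals


if __name__ == "__main__":
    for name, split in SPLITS.items():
        print(name, attribute_sums(split))
    print("total", attribute_sums(TOTAL))
    print("splits add up to Total row:", column_totals(SPLITS) == TOTAL)
```

With the counts as printed, the race/ethnicity and sex breakdowns sum to 747, 160, and 160 participants for the train, val, and test splits (1,067 overall), and every category adds up to the Total row. The diabetes-status breakdown, however, sums to 158 for the test split and to 1,065 in the Total row; the datasheet does not explain this two-participant difference.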
Dataset Resource
Description
The dataset is organized into multiple directories by datatype, with file formats including WaveForm DataBase (WFDB), CSV (conforming to OMOP CDM), Earth Science Data Systems (ESDS), Digital Imaging and Communications in Medicine (DICOM), and Open mHealth.
Dataset Resource
This dataset contains data from 1,067 participants, collected between July 19, 2023 and July 31, 2024. Data from multiple modalities are included. The data in this dataset contain no protected health information (PHI). Information related to the sex and race/ethnicity of the participants, as well as medication used, has also been removed.
Dataset Resource
Role
Name
ORCID
Affiliation
Principal Investigator
-
AI-READI Consortium
Dataset Resource
2024-11-08
Dataset Resource
Diabetes mellitus
Machine Learning
Artificial Intelligence
Electrocardiography
Continuous Glucose Monitoring
Retinal imaging
Eye exam
Dataset Resource
Description
The data was collected between July 19, 2023 and July 31, 2024.
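The length of the collection window follows directly from the two dates. A minimal sketch using Python's standard `datetime` module; the helper name `window_days` and the choice to count both endpoints are assumptions made for illustration.

```python
from datetime import date

# Collection window as stated in this datasheet.
COLLECTION_START = date(2023, 7, 19)
COLLECTION_END = date(2024, 7, 31)


def window_days(start, end, inclusive=True):
    """Number of days between two dates, optionally counting both endpoints."""
    return (end - start).days + (1 if inclusive else 0)


if __name__ == "__main__":
    print(window_days(COLLECTION_START, COLLECTION_END))  # 379
```

Counting both endpoints, the window spans 379 days, or roughly 54 weeks; the span includes February 29, 2024.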
Dataset Resource
Clinical Dataset Structure (CDS) v0.1.1
WaveForm DataBase (WFDB)
Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)
Earth Science Data Systems (ESDS)
Digital Imaging and Communications in Medicine (DICOM)
Open mHealth
Dataset Resource
Description
Processing of the data was both automated and custom.
Dataset Resource
Description
The data in this dataset contain no protected health information (PHI). Information related to the sex and race/ethnicity of the participants as well as medication used has also been removed.
Dataset Resource
Identifiable Elements Present
False
Description
The data in this dataset contain no protected health information (PHI). Information related to the sex and race/ethnicity of the participants as well as medication used has also been removed.
Dataset Resource
Sensitive Elements Present
True
Description
The dataset contains health data related to type 2 diabetes. It originally contained race/ethnicity and sex data, which have been removed from the dataset but are available in aggregate form.
Dataset Resource
Description
Data was collected from multiple modalities, including 12-lead ECG, Holter monitor, smartwatch, REDCap for clinical data, a custom environmental sensor, fluorescence lifetime imaging ophthalmoscopy (FLIO), optical coherence tomography (OCT), optical coherence tomography angiography (OCTA), retinal photography, wearable fitness trackers, and continuous glucose monitoring (CGM) devices.
Was Directly Observed
Was Reported By Subjects
Was Inferred Derived
Was Validated Verified
🚀
Uses
What (other) tasks could the dataset be used for?
Dataset Resource
Description
As of the document date, the dataset has 12,603 views, has been cited by 3 resources, and has had 539 approved access requests.
Dataset Resource
Description
Users must agree to use the data only for type 2 diabetes-related research. Other uses are implicitly discouraged.
Dataset Resource
Description
This work is licensed under a custom license. Accessing the dataset requires logging in through a verified ID system, agreeing to use the data only for type 2 diabetes related research, and agreeing to the license terms which set restrictions and obligations for data usage.
Older versions of the dataset are accessible and have their own DOIs. Version 1.0.0 is available at doi:10.60775/fairhub.1.
🔄
Maintenance
How will the dataset be maintained?
Dataset Resource
2.0.0
Dataset Resource
Description
The dataset is versioned. Changes between versions are provided in a CHANGELOG file. The current version is 2.0.0; a previous version, 1.0.0, also exists.
Dataset Resource
Generated on 2025-10-30 10:38:41 using Bridge2AI Data Sheets Schema