Flagship Dataset of Type 2 Diabetes from the AI-READI Project

Datasheet for Dataset - Human Readable Format

🎯

Motivation

Why was the dataset created?

Dataset Resource
Grantor | Grant Name | Grant Number
National Institutes of Health | - | 1OT2OD032644
Dataset Resource
  • Response
    The Artificial Intelligence Ready and Exploratory Atlas for Diabetes Insights (AI-READI) project seeks to create a flagship ethically-sourced dataset to enable future generations of artificial intelligence/machine learning (AI/ML) research to provide critical insights into type 2 diabetes mellitus (T2DM), including salutogenic pathways to return to health.
📊

Composition

What do the instances represent?

Dataset Resource
  1. Counts
    1,067
    Instance Type
    participants
    Data Topic
    Data Substrate
    Label
    Label Description
    Sampling Strategies
    Missing Information
Dataset Resource
  • Subpopulation Elements Present
    True
    Identification
    • Race/ethnicity
    • Sex
    • Diabetes status
    Distribution
    Split Type  | Race/Ethnicity                                     | Sex                    | Diabetes Status
    Train Split | Hispanic: 144, Asian: 167, Black: 211, White: 225 | Male: 302, Female: 445 | No DM: 292, Lifestyle: 162, Oral: 235, Insulin: 58
    Val Split   | Hispanic: 40, Asian: 40, Black: 40, White: 40     | Male: 80, Female: 80   | No DM: 47, Lifestyle: 33, Oral: 40, Insulin: 40
    Test Split  | Hispanic: 40, Asian: 40, Black: 40, White: 40     | Male: 80, Female: 80   | No DM: 41, Lifestyle: 39, Oral: 36, Insulin: 42
    Total       | Hispanic: 224, Asian: 247, Black: 291, White: 305 | Male: 462, Female: 605 | No DM: 380, Lifestyle: 234, Oral: 311, Insulin: 140
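
    The counts in the Distribution table can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and the helper names (SPLITS, attribute_sums, column_totals) are assumptions made for this example and are not part of the dataset or its tooling. It sums the categories of each attribute within a split and adds the Train, Val, and Test columns to compare them with the Total row.

```python
# Counts transcribed from the Distribution table above.
# The structure and names here are hypothetical, chosen for illustration.
SPLITS = {
    "Train": {
        "race_ethnicity": {"Hispanic": 144, "Asian": 167, "Black": 211, "White": 225},
        "sex": {"Male": 302, "Female": 445},
        "diabetes_status": {"No DM": 292, "Lifestyle": 162, "Oral": 235, "Insulin": 58},
    },
    "Val": {
        "race_ethnicity": {"Hispanic": 40, "Asian": 40, "Black": 40, "White": 40},
        "sex": {"Male": 80, "Female": 80},
        "diabetes_status": {"No DM": 47, "Lifestyle": 33, "Oral": 40, "Insulin": 40},
    },
    "Test": {
        "race_ethnicity": {"Hispanic": 40, "Asian": 40, "Black": 40, "White": 40},
        "sex": {"Male": 80, "Female": 80},
        "diabetes_status": {"No DM": 41, "Lifestyle": 39, "Oral": 36, "Insulin": 42},
    },
    "Total": {
        "race_ethnicity": {"Hispanic": 224, "Asian": 247, "Black": 291, "White": 305},
        "sex": {"Male": 462, "Female": 605},
        "diabetes_status": {"No DM": 380, "Lifestyle": 234, "Oral": 311, "Insulin": 140},
    },
}


def attribute_sums(split):
    """Return {attribute: sum of its category counts} for one split."""
    return {attr: sum(counts.values()) for attr, counts in split.items()}


def column_totals(splits, attr):
    """Sum each category of `attr` over the Train, Val, and Test splits."""
    totals = {}
    for name in ("Train", "Val", "Test"):
        for category, count in splits[name][attr].items():
            totals[category] = totals.get(category, 0) + count
    return totals


if __name__ == "__main__":
    # Per-split sums: each attribute should add up to the split size.
    for name, split in SPLITS.items():
        print(name, attribute_sums(split))
    # Column check: Train + Val + Test should reproduce the Total row.
    for attr in SPLITS["Total"]:
        print(attr, column_totals(SPLITS, attr) == SPLITS["Total"][attr])
```

    With the counts as transcribed here, every Train + Val + Test column matches the Total row. Race/ethnicity and sex sum to 747, 160, 160, and 1,067 across the four rows, while the diabetes status categories sum to 158 in the Test split and 1,065 in the Total row.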
Dataset Resource
  • Description
    The dataset is organized into multiple directories by datatype, with file formats including WaveForm DataBase (WFDB), CSV (conforming to OMOP CDM), Earth Science Data Systems (ESDS), Digital Imaging and Communications in Medicine (DICOM), and Open mHealth.
🔍

Collection Process

How was the data acquired?

Dataset Resource
Flagship Dataset of Type 2 Diabetes from the AI-READI Project
Dataset Resource
This dataset contains data from 1,067 participants, collected between July 19, 2023 and July 31, 2024. Data from multiple modalities are included. The data in this dataset contain no protected health information (PHI). Information related to the sex and race/ethnicity of the participants, as well as medication used, has also been removed.
Dataset Resource
Role | Name | ORCID | Affiliation
Principal Investigator-AI-READI Consortium
Dataset Resource
2024-11-08
Dataset Resource
  • Diabetes mellitus
  • Machine Learning
  • Artificial Intelligence
  • Electrocardiography
  • Continuous Glucose Monitoring
  • Retinal imaging
  • Eye exam
Dataset Resource
  • Description
    The data was collected between July 19, 2023 and July 31, 2024.
Dataset Resource
  • Clinical Dataset Structure (CDS) v0.1.1
  • WaveForm DataBase (WFDB)
  • Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)
  • Earth Science Data Systems (ESDS)
  • Digital Imaging and Communications in Medicine (DICOM)
  • Open mHealth
Dataset Resource
  • Description
    The data were processed using a combination of automated and custom steps.
Dataset Resource
  • Description
    The data in this dataset contain no protected health information (PHI). Information related to the sex and race/ethnicity of the participants as well as medication used has also been removed.
Dataset Resource
Identifiable Elements Present
False
Description
  • The data in this dataset contain no protected health information (PHI). Information related to the sex and race/ethnicity of the participants as well as medication used has also been removed.
Dataset Resource
  • Sensitive Elements Present
    True
    Description
    • The dataset contains health data related to type 2 diabetes. It originally contained race/ethnicity and sex data, which have been removed from the dataset but are available in aggregate form.
Dataset Resource
  1. Description
    Data was collected from multiple modalities, including 12-lead ECG, Holter monitor, smartwatch, REDCap for clinical data, a custom environmental sensor, fluorescence lifetime imaging ophthalmoscopy (FLIO), optical coherence tomography (OCT), optical coherence tomography angiography (OCTA), retinal photography, wearable fitness trackers, and continuous glucose monitoring (CGM) devices.
    Was Directly Observed
    Was Reported By Subjects
    Was Inferred Derived
    Was Validated Verified
🚀

Uses

What (other) tasks could the dataset be used for?

Dataset Resource
  • Description
    As of the document date, the dataset has 12,603 views, has been cited by 3 resources, and has had 539 approved access requests.
Dataset Resource
  • Description
    Users must agree to use the data only for type 2 diabetes related research. Other uses are implicitly discouraged.
Dataset Resource
  • Description
    This work is licensed under a custom license. Accessing the dataset requires logging in through a verified ID system, agreeing to use the data only for type 2 diabetes related research, and agreeing to the license terms, which set restrictions and obligations for data usage.
📤

Distribution

How will the dataset be distributed?

Dataset Resource
10.60775/fairhub.2
Dataset Resource
2,210,033,333,333
Dataset Resource
  • Description
    Older versions of the dataset are accessible and have their own DOIs. Version 1.0.0 is available at doi:10.60775/fairhub.1.
🔄

Maintenance

How will the dataset be maintained?

Dataset Resource
2.0.0
Dataset Resource
  • Description
    The dataset is versioned. Changes between versions are provided in a CHANGELOG file. The current version is 2.0.0, and a previous version 1.0.0 exists.
Dataset Resource
Generated on 2025-10-30 10:38:41 using Bridge2AI Data Sheets Schema