
Datasheets for Datasets Schema


A LinkML schema for the Datasheets for Datasets model published by Gebru et al. in Datasheets for Datasets. Inspired by the datasheets that accompany components in the electronics industry and other industries, the authors proposed that every dataset "be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on".

Bridge2AI Generating Center Datasheets

Curated, comprehensive datasheets for each Bridge2AI data-generating project:

  • AI-READI - Retinal imaging and diabetes dataset
  • CM4AI - Cell maps for AI dataset
  • VOICE - Voice biomarker dataset
  • CHORUS - Health data for underrepresented populations


Repository Structure

Browse the source code repository on GitHub:

About This Project

This repository stores a LinkML schema representation of the original Datasheets for Datasets model, capturing its topics, its sets of questions, and the entities and fields expected in the answers. The schema includes 76 classes and 272 unique slots, and it covers:

  • Motivation - Why was the dataset created?
  • Composition - What's in the dataset?
  • Collection - How was data collected?
  • Preprocessing - What preprocessing was done?
  • Uses - What should the dataset be used for?
  • Distribution - How is the dataset distributed?
  • Maintenance - Who maintains the dataset?
  • Ethics - What ethical reviews were conducted?
  • Human Subjects - What protections are in place for human subjects?
  • Data Governance - How is the data governed?
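To make the topic structure above more concrete, here is a minimal Python sketch that treats a datasheet as a set of answers keyed by the ten topic areas and reports which topics are still unanswered. It does not load the LinkML schema or use any classes generated from it. The `Datasheet` class, the `SECTIONS` keys, and the `missing_sections` and `validate_keys` helpers are hypothetical names chosen for illustration. The real schema models each topic with structured classes and slots rather than a single free-text answer.

```python
from dataclasses import dataclass, field

# The ten topic areas listed above. These keys are hypothetical names for this
# sketch only; they are not necessarily the class or slot names used in the schema.
SECTIONS = (
    "motivation",
    "composition",
    "collection",
    "preprocessing",
    "uses",
    "distribution",
    "maintenance",
    "ethics",
    "human_subjects",
    "data_governance",
)


@dataclass
class Datasheet:
    """Hypothetical, heavily simplified datasheet: one free-text answer per topic."""

    title: str
    answers: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """Return topics with no non-blank answer, in the order of SECTIONS."""
        return [s for s in SECTIONS if not self.answers.get(s, "").strip()]


def validate_keys(sheet: Datasheet) -> None:
    """Raise ValueError if any answer is filed under an unknown topic key."""
    unknown = set(sheet.answers) - set(SECTIONS)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")


# Usage with placeholder answers: two topics answered, eight still missing.
sheet = Datasheet(
    title="Example dataset",
    answers={
        "motivation": "Placeholder: why the dataset was created.",
        "composition": "Placeholder: what the dataset contains.",
    },
)
validate_keys(sheet)
print(len(sheet.missing_sections()))  # 8
```

A plain dictionary keyed by topic keeps the sketch short. Validating a real datasheet would more likely be done against the schema itself with LinkML's validation tooling than with hand-written checks like these.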

This project was made with linkml-project-cookiecutter.