D4D-Core Schema

The D4D-Core schema is the curated, interop-focused subset of D4D — the recommended starting point for new datasheets and the canonical surface for systems that exchange datasheets with RO-Crate, FAIRSCAPE, schema.org, DCAT, or Croissant RAI consumers. Every slot in d4d-core is paired with a SKOS-aligned external term in the Semantic Exchange Layer.

The full schema (data_sheets_schema.yaml, ~284 attributes) remains the extended reservoir; d4d-core (~95 fields) is the cross-system interop layer.

Schema artifacts

| Artifact | Path | Description |
|---|---|---|
| Source schema | data_sheets_schema_core.yaml | Core schema entry point (imports D4D_Core.yaml) |
| Core module | D4D_Core.yaml | CoreDataset, CoreDatasetCollection, CoreDistribution and their slots |
| Merged form | data_sheets_schema_core_all.yaml | Single-file merged schema (auto-generated by make gen-core-schema) |
| Base import | D4D_Base_import.yaml | Shared base classes / slots / enums |

Build & validate

make gen-core-schema     # produce merged data_sheets_schema_core_all.yaml
make validate-core       # linkml-validate on the core schema
make lint-core           # linkml-lint on the core module

Curated example datasheets

Each Bridge2AI generating center has a curated d4d-core-aligned datasheet:

  • AI-READI — Retinal imaging and diabetes
  • CHORUS — Health data for underrepresented populations
  • CM4AI — Cell maps for AI
  • VOICE — Voice biomarker

All D4D examples →

Core classes

| Class | Maps to | Notes |
|---|---|---|
| CoreDataset | schema:Dataset | The primary dataset metadata record (~79 induced slots) |
| CoreDatasetCollection | schema:Dataset (RO-Crate root) + dcat:Catalog | tree_root: true; renders as @id: "./" with @type: ["Dataset", "https://w3id.org/EVI#ROCrate"] |
| CoreDistribution | dcat:Distribution | Concrete download/distribution surface |
| Person, Creator | schema:Person | People referenced in creator, author, contributor, maintainer |
| Organization | schema:Organization | Institutional affiliations and publishers |
| Grant, FundingMechanism | schema:Grant | Funding records linked via schema:funder |
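
To make the CoreDatasetCollection row concrete, here is a minimal Python sketch of an RO-Crate JSON-LD document whose root entity uses the @id and @type quoted in the table. The metadata descriptor and the @context URL follow the RO-Crate 1.1 conventions. The function name build_crate, the name, description and hasPart properties, and the sample values are illustrative assumptions, not the documented D4D serialization.

```python
import json

# Root entity types quoted from the class table above.
ROOT_TYPES = ["Dataset", "https://w3id.org/EVI#ROCrate"]


def build_crate(name, description, dataset_ids):
    """Return a minimal RO-Crate JSON-LD document with a collection root.

    Hypothetical helper: the property set is an assumption based on
    RO-Crate 1.1, not taken from the D4D-Core tooling.
    """
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": list(ROOT_TYPES),
        "name": name,
        "description": description,
        # Member datasets are referenced by @id (assumed layout).
        "hasPart": [{"@id": dataset_id} for dataset_id in dataset_ids],
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root],
    }


crate = build_crate(
    "Example collection",
    "Hypothetical collection used only for illustration.",
    ["./dataset-a/"],
)
print(json.dumps(crate, indent=2))
```

The descriptor's about property points at "./", which is how an RO-Crate consumer locates the root entity in the graph.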

The full crosswalk lives in the Semantic Exchange Layer.

Why a "core" subset?

  • FAIR interop: every core slot has a documented SKOS mapping to one of schema.org / RO-Crate / FAIRSCAPE EVI / DCAT / Croissant RAI.
  • Smaller surface area: ~95 fields is tractable for hand-authoring and AI-assisted authoring; the full schema (~284 attributes) is for full-coverage research datasheets.
  • Validation-friendly: make validate-core runs in seconds against typical Bridge2AI inputs.
  • RO-Crate round-trip: core ↔ RO-Crate JSON-LD is the supported lossless conversion path; full-schema ↔ RO-Crate may require attribute drops or extension contexts.
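
The round-trip point can be illustrated with a small Python sketch. It assumes a hypothetical four-entry slot-to-term table; the slot names, the functions to_rocrate and from_rocrate, and the sample records are invented for illustration, and the real crosswalk lives in the Semantic Exchange Layer. A record that uses only mapped slots survives the conversion unchanged, while an unmapped attribute has to be dropped or carried by an extension context.

```python
# Hypothetical slot-to-term table (not the actual D4D-Core crosswalk).
SLOT_TO_TERM = {
    "title": "name",
    "description": "description",
    "license": "license",
    "keywords": "keywords",
}
TERM_TO_SLOT = {term: slot for slot, term in SLOT_TO_TERM.items()}


def to_rocrate(record):
    """Convert a record to a JSON-LD entity; also return unmapped slots."""
    entity = {"@id": "./", "@type": "Dataset"}
    dropped = {}
    for slot, value in record.items():
        if slot in SLOT_TO_TERM:
            entity[SLOT_TO_TERM[slot]] = value
        else:
            dropped[slot] = value  # no external term, so it cannot be emitted
    return entity, dropped


def from_rocrate(entity):
    """Recover the mapped slots from a JSON-LD entity."""
    return {TERM_TO_SLOT[k]: v for k, v in entity.items() if k in TERM_TO_SLOT}


core_record = {
    "title": "Example dataset",
    "description": "Hypothetical record for illustration.",
    "license": "CC-BY-4.0",
    "keywords": ["example"],
}
entity, dropped = to_rocrate(core_record)
assert from_rocrate(entity) == core_record and dropped == {}

# A record with an attribute outside the mapped subset loses that attribute.
extended_record = dict(core_record, hypothetical_extra="not mapped")
entity_ext, dropped_ext = to_rocrate(extended_record)
assert dropped_ext == {"hypothetical_extra": "not mapped"}
assert from_rocrate(entity_ext) == core_record
```

Returning the dropped slots alongside the entity keeps the loss explicit, so a caller can decide whether to discard them or route them into an extension context.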

See Semantic Exchange for the mapping artifacts and the /d4d-add-mapping workflow.