data-sheets-schema
A LinkML schema for Datasheets for Datasets.
URI: https://w3id.org/bridge2ai/data-sheets-schema
Name: data-sheets-schema
Classes
Class | Description |
---|---|
FormatDialect | Additional format information for a file |
NamedThing | A generic grouping for any identifiable entity |
DatasetProperty | Represents a single property of a dataset, or a set of related properties |
AddressingGap | Was there a specific gap that needed to be filled by creation of the dataset? |
CleaningStrategy | Was any cleaning of the data done (e |
CollectionConsent | Did the individuals in question consent to the collection and use of their da... |
CollectionMechanism | What mechanisms or procedures were used to collect the data (e |
CollectionNotification | Were the individuals in question notified about the data collection? If so, p... |
CollectionTimeframe | Over what timeframe was the data collected, and does this timeframe match the... |
Confidentiality | Does the dataset contain data that might be confidential (e |
ConsentRevocation | If consent was obtained, were the consenting individuals provided with a mech... |
ContentWarning | Does the dataset contain any data that might be offensive, insulting, threate... |
Creator | Who created the dataset (e |
DataAnomaly | Are there any errors, sources of noise, or redundancies in the dataset? |
DataCollector | Who was involved in the data collection (e |
DataProtectionImpact | Has an analysis of the potential impact of the dataset and its use on data su... |
Deidentification | Is it possible to identify individuals in the dataset, either directly or ind... |
DirectCollection | Indicates whether the data was collected directly from the individuals in que... |
DiscouragedUse | Are there tasks for which the dataset should not be used? |
DistributionDate | When will the dataset be distributed? |
DistributionFormat | How will the dataset be distributed (e |
Erratum | Is there an erratum? If so, please provide a link or other access point |
EthicalReview | Were any ethical or compliance review processes conducted (e |
ExistingUse | Has the dataset been used for any tasks already? |
ExportControlRegulatoryRestrictions | Do any export controls or other regulatory restrictions apply to the dataset ... |
ExtensionMechanism | If others want to extend/augment/build on/contribute to the dataset, is there... |
ExternalResource | Is the dataset self-contained or does it rely on external resources (e |
FundingMechanism | Who funded the creation of the dataset? If there is an associated grant, plea... |
FutureUseImpact | Is there anything about the dataset's composition or collection that might im... |
HumanSubjectCompensation | Information about compensation or incentives provided to human research parti... |
HumanSubjectResearch | Information about whether the dataset involves human subjects research and wh... |
InformedConsent | Details about informed consent procedures used in human subjects research |
Instance | What do the instances that comprise the dataset represent (e |
InstanceAcquisition | Describes how data associated with each instance was acquired (e |
IPRestrictions | Have any third parties imposed IP-based or other restrictions on the data ass... |
LabelingStrategy | Was any labeling of the data done (e |
LicenseAndUseTerms | Will the dataset be distributed under a copyright or other IP license, and/or... |
Maintainer | Who will be supporting/hosting/maintaining the dataset? |
MissingInfo | Is any information missing from individual instances? (e |
OtherTask | What other tasks could the dataset be used for? |
ParticipantPrivacy | Information about privacy protections and anonymization procedures for human ... |
PreprocessingStrategy | Was any preprocessing of the data done (e |
Purpose | For what purpose was the dataset created? |
RawData | Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data... |
Relationships | Are relationships between individual instances made explicit (e |
RetentionLimits | If the dataset relates to people, are there applicable limits on the retentio... |
SamplingStrategy | Does the dataset contain all possible instances, or is it a sample (not neces... |
SensitiveElement | Does the dataset contain data that might be considered sensitive (e |
Splits | Are there recommended data splits (e |
Subpopulation | Does the dataset identify any subpopulations (e |
Task | Was there a specific task in mind for the dataset's application? |
ThirdPartySharing | Will the dataset be distributed to third parties outside of the entity (e |
UpdatePlan | Will the dataset be updated (e |
UseRepository | Is there a repository that links to any or all papers or systems that use the... |
VersionAccess | Will older versions of the dataset continue to be supported/hosted/maintained... |
VulnerablePopulations | Information about protections for vulnerable populations in human subjects re... |
Grant | The name and/or identifier of the specific mechanism providing monetary suppo... |
Information | Grouping for datasets and data files |
Dataset | A single component of related observations and/or information that can be rea... |
DataSubset | A subset of a dataset, likely containing multiple files of multiple potential... |
DatasetCollection | A collection of related datasets, likely containing multiple files of multipl... |
Organization | Represents a group or organization |
Grantor | The name and/or identifier of the organization providing monetary support or... |
Person | An individual human being |
Software | A software program or library |
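The `FormatDialect` class carries "additional format information for a file". As a rough illustration, the sketch below reads delimited text using a plain dictionary of dialect settings. It assumes (this index does not state it) that the slots `delimiter`, `quote_char`, `double_quote`, `header`, and `comment_prefix` from the Slots table belong to `FormatDialect` and behave like the similarly named CSV dialect options; the function name `read_rows` and the default values are hypothetical.

```python
import csv


def read_rows(text, dialect):
    """Parse delimited text using dialect-style settings.

    `dialect` is a plain dict keyed by slot names from this schema.
    Treating those slots as CSV dialect options is an assumption,
    not something the schema index confirms.
    """
    delimiter = dialect.get("delimiter", ",")
    quote_char = dialect.get("quote_char", '"')
    double_quote = dialect.get("double_quote", True)
    comment_prefix = dialect.get("comment_prefix")
    has_header = dialect.get("header", True)

    # Drop comment lines before handing the rest to the csv module.
    # Note: splitting on lines means quoted fields with embedded
    # newlines are not supported by this sketch.
    lines = [
        line
        for line in text.splitlines()
        if not (comment_prefix and line.startswith(comment_prefix))
    ]

    rows = list(
        csv.reader(
            lines,
            delimiter=delimiter,
            quotechar=quote_char,
            doublequote=double_quote,
        )
    )

    # With a header row, return one dict per record; otherwise raw lists.
    if has_header and rows:
        column_names, body = rows[0], rows[1:]
        return [dict(zip(column_names, row)) for row in body]
    return rows
```

For example, `read_rows("id;name\n1;x\n", {"delimiter": ";"})` returns one record keyed by `id` and `name`.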
Slots
Slot | Description |
---|---|
acquisition_methods | |
addressing_gaps | |
affiliation | The organization(s) to which the person belongs |
anomalies | |
anonymization_method | What methods were used to anonymize or de-identify participant data? Include ... |
archival | Indication whether official archival versions of external resources are inclu... |
assent_procedures | For research involving minors, what assent procedures were used? How was deve... |
bytes | Size of the data in bytes |
cleaning_strategies | |
collection_mechanisms | |
collection_timeframes | |
comment_prefix | |
compensation_amount | What was the amount or value of compensation provided? Include currency or eq... |
compensation_provided | Were participants compensated for their participation? |
compensation_rationale | What was the rationale for the compensation structure? How was the amount det... |
compensation_type | What type of compensation was provided (e |
compression | compression format used, if any |
confidential_elements | |
confidential_elements_present | Indicates whether any confidential data elements are present |
conforms_to | |
conforms_to_class | |
conforms_to_schema | |
consent_documentation | How is consent documented? Include references to consent forms or procedures ... |
consent_obtained | Was informed consent obtained from all participants? |
consent_scope | What specific uses did participants consent to? Are there limitations on data... |
consent_type | What type of consent was obtained (e |
content_warnings | |
content_warnings_present | Indicates whether any content warnings are needed |
counts | How many instances are there in total (of each type, if appropriate)? |
created_by | |
created_on | |
creators | |
data_collectors | |
data_linkage | Can this dataset be linked to other datasets in ways that might compromise pa... |
data_protection_impacts | |
data_substrate | Type of data (e |
data_topic | General topic of each instance (e |
delimiter | |
description | A human-readable description for a thing |
dialect | |
discouraged_uses | |
distribution | |
distribution_dates | |
distribution_formats | |
doi | digital object identifier |
double_quote | |
download_url | URL from which the data can be downloaded |
email | The email address of the person |
encoding | the character encoding of the data |
errata | |
ethical_reviews | |
ethics_review_board | What ethics review board(s) reviewed this research? Include institution names... |
existing_uses | |
extension_mechanism | |
external_resources | |
format | The file format, physical medium, or dimensions of a resource |
funders | |
future_guarantees | Explanation of any commitments that external resources will remain available ... |
future_use_impacts | |
grant | Name/identifier of the specific grant mechanism supporting dataset creation |
grant_number | The alphanumeric identifier for the grant |
grantor | Name/identifier of the organization providing monetary or resource support |
guardian_consent | For participants unable to provide their own consent, how was guardian or sur... |
hash | hash of the data |
header | |
id | A unique identifier for a thing |
identifiable_elements_present | Indicates whether data subjects can be identified |
identification | |
instance_type | Multiple types of instances? (e |
instances | |
involves_human_subjects | Does this dataset involve human subjects research? |
ip_restrictions | |
irb_approval | Was Institutional Review Board (IRB) approval obtained? Include approval numb... |
is_data_split | Is this subset a split of the larger dataset, e |
is_deidentified | |
is_random | Indicates whether the sample is random |
is_representative | Indicates whether the sample is representative of the larger set |
is_sample | Indicates whether it is a sample of a larger set |
is_subpopulation | Is this subset a subpopulation of the larger dataset, e |
is_tabular | |
issued | |
keywords | |
label | Is there a label or target associated with each instance? |
label_description | If labeled, what pattern or format do labels follow? |
labeling_strategies | |
language | language in which the information is expressed |
last_updated_on | |
license | |
license_and_use_terms | |
maintainers | |
md5 | md5 hash of the data |
media_type | The media type of the data |
missing | Description of the missing data fields or elements |
missing_information | References to one or more MissingInfo objects describing missing data |
modified_by | |
name | A human-readable name for a thing |
other_tasks | |
page | |
path | |
preprocessing_strategies | |
principal_investigator | A key individual (Principal Investigator) responsible for or overseeing datas... |
privacy_techniques | What privacy-preserving techniques were applied (e |
profile | The frictionless data profile to which the data conforms |
publisher | |
purposes | |
quote_char | |
raw_sources | |
regulatory_compliance | What regulatory frameworks govern this human subjects research (e |
regulatory_restrictions | |
reidentification_risk | What is the assessed risk of re-identification? What measures were taken to m... |
representative_verification | Explanation of how representativeness was validated or verified |
resources | |
response | Short explanation describing the primary purpose of creating the dataset |
restrictions | Description of any restrictions or fees associated with external resources |
retention_limit | |
sampling_strategies | |
sensitive_elements | |
sensitive_elements_present | Indicates whether sensitive data elements are present |
sha256 | sha256 hash of the data |
source_data | Description of the larger set from which the sample was drawn, if any |
special_populations | Does the research involve any special populations that require additional pro... |
special_protections | What additional protections were implemented for vulnerable populations? Incl... |
status | |
strategies | Description of the sampling strategy (deterministic, probabilistic, etc |
subpopulation_elements_present | Indicates whether any subpopulations are explicitly identified |
subpopulations | |
subsets | |
tasks | |
themes | Themes associated with the data |
title | the official title of the element |
updates | |
url | |
use_repository | |
used_software | What software was used as part of this dataset property? |
version | |
version_access | |
vulnerable_groups_included | Are any vulnerable populations included (e |
warnings | |
was_derived_from | |
was_directly_observed | Whether the data was directly observed |
was_inferred_derived | Whether the data was inferred or derived from other data |
was_reported_by_subjects | Whether the data was reported directly by the subjects themselves |
was_validated_verified | Whether the data was validated or verified in any way |
why_missing | Explanation of why each piece of data is missing |
why_not_representative | Explanation of why the sample is not representative, if applicable |
withdrawal_mechanism | How can participants withdraw their consent? What procedures are in place for... |
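The `bytes`, `md5`, and `sha256` slots above describe the size and checksums of the data. A minimal sketch of how those values could be computed for a local file follows; the function name `describe_file` and the returned dictionary layout are hypothetical, and the sketch assumes the hash slots hold lowercase hexadecimal digests, which this index does not specify.

```python
import hashlib


def describe_file(path, chunk_size=65536):
    """Compute values for the `bytes`, `md5`, and `sha256` slots.

    The file is read in chunks so large files do not need to fit
    in memory. Hex-digest encoding is an assumption.
    """
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
            size += len(chunk)
    return {
        "bytes": size,
        "md5": md5.hexdigest(),
        "sha256": sha256.hexdigest(),
    }
```

The resulting dictionary can be merged into a record for a data file alongside slots such as `path` or `download_url`.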
Enumerations
Enumeration | Description |
---|---|
Boolean | |
CompressionEnum | |
CreatorOrMaintainerEnum | |
EncodingEnum | |
FormatEnum | |
MediaTypeEnum | |
Types
Type | Description |
---|---|
Boolean | A binary (true or false) value |
Curie | a compact URI |
Date | a date (year, month and day) in an idealized calendar |
DateOrDatetime | Either a date or a datetime |
Datetime | The combination of a date and time |
Decimal | A real number with arbitrary precision that conforms to the xsd:decimal speci... |
Double | A real number that conforms to the xsd:double specification |
Float | A real number that conforms to the xsd:float specification |
Integer | An integer |
Jsonpath | A string encoding a JSON Path |
Jsonpointer | A string encoding a JSON Pointer |
Ncname | Prefix part of CURIE |
Nodeidentifier | A URI, CURIE or BNODE that represents a node in a model |
Objectidentifier | A URI or CURIE that represents an object in the model |
Sparqlpath | A string encoding a SPARQL Property Path |
String | A character string |
Time | A time object represents a (local) time of day, independent of any particular... |
Uri | a complete URI |
Uriorcurie | a URI or a CURIE |
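The `Curie`, `Uri`, and `Uriorcurie` types are related: a CURIE is a compact form (`prefix:local`) that expands to a complete URI through a prefix map. The sketch below shows one way such an expansion might work. The function name `expand_uri_or_curie` is hypothetical, and so is the prefix `data_sheets_schema` and the trailing slash on its namespace; only the base URI `https://w3id.org/bridge2ai/data-sheets-schema` comes from this page.

```python
def expand_uri_or_curie(value, prefix_map):
    """Return a complete URI for a value that is either a URI or a CURIE.

    Values containing "://" are treated as complete URIs and returned
    unchanged; this is a simplification, not the full CURIE grammar.
    """
    if "://" in value:
        return value
    prefix, separator, local_part = value.partition(":")
    if not separator:
        raise ValueError(f"not a URI or CURIE: {value!r}")
    if prefix not in prefix_map:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return prefix_map[prefix] + local_part


# Hypothetical prefix map; the prefix name is an assumption.
PREFIX_MAP = {
    "data_sheets_schema": "https://w3id.org/bridge2ai/data-sheets-schema/",
}
```

Under these assumptions, `expand_uri_or_curie("data_sheets_schema:Dataset", PREFIX_MAP)` yields the namespace followed by `Dataset`, while a value that is already a complete URI passes through untouched.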
Subsets
Subset | Description |
---|---|
Collection | The questions in this section are designed to elicit information that may hel... |
Composition | The questions in this section are intended to provide dataset consumers with ... |
DataGovernance | The questions in this section relate to how the dataset is governed: how it i... |
Distribution | The questions in this section pertain to dataset distribution |
Ethics | The questions in this section address ethical and data-protection concerns, i... |
Maintenance | The questions in this section are intended to encourage dataset creators to p... |
Motivation | The questions in this section are primarily intended to encourage dataset cre... |
Preprocessing-Cleaning-Labeling | The questions in this section are intended to provide dataset consumers with ... |
Uses | The questions in this section are intended to encourage dataset creators to r... |