| access_details | Information on how to access or retrieve the raw source data |
| access_url | URL or access point for the raw data |
| access_urls | One or more URLs providing access to the distribution channel(s) or format(s) |
| acquisition_details | Free-text description of how data was acquired for each instance, including i... |
| acquisition_methods | Methods used to acquire or obtain dataset instances |
| addressing_gaps | Research or practical gaps this dataset addresses |
| affected_subsets | One or more specific subsets or features of the dataset affected by this bias... |
| affiliation | The organization(s) to which the person belongs in the context of this datase... |
| affiliations | Organizations with which the creator or team is affiliated |
| agreement_metric | Type of agreement metric used (Cohen's kappa, Fleiss' kappa, Krippendorff's a... |
| analysis_method | Methodology used to assess annotation quality and resolve disagreements |
| annotation_analyses | Analysis of annotation quality and inter-annotator agreement |
| annotation_quality_details | Additional details on annotation quality assessment and findings |
| annotations_per_item | Number of annotations collected per data item |
| annotator_demographics | One or more demographic characteristics of the annotators, if available and r... |
| anomalies | Known data quality issues, errors, or irregularities in the dataset |
| anomaly_details | Free-text description of errors, noise sources, or redundancies in the datase... |
| anonymization_method | What methods were used to anonymize or de-identify participant data? Include ... |
| archival | Indicates whether official archival versions of external resources are includ... |
| assent_procedures | For research involving minors, what assent procedures were used? How was deve... |
| at_risk_groups_included | Are any at-risk populations included (e |
| at_risk_populations | Information about protections for at-risk populations (e |
| bias_description | Detailed description of how this bias manifests in the dataset, including aff... |
| bias_type | The type of bias identified, using standardized categories from the Artificia... |
| bytes | Size of the data in bytes |
| categories | One or more permitted categories or values for a categorical variable |
| citation | Recommended citation for this dataset in DataCite or BibTeX format |
| cleaning_details | Free-text description of data cleaning procedures applied, including criteria... |
| cleaning_strategies | Data cleaning and quality control procedures applied to the dataset |
| collection_consents | Consent obtained from individuals for data collection and use |
| collection_details | Free-text description of whether data was collected directly from individuals... |
| collection_mechanisms | Mechanisms, instruments, or tools used for data collection |
| collection_notifications | Notifications provided to individuals about data collection |
| collection_timeframes | Time periods during which data was collected |
| collection_type | Type(s) of content in this file collection |
| collector_details | Free-text description of who was involved in data collection (e |
| comment_prefix | Character(s) used to indicate comment lines (e |
| compensation_amount | What was the amount or value of compensation provided? Include currency or eq... |
| compensation_provided | Were participants compensated for their participation? |
| compensation_rationale | What was the rationale for the compensation structure? How was the amount det... |
| compensation_type | What type of compensation was provided (e |
| compression | Compression format used, if any (e |
| confidential_elements | Confidential or restricted information within the dataset that requires acces... |
| confidential_elements_present | Indicates whether any confidential data elements are present |
| confidentiality_details | Free-text description of which data elements are confidential, the basis for ... |
| confidentiality_level | Confidentiality classification of the dataset indicating level of access rest... |
| conforms_to | An established standard, specification, or schema to which the resource confo... |
| conforms_to_class | The specific class or type within a schema to which the resource conforms |
| conforms_to_schema | The schema or data model to which the resource conforms |
| consent_details | Free-text description of how consent was requested (e |
| consent_documentation | How is consent documented? Include references to consent forms or procedures ... |
| consent_obtained | Was informed consent obtained from all participants? |
| consent_revocations | Mechanisms for individuals to revoke previously given consent |
| consent_scope | What specific uses did participants consent to? Are there limitations on data... |
| consent_type | What type of consent was obtained (e |
| contact_person | Contact person for questions about ethical review |
| content_warnings | Content warnings for potentially harmful, offensive, or disturbing material i... |
| content_warnings_present | Indicates whether any content warnings are needed |
| contribution_url | URL for contribution guidelines or process |
| counts | How many instances are there in total (of each type, if appropriate)? |
| created_by | The person or organization primarily responsible for creating the resource |
| created_on | The date and time when the resource was created |
| creators | Individuals or organizations who created the dataset |
| credit_roles | One or more contributor roles using the CRediT (Contributor Roles Taxonomy) f... |
| data_annotation_platform | One or more platforms or tools used for annotation (e |
| data_annotation_protocol | Annotation methodology, tasks, and protocols followed during labeling |
| data_collectors | Individuals or organizations responsible for collecting the data |
| data_linkage | Can this dataset be linked to other datasets in ways that might compromise pa... |
| data_protection_impacts | Data protection impact assessments (DPIAs) conducted for the dataset |
| data_substrate | Type of data (e |
| data_topic | General topic of each instance (e |
| data_type | The data type of the variable (e |
| data_use_permission | Structured data use permissions using the Data Use Ontology (DUO) |
| deidentification_details | Details on de-identification procedures and residual risks |
| delimiter | Field delimiter character (e |
| derivation | Description of how this variable was derived or calculated from other variabl... |
| description | A human-readable description for a thing |
| dialect | Specific format dialect or variation (e |
| direct_collection | Whether data was collected directly from individuals or via third parties |
| disagreement_patterns | Systematic patterns in annotator disagreements (e |
| discouraged_uses | Uses that are not recommended for this dataset due to limitations, risks, or ... |
| discouragement_details | Free-text description of tasks or applications for which the dataset is not r... |
| distribution | The distribution of instances across identified subpopulations, including cou... |
| distribution_dates | Dates when the dataset was or will be distributed or released |
| distribution_formats | Formats in which the dataset is distributed or made available |
| doi | Digital Object Identifier (DOI) in format 10 |
| double_quote | Whether quotes within quoted fields are escaped by doubling them |
| download_url | URL from which the data can be downloaded |
| email | The email address of the person |
| encoding | The character encoding of the data |
| end_date | End date of data collection |
| errata | Known errors or corrections to the dataset since publication |
| erratum_details | Free-text description of the error, its scope, the affected data or records, ... |
| erratum_url | URL or access point for the erratum |
| ethical_reviews | Ethical reviews and institutional oversight for the dataset |
| ethics_review_board | What ethics review board(s) reviewed this research? Include institution names... |
| examples | List of examples of known/previous uses of the dataset |
| existing_uses | Known existing uses of the dataset at the time of publication |
| extension_details | Free-text description of how third parties can contribute to the dataset, how... |
| extension_mechanism | Mechanisms for extending or contributing to the dataset |
| external_resources | Links or identifiers for external resources |
| file_collections | Collections of files within this dataset |
| file_count | Number of files in this collection |
| file_type | Semantic type or purpose of this file (e |
| format | The file format, physical medium, or dimensions of a resource |
| frequency | How often updates are planned (e |
| funders | Funding mechanisms that supported dataset creation |
| future_guarantees | Explanation of any commitments that external resources will remain available ... |
| future_use_impacts | Anticipated impacts of future uses, including risks and benefits |
| governance_committee_contact | Contact person for data governance committee |
| grant_number | The alphanumeric identifier for the grant |
| grantor | Name/identifier of the organization providing monetary or resource support |
| grants | Grant mechanisms supporting dataset creation |
| guardian_consent | For participants unable to provide their own consent, how was guardian or sur... |
| handling_strategy | The primary strategy used to handle missing data (e |
| hash | Cryptographic hash value of the data for integrity verification (e |
| header | Whether the first row of the file contains column headers |
| hipaa_compliant | Indicates compliance with the Health Insurance Portability and Accountability... |
| human_subject_research | Information about whether dataset involves human subjects research, including... |
| id | A unique identifier for a thing |
| identifiable_elements_present | Indicates whether data subjects can be identified |
| identification | How subpopulations are identified and defined (e |
| identifiers_removed | List of identifier types removed during de-identification (e |
| impact_details | Free-text description of potential future impacts or risks arising from the d... |
| imputation_method | Specific imputation technique used (mean, median, mode, forward fill, backwar... |
| imputation_protocols | Data imputation protocols applied to handle missing values |
| imputation_rationale | Justification for the imputation approach chosen, including assumptions made ... |
| imputation_validation | Methods used to validate imputation quality (if any) |
| imputed_fields | Fields or columns where imputation was applied |
| informed_consent | Details about informed consent procedures, including consent type, documentat... |
| instance_type | The type or types of instances in the dataset (e |
| instances | Individual data instances or records in the dataset |
| intended_uses | Explicit intended and recommended uses for this dataset |
| inter_annotator_agreement | Measure of agreement between annotators (e |
| inter_annotator_agreement_score | Measured agreement between annotators (e |
| involves_human_subjects | Does this dataset involve human subjects research? |
| ip_restrictions | Intellectual property restrictions on dataset use or redistribution |
| irb_approval | Was Institutional Review Board (IRB) approval obtained? Include approval numb... |
| is_data_split | Is this subset a split of the larger dataset, e |
| is_deidentified | De-identification status and procedures applied to the dataset |
| is_direct | Whether collection was direct from individuals |
| is_identifier | Indicates whether this variable serves as a unique identifier or key for reco... |
| is_random | Indicates whether the sample is random |
| is_representative | Indicates whether the sample is representative of the larger set |
| is_sample | Indicates whether it is a sample of a larger set |
| is_sensitive | Indicates whether this variable contains sensitive information (e |
| is_shared | Boolean indicating whether the dataset is distributed to parties external to ... |
| is_subpopulation | Is this subset a subpopulation of the larger dataset, e |
| is_tabular | Whether the dataset is in tabular format (rows and columns) |
| issued | Date of formal issuance or publication of the resource |
| keywords | Keywords or tags describing the resource for discovery and classification |
| known_biases | Known biases present in the dataset that may affect fairness, representativen... |
| known_limitations | Known limitations of the dataset that may affect its use or interpretation |
| label | Is there a label or target associated with each instance? |
| label_description | If labeled, what pattern or format do labels follow? |
| labeling_details | Free-text description of the labeling or annotation procedures, including ann... |
| labeling_strategies | Labeling or annotation methodologies applied to the data |
| language | Language in which the information is expressed |
| last_updated_on | The date and time when the resource was most recently modified or updated |
| latest_version_doi | DOI or URL identifying the latest version of this dataset (e |
| license | The legal license under which the resource is made available (e |
| license_and_use_terms | License and usage terms governing dataset access and use |
| license_terms | Description of the dataset's license and terms of use, including links, costs... |
| limitation_description | Detailed description of the limitation and its implications |
| limitation_type | Category of limitation (e |
| machine_annotation_tools | Automated annotation tools used in dataset creation |
| maintainer_details | Free-text description of the organization, team, or individual responsible fo... |
| maintainers | Individuals or organizations responsible for maintaining the dataset |
| maximum_value | The maximum value that the variable can take |
| md5 | MD5 hash value of the data (128-bit cryptographic hash) |
| measurement_technique | The technique or method used to measure this variable |
| mechanism_details | Free-text description of the specific mechanisms or procedures used to collec... |
| media_type | The media type of the data |
| method | Method used for de-identification (e |
| minimum_value | The minimum value that the variable can take |
| missing | Description of the missing data fields or elements |
| missing_data_causes | Known or suspected causes of missing data (e |
| missing_data_documentation | Documentation of missing data patterns and handling strategies |
| missing_data_patterns | Description of patterns in missing data (e |
| missing_information | References to one or more MissingInfo objects describing missing data |
| missing_value_code | Code(s) used to represent missing values for this variable |
| mitigation_strategy | Steps taken or recommended to mitigate this bias |
| modified_by | A person or organization that contributed to modifying or updating the resour... |
| name | A human-readable name for a thing |
| notification_details | Free-text description of how individuals were notified about data collection,... |
| orcid | ORCID (Open Researcher and Contributor ID) - a persistent digital identifier ... |
| other_compliance | Other regulatory compliance frameworks applicable to this dataset (e |
| other_tasks | Additional tasks the dataset may support beyond its original intent |
| page | A landing page or web page providing access to or information about the resou... |
| parent_datasets | Parent datasets that this dataset is part of or derived from |
| participant_compensation | Information about compensation or incentives provided to human research parti... |
| participant_privacy | Information about privacy protections and anonymization procedures for human ... |
| path | The file path or URL where the content is located |
| precision | The precision or number of decimal places for numeric variables |
| preprocessing_details | Free-text description of preprocessing steps applied to the data, including t... |
| preprocessing_strategies | Preprocessing steps applied to the raw data |
| principal_investigator | A key individual (Principal Investigator) responsible for or overseeing datas... |
| privacy_techniques | What privacy-preserving techniques were applied (e |
| prohibited_uses | Explicitly prohibited or forbidden uses for this dataset |
| prohibition_reason | One or more reasons why this use is prohibited (e |
| publisher | The organization or entity responsible for making the resource available |
| purposes | Purposes for which the dataset was created |
| quality_notes | Notes about data quality, reliability, or known issues specific to this varia... |
| quote_char | Character used for quoting fields (e |
| raw_data_details | Free-text description of raw data availability, access procedures, and any co... |
| raw_data_format | One or more formats of the raw data before any preprocessing (e |
| raw_data_sources | List of raw data sources before preprocessing |
| raw_sources | Raw, unprocessed source data before any preprocessing was applied |
| recommended_mitigation | Recommended approaches for users to address this limitation |
| regulatory_compliance | What regulatory frameworks govern this human subjects research (e |
| regulatory_restrictions | Regulatory and export control restrictions applicable to the dataset |
| reidentification_risk | What is the assessed risk of re-identification? What measures were taken to m... |
| related_datasets | Related datasets with typed relationships (e |
| relationship_details | Free-text description of how relationships between instances are represented ... |
| relationship_type | The type of relationship (e |
| relationships | Explicit relationships between individual instances in the dataset |
| release_dates | One or more dates or timeframes for dataset release, in ISO 8601 format (e |
| repository_details | Free-text description of the repository of known dataset uses, including how ... |
| repository_url | URL to a repository of known dataset uses |
| representative_verification | One or more explanations of how representativeness was validated or verified ... |
| resources | Sub-resources or component items |
| response | Short explanation describing the primary purpose of creating the dataset |
| restrictions | One or more descriptions of restrictions or fees associated with accessing th... |
| retention_details | Free-text description of applicable retention limits, legal or ethical basis ... |
| retention_limit | Data retention policies and limits for the dataset |
| retention_period | Time period for data retention |
| review_details | Free-text description of the ethical review process, board decisions, outcome... |
| reviewing_organization | Organization that conducted the ethical review (e |
| revocation_details | Free-text description of the mechanism provided for individuals to revoke con... |
| role | Role of the data collector (e |
| same_as | One or more URLs or URIs identifying equivalent or related representations of... |
| sampling_strategies | Strategies used to select data instances from a larger population |
| scope_impact | How this limitation affects the scope or applicability of the dataset |
| sensitive_elements | Sensitive data elements requiring special handling or access controls |
| sensitive_elements_present | Indicates whether sensitive data elements are present |
| sensitivity_details | Details on sensitive data elements present and handling procedures |
| sha256 | SHA-256 hash value of the data (256-bit cryptographic hash, recommended) |
| source_data | One or more descriptions of the larger sets from which the sample was drawn, ... |
| source_description | Detailed description of where raw data comes from (e |
| source_type | One or more types of raw source (e |
| special_populations | Does the research involve any special populations that require additional pro... |
| special_protections | What additional protections were implemented for at-risk populations? Include... |
| split_details | Free-text description of the recommended data splits (e |
| splits | Recommended data splits for this dataset |
| start_date | Start date of data collection |
| status | The status of the resource (e |
| strategies | One or more sampling strategies used (e |
| subpopulation_elements_present | Indicates whether any subpopulations are explicitly identified |
| subpopulations | Subpopulations represented within the dataset |
| subsets | Subsets or splits of this dataset |
| target_dataset | The dataset that this relationship points to |
| task_details | Free-text description of other potential tasks the dataset could support, inc... |
| tasks | Tasks the dataset is intended to support |
| themes | Themes associated with the data |
| third_party_sharing | Third-party distribution policies for the dataset |
| timeframe_details | Free-text description of the data collection period and whether this timefram... |
| title | The official title of the element |
| tool_accuracy | One or more known accuracy or performance metrics for the automated tools (if... |
| tool_descriptions | Descriptions of what each tool does in the annotation process and what types ... |
| tools | List of automated annotation tools with their versions |
| total_bytes | Total size of all files in this collection, in bytes (integer) |
| total_file_count | Total number of files across all file collections in this dataset |
| total_size_bytes | Total size of all files in bytes across all file collections |
| unit | The unit of measurement for the variable, preferably using QUDT units (http:/... |
| update_details | Free-text description of planned update types (e |
| updates | Plans for future updates or versioning of the dataset |
| url | URL where the software can be found (e |
| usage_notes | A note or caveat about using the dataset for its intended purposes |
| use_category | One or more categories of intended use (e |
| use_repository | Repositories or registries tracking how the dataset has been used |
| used_software | What software was used as part of this dataset property? |
| variable_name | The name or identifier of the variable as it appears in the data files |
| variables | Metadata describing individual variables, fields, or columns in the dataset |
| version | The version identifier of the resource (e |
| version_access | Information about access to different versions of the dataset |
| version_details | Free-text description of version support policies, how long older versions wi... |
| versions_available | List of available versions with metadata |
| warnings | One or more specific content warnings describing potentially offensive, insul... |
| was_derived_from | A resource from which this resource was derived, in whole or in part |
| was_directly_observed | True if the data was directly observed by a researcher or instrument; false i... |
| was_inferred_derived | True if the data was computationally inferred or derived from other data (e |
| was_reported_by_subjects | True if the data was self-reported directly by the subjects themselves (e |
| was_validated_verified | True if the data underwent a validation or verification process (e |
| why_missing | Explanation of why each piece of data is missing |
| why_not_representative | One or more explanations of why the sample is not representative of the large... |
| withdrawal_mechanism | How can participants withdraw their consent? What procedures are in place for... |
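
The file-level slots `path`, `bytes`, `md5`, and `sha256` listed above can be filled in mechanically. The following is a minimal sketch of one way to do that in Python, not part of the schema's own tooling. The function name `describe_file`, the `chunk_size` parameter, and the plain-dict record shape are assumptions made for illustration; only the four slot names come from the table.

```python
import hashlib


def describe_file(path, chunk_size=1 << 20):
    """Return a dict with the path, bytes, md5 and sha256 slots for one file.

    Hypothetical helper: the dict layout is an assumption, not a schema class.
    """
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    size = 0
    # Read in fixed-size chunks so large files are never loaded into memory whole.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
            size += len(chunk)
    return {
        "path": str(path),
        "bytes": size,                    # size of the data in bytes
        "md5": md5.hexdigest(),           # 128-bit hash, 32 hex characters
        "sha256": sha256.hexdigest(),     # 256-bit hash, 64 hex characters
    }
```

Both digests are computed in a single pass over the file, so recording `md5` alongside the recommended `sha256` costs no extra read.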