Subset: Preprocessing-Cleaning-Labeling
The questions in this section are intended to provide dataset consumers with the information they need to determine whether the “raw” data has been processed in ways that are compatible with their chosen tasks.
URI: Preprocessing-Cleaning-Labeling
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/bridge2ai/data-sheets-schema
Classes in subset
Class | Description |
---|---|
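CleaningStrategy | Was any cleaning of the data done (e.g., removal of instances, processing of missing values)? |
LabelingStrategy | Was any preprocessing/cleaning/labeling of the data done (e.g., part-of-speech tagging)? |
PreprocessingStrategy | Was any preprocessing of the data done (e.g., discretization or bucketing, tokenization, SIFT feature extraction)? |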
RawData | Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data... |
CleaningStrategy
Was any cleaning of the data done (e.g., removal of instances, processing of missing values)?
LabelingStrategy
Was any preprocessing/cleaning/labeling of the data done (e.g., part-of-speech tagging)?
PreprocessingStrategy
Was any preprocessing of the data done (e.g., discretization or bucketing, tokenization, SIFT feature extraction)?
RawData
Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data (e.g., to support unanticipated future uses)? If so, please provide a link or other access point to the “raw” data.
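As a rough illustration of how answers to the four questions above might be recorded together, here is a minimal Python sketch. Only the class names (CleaningStrategy, LabelingStrategy, PreprocessingStrategy, RawData) come from this subset. The container class, the `kind`, `description`, and `access_url` fields, and the `unanswered` helper are hypothetical names chosen for the example and are not slots defined by the schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Strategy:
    # `kind` holds one of the subset's class names; `description` is a
    # hypothetical free-text field, not a slot taken from the schema.
    kind: str
    description: str


@dataclass
class RawData:
    description: str
    # Assumed field: a link or other access point, if the raw data was kept.
    access_url: Optional[str] = None


@dataclass
class PreprocessingCleaningLabeling:
    # Hypothetical container for one dataset's answers in this subset.
    strategies: List[Strategy] = field(default_factory=list)
    raw_data: Optional[RawData] = None

    def unanswered(self) -> List[str]:
        """Return the subset classes that have no answer recorded yet."""
        expected = ["CleaningStrategy", "LabelingStrategy", "PreprocessingStrategy"]
        answered = {s.kind for s in self.strategies}
        missing = [name for name in expected if name not in answered]
        if self.raw_data is None:
            missing.append("RawData")
        return missing


section = PreprocessingCleaningLabeling(
    strategies=[
        Strategy("CleaningStrategy", "Removed instances with missing values."),
    ],
)
print(section.unanswered())
# prints ['LabelingStrategy', 'PreprocessingStrategy', 'RawData']
```

A check like `unanswered` is one way a dataset consumer could spot which of the four questions a data sheet leaves open before deciding whether the processing is compatible with their task.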