CLI Reference
The Datasheets for Datasets workflow is exposed through the d4d command defined in pyproject.toml.
Installation and Invocation
Use the CLI from a repository checkout:
poetry install
poetry run d4d --help
After poetry install, the d4d entrypoint is available in the Poetry environment. While developing in the repo, poetry run d4d ... is the most reliable form.
Most subcommands assume they can import repo-local modules from src/ and .claude/agents/scripts/, so running inside a clone of data-sheets-schema is currently required.
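When driving the CLI from a Python script, it can help to build the `poetry run d4d ...` invocation in one place and to run it with the repository checkout as the working directory. The following is a minimal sketch under those assumptions; the helper names `d4d_command` and `run_d4d` and the `repo_root` parameter are hypothetical and not part of the project.

```python
import subprocess
from pathlib import Path


def d4d_command(*args: str) -> list[str]:
    """Return the argv list for `poetry run d4d <args>`."""
    return ["poetry", "run", "d4d", *args]


def run_d4d(*args: str, repo_root: Path) -> subprocess.CompletedProcess:
    """Run a d4d subcommand from inside the repository checkout.

    repo_root is assumed to be a clone of data-sheets-schema in which
    `poetry install` has already been run.
    """
    return subprocess.run(d4d_command(*args), cwd=repo_root, check=True)


# Building the command does not execute anything.
print(d4d_command("--help"))
```

Calling `run_d4d("--help", repo_root=Path("."))` from the top of a clone would then be equivalent to typing `poetry run d4d --help` there.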
Top-Level Commands
| Command | Purpose |
|---|---|
| `d4d download` | Download, preprocess, and concatenate source documents |
| `d4d evaluate` | Run datasheet evaluation workflows |
| `d4d render` | Render datasheets and evaluation outputs to HTML |
| `d4d rocrate` | Parse, merge, and transform RO-Crate metadata |
| `d4d schema` | Generate schema metrics and validate YAML against the schema |
| `d4d utils` | Inspect pipeline status and validate preprocessing results |
d4d download
d4d download sources
Download source documents from the project tracking sheet.
poetry run d4d download sources --project AI_READI
Options:
| Option | Description |
|---|---|
| `--project` | Required. One of `AI_READI`, `CHORUS`, `CM4AI`, `VOICE` |
| `--output-dir PATH` | Output directory for downloads. Default: `data/raw` |
d4d download preprocess
Normalize raw downloads into preprocessed text artifacts.
poetry run d4d download preprocess --project AI_READI
Options:
| Option | Description |
|---|---|
| `--project` | Optional. Restrict preprocessing to one project |
| `--input-dir PATH` | Raw download directory. Default: `data/raw` |
| `--output-dir PATH` | Preprocessed output directory. Default: `data/preprocessed/individual` |
d4d download concatenate
Concatenate one project's preprocessed files into a single text file.
poetry run d4d download concatenate --project AI_READI
Options:
| Option | Description |
|---|---|
| `--project` | Required. One of `AI_READI`, `CHORUS`, `CM4AI`, `VOICE` |
| `--input-dir PATH` | Preprocessed input directory. Default: `data/preprocessed/individual` |
| `--output-file PATH` | Output path. Default: `data/preprocessed/concatenated/{PROJECT}_preprocessed.txt` |
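The default directories of the three `d4d download` subcommands chain together (`data/raw`, then `data/preprocessed/individual`, then `data/preprocessed/concatenated`), which suggests running them in that order for a single project. The sketch below only builds and prints the commands. The helper names are hypothetical, and the ordering is an assumption drawn from the defaults listed above rather than a documented requirement.

```python
import shlex
from pathlib import Path

# Project names accepted by --project, as listed in the option tables.
PROJECTS = ("AI_READI", "CHORUS", "CM4AI", "VOICE")


def default_concatenated_path(project: str) -> Path:
    """Documented default for `d4d download concatenate --output-file`."""
    return Path("data/preprocessed/concatenated") / f"{project}_preprocessed.txt"


def download_pipeline(project: str) -> list[list[str]]:
    """Return the three download commands for one project, in order."""
    if project not in PROJECTS:
        raise ValueError(f"unknown project: {project!r}")
    base = ["poetry", "run", "d4d", "download"]
    return [
        base + ["sources", "--project", project],
        base + ["preprocess", "--project", project],
        base + ["concatenate", "--project", project],
    ]


# Print the commands instead of executing them.
for cmd in download_pipeline("AI_READI"):
    print(shlex.join(cmd))
print(default_concatenated_path("AI_READI"))
```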
d4d evaluate
d4d evaluate presence
Run the presence-based evaluator across one project or all projects.
poetry run d4d evaluate presence --project AI_READI --method gpt5
Options:
| Option | Description |
|---|---|
| `--project` | Optional. Restrict evaluation to one project |
| `--method` | Generation method. One of `curated`, `gpt5`, `claudecode`, `claudecode_agent`, `claudecode_assistant` |
| `--output-dir PATH` | Evaluation output directory. Default: `data/evaluation` |
d4d evaluate llm
Run the LLM-based quality evaluator for a specific D4D YAML file.
poetry run d4d evaluate llm \
--file data/d4d_concatenated/gpt5/AI_READI_d4d.yaml \
--project AI_READI \
--method gpt5 \
--rubric both
Requires ANTHROPIC_API_KEY.
Options:
| Option | Description |
|---|---|
| `--file PATH` | Required. D4D YAML file to evaluate |
| `--project TEXT` | Required. Project name |
| `--method TEXT` | Required. Generation method |
| `--rubric` | `rubric10`, `rubric20`, or `both`. Default: `both` |
| `--output-dir PATH` | LLM evaluation output directory. Default: `data/evaluation_llm` |
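Because `d4d evaluate llm` requires `ANTHROPIC_API_KEY`, a wrapper script may want to fail early with a clear message before invoking the CLI. The following is a minimal sketch of such a pre-flight check. The function names are hypothetical, and the check only tests that the variable is set and non-empty, not that the key is valid.

```python
import os
from collections.abc import Mapping

# Values accepted by --rubric, as listed in the option table.
RUBRICS = ("rubric10", "rubric20", "both")


def require_api_key(env: Mapping[str, str] = os.environ) -> None:
    """Raise if ANTHROPIC_API_KEY is missing or empty in the given environment."""
    if not env.get("ANTHROPIC_API_KEY"):
        raise RuntimeError("ANTHROPIC_API_KEY must be set for `d4d evaluate llm`")


def llm_eval_command(file: str, project: str, method: str, rubric: str = "both") -> list[str]:
    """Build the argv list for one `d4d evaluate llm` run."""
    if rubric not in RUBRICS:
        raise ValueError(f"rubric must be one of {RUBRICS}, got {rubric!r}")
    return [
        "poetry", "run", "d4d", "evaluate", "llm",
        "--file", file,
        "--project", project,
        "--method", method,
        "--rubric", rubric,
    ]


# Only builds the command; call require_api_key() before actually running it.
print(llm_eval_command("data/d4d_concatenated/gpt5/AI_READI_d4d.yaml", "AI_READI", "gpt5"))
```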
d4d render
d4d render html
Render a structured input file to HTML.
poetry run d4d render html \
docs/yaml_output/concatenated/gpt5/AI_READI_d4d.yaml \
-o /tmp/AI_READI_d4d.html
Options:
| Option | Description |
|---|---|
| `INPUT_FILE` | Required positional argument. Structured input file |
| `-o, --output PATH` | Output HTML path. Default: a canonical name derived from the input filename and rubric |
| `--template` | `human-readable`, `evaluation`, or `linkml`. Default: `human-readable` |
Current behavior notes:
- `human-readable` writes to the exact output path you provide.
- The CLI also copies `datasheet-common.css` into the output directory so the generated HTML can be opened directly with styling intact.
- `linkml` renders a more technical LinkML-style HTML view from YAML or JSON input.
- `evaluation` renders an evaluation JSON file and auto-detects `rubric10` vs `rubric20`.
d4d render evaluation
Render evaluation JSON directly to HTML.
poetry run d4d render evaluation \
data/evaluation_llm/rubric10/concatenated/AI_READI_claudecode_agent_evaluation.json \
-o /tmp/AI_READI_evaluation.html
Options:
| Option | Description |
|---|---|
| `INPUT_FILE` | Required positional argument. Evaluation JSON file |
| `-o, --output PATH` | Output HTML path. Default: `<input_file>.html` |
| `--rubric` | `auto`, `rubric10`, or `rubric20`. Default: `auto` |
Naming convention notes:
- If you omit `-o`, rubric10 outputs default to the canonical `*_evaluation.html` name.
- If you omit `-o`, rubric20 outputs default to `*_evaluation_rubric20.html` so they do not collide with rubric10 outputs.
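The naming rules above can be sketched as a small function. This is only an illustration of the documented defaults, not the CLI's own code. The function name `default_evaluation_html_name` is hypothetical, it returns a bare file name without making any claim about the output directory, and it assumes the rubric has already been resolved (the `auto` detection step is not modelled) and that the input stem already ends in `_evaluation`.

```python
from pathlib import Path


def default_evaluation_html_name(input_file: str, rubric: str) -> str:
    """Return the default HTML file name for an evaluation JSON input.

    rubric10 keeps the input stem; rubric20 appends `_rubric20` so the two
    outputs do not collide.
    """
    if rubric not in ("rubric10", "rubric20"):
        raise ValueError(f"expected a resolved rubric, got {rubric!r}")
    stem = Path(input_file).stem  # file name without directory or .json
    suffix = "_rubric20" if rubric == "rubric20" else ""
    return f"{stem}{suffix}.html"


print(default_evaluation_html_name("AI_READI_claudecode_agent_evaluation.json", "rubric20"))
```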
d4d render generate-all
Show the bulk rendering workflow.
poetry run d4d render generate-all --method curated
Options:
| Option | Description |
|---|---|
| `--method` | Optional. One of `gpt5`, `claudecode_agent`, `claudecode_assistant`, `curated` |
This command currently prints instructions for bulk generation rather than rendering every file itself.
d4d rocrate
These commands depend on helper scripts under .claude/agents/scripts/.
d4d rocrate parse
Parse an RO-Crate JSON-LD file and optionally write the extracted entities to disk.
poetry run d4d rocrate parse path/to/ro-crate-metadata.json --output parsed.json
Options:
| Option | Description |
|---|---|
| `INPUT_FILE` | Required positional argument. RO-Crate JSON-LD file |
| `--output PATH` | Optional JSON output path |
d4d rocrate transform
Transform one RO-Crate or a merged set of RO-Crates into D4D YAML.
poetry run d4d rocrate transform path/to/ro-crate-metadata.json -o output.yaml
Options:
| Option | Description |
|---|---|
| `INPUT_FILE` | Required positional argument for single-file mode |
| `-o, --output PATH` | Required. Output D4D YAML path |
| `--merge` | Enable merge mode |
| `--inputs PATH` | Additional RO-Crate inputs for merge mode |
| `--primary PATH` | Primary RO-Crate for conflict resolution in merge mode |
d4d rocrate merge
Merge multiple RO-Crate files into one JSON document.
poetry run d4d rocrate merge crate1.json crate2.json -o merged.json
Options:
| Option | Description |
|---|---|
| `INPUT_FILES...` | Required positional arguments. One or more RO-Crate files |
| `-o, --output PATH` | Required. Output merged RO-Crate path |
| `--primary PATH` | Primary RO-Crate file for conflict precedence |
d4d schema
These commands also depend on helper scripts under .claude/agents/scripts/.
d4d schema stats
Generate metrics for the LinkML schema.
poetry run d4d schema stats --level 1 --format markdown
Options:
| Option | Description |
|---|---|
| `--level` | Detail level from 1 to 4. Default: 1 |
| `--format` | Output format: `json`, `markdown`, or `csv`. Default: `markdown` |
| `--output PATH` | Optional output file. Otherwise writes to stdout |
| `--schema-file PATH` | Override schema path. Default: `src/data_sheets_schema/schema/data_sheets_schema_all.yaml` |
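A script that generates several metric reports may want to validate the documented ranges for `--level` and `--format` before invoking the CLI. The sketch below builds the argument list under that assumption. The function name `schema_stats_command` is hypothetical, and the validation mirrors the option table above rather than the CLI's own checks.

```python
from typing import Optional

# Values accepted by --format, as listed in the option table.
FORMATS = ("json", "markdown", "csv")


def schema_stats_command(level: int = 1, fmt: str = "markdown",
                         output: Optional[str] = None) -> list[str]:
    """Build the argv list for `d4d schema stats` with documented defaults."""
    if level not in (1, 2, 3, 4):
        raise ValueError(f"level must be between 1 and 4, got {level!r}")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}, got {fmt!r}")
    cmd = ["poetry", "run", "d4d", "schema", "stats",
           "--level", str(level), "--format", fmt]
    if output is not None:
        # Without --output the CLI writes to stdout.
        cmd += ["--output", output]
    return cmd


print(schema_stats_command(level=2, fmt="json", output="schema_stats.json"))
```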
d4d schema validate
Validate a D4D YAML file against the schema.
poetry run d4d schema validate docs/yaml_output/concatenated/gpt5/AI_READI_d4d.yaml
Options:
| Option | Description |
|---|---|
| `D4D_FILE` | Required positional argument. D4D YAML file to validate |
| `--schema-file PATH` | Override schema path. Default: `src/data_sheets_schema/schema/data_sheets_schema_all.yaml` |
d4d utils
d4d utils status
Show pipeline file counts.
poetry run d4d utils status --quick
Options:
| Option | Description |
|---|---|
| `--quick` | Show the compact view instead of the detailed breakdown |
d4d utils validate-preprocessing
Check the preprocessing output for empty or stub artifacts.
poetry run d4d utils validate-preprocessing --project AI_READI
Options:
| Option | Description |
|---|---|
| `--raw-dir PATH` | Raw data directory. Default: `data/raw` |
| `--preprocessed-dir PATH` | Preprocessed data directory. Default: `data/preprocessed/individual` |
| `--project` | Optional. Restrict validation to one project |
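The criteria the CLI uses to decide that an artifact is empty or a stub are not described here. As a rough illustration of the kind of check involved, the sketch below classifies a preprocessed text by its stripped length. The function name `classify_artifact` and the `min_chars` threshold are hypothetical and chosen only for the example.

```python
def classify_artifact(text: str, min_chars: int = 200) -> str:
    """Return "empty", "stub", or "ok" for one preprocessed text artifact.

    min_chars is an illustrative threshold, not a value taken from the CLI.
    """
    stripped = text.strip()
    if not stripped:
        return "empty"
    if len(stripped) < min_chars:
        return "stub"
    return "ok"


print(classify_artifact("   \n"))
print(classify_artifact("Page not found"))
```

A real check would likely also compare each preprocessed file against its raw counterpart, which is why the command takes both `--raw-dir` and `--preprocessed-dir`.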
Recommended Starting Points
- `poetry run d4d --help` for the top-level command list
- `poetry run d4d utils status --quick` for a quick pipeline sanity check
- `poetry run d4d download preprocess --project AI_READI` to start working on one project
- `poetry run d4d evaluate presence --project AI_READI --method gpt5` to generate evaluation output