CM4AI (Claude Code Synthesized)

Datasheet for Dataset - Human Readable Format


Composition

What do the instances represent?

  • Description
    RO-Crate packages (JSON metadata) and ZIP archives for imaging data.
  • Description
    2025-03-03
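
As an illustration of how these instances might be inspected after download, here is a minimal Python sketch. It assumes that each `ro-crate-metadata.json` follows the general RO-Crate 1.1 layout (a top-level `"@graph"` list containing a metadata descriptor whose `"about"` property points at the root dataset) and that the imaging archives are ordinary ZIP files. The function names are hypothetical and are not part of FAIRSCAPE or any CM4AI tooling; the actual crates may use different identifiers, in which case `find_root_entity` simply returns `None`.

```python
import json
import zipfile
from collections import Counter


def load_crate_graph(metadata_path):
    """Return the list of entities under "@graph" in an RO-Crate metadata file."""
    with open(metadata_path, encoding="utf-8") as fh:
        doc = json.load(fh)
    return doc.get("@graph", [])


def find_root_entity(graph):
    """Locate the root dataset through the metadata descriptor's "about" link.

    Assumption: the descriptor's "@id" is "ro-crate-metadata.json", as in
    RO-Crate 1.1. Returns None if no such descriptor is present.
    """
    by_id = {entity.get("@id"): entity for entity in graph}
    descriptor = by_id.get("ro-crate-metadata.json")
    if not descriptor:
        return None
    about = descriptor.get("about")
    if isinstance(about, dict):
        return by_id.get(about.get("@id"))
    return None


def count_types(graph):
    """Count entities by "@type"; the value may be a string or a list."""
    counts = Counter()
    for entity in graph:
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts


def list_zip_members(zip_path):
    """List (name, uncompressed size in bytes) for files in a ZIP, without extracting."""
    with zipfile.ZipFile(zip_path) as zf:
        return [(info.filename, info.file_size)
                for info in zf.infolist() if not info.is_dir()]
```

For example, `count_types(load_crate_graph("ro-crate-metadata.json"))` would summarize the kinds of entities a crate describes, and `list_zip_members(...)` gives a quick view of an image archive's contents before committing to a full extraction of a large file.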

Collection Process

How was the data acquired?

Cell Maps for Artificial Intelligence - March 2025 Data Release (Beta)
This dataset is the March 2025 Data Release of Cell Maps for Artificial Intelligence (CM4AI; CM4AI.org), the Functional Genomics Grand Challenge in the NIH Bridge2AI program. This Beta release includes perturb-seq data in undifferentiated KOLF2.1J iPSCs; SEC-MS data in undifferentiated KOLF2.1J iPSCs and iPSC-derived NPCs, neurons, and cardiomyocytes; and IF images in MDA-MB-468 breast cancer cells in the presence and absence of chemotherapy (vorinostat and paclitaxel). CM4AI output data are packaged with provenance graphs and rich metadata as AI-ready datasets in RO-Crate format using the FAIRSCAPE framework. Data presented here will be augmented regularly through the end of the project.
CM4AI is a collaboration of UCSD, UCSF, Stanford, UVA, Yale, UA Birmingham, Simon Fraser University, and the Hastings Center. This data is Copyright (c) 2025 The Regents of the University of California except where otherwise noted. Spatial proteomics raw image data is Copyright (c) 2025 The Board of Trustees of the Leland Stanford Junior University.
The dataset is licensed for reuse under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (https://creativecommons.org/licenses/by-nc-sa/4.0/). Attribution to the copyright holders and the authors is required. Any publications referencing this data or derived products should cite the Related Publication below and directly cite this data collection (2025-03-04).
2025-03-03
2025-02-27
  • CM4AI
  • AI
  • affinity purification
  • AP-MS
  • artificial intelligence
  • breast cancer
  • Bridge2AI
  • cardiomyocyte
  • CRISPR/Cas9
  • induced pluripotent stem cell
  • iPSC
  • KOLF2.1J
  • machine learning
  • mass spectroscopy
  • MDA-MB-468
  • neural progenitor cell
  • NPC
  • neuron
  • paclitaxel
  • perturb-seq
  • perturbation sequencing
  • protein-protein interaction
  • protein localization
  • single-cell RNA sequencing
  • scRNAseq
  • SEC-MS
  • size exclusion chromatography
  • subcellular imaging
  • vorinostat
  • Clark T (University of Virginia) - ORCID: https://orcid.org/0000-0003-4060-7360
  • Parker J (University of California, San Diego) - ORCID: https://orcid.org/0000-0003-4535-3486
  • Al Manir S (University of Virginia) - ORCID: https://orcid.org/0000-0003-4647-3877
  • Axelsson U (KTH Royal Institute of Technology)
  • Ballllosero Navarro F (Stanford University) - ORCID: https://orcid.org/0000-0002-4180-422X
  • Chinn B (University of California San Diego)
  • Churas CP (University of California San Diego) - ORCID: https://orcid.org/0000-0001-9998-705X
  • Dailamy A (University of California, San Diego) - ORCID: https://orcid.org/0000-0002-6711-8260
  • Doctor Y (University of California, San Diego) - ORCID: https://orcid.org/0009-0009-0483-7506
  • Fall J (KTH Royal Institute of Technology)
  • Forget A (University of California San Francisco) - ORCID: https://orcid.org/0000-0003-0223-0312
  • Gao J (University of California San Diego) - ORCID: https://orcid.org/0000-0002-6311-3526
  • Hansen JN (Stanford University) - ORCID: https://orcid.org/0000-0002-4650-9094
  • Hu M (University of California San Diego) - ORCID: https://orcid.org/0000-0002-1571-8029
  • Johannesson A (KTH Royal Institute of Technology)
  • Khaliq H (University of California San Diego)
  • Lee YH (University of California San Diego) - ORCID: https://orcid.org/0000-0003-0917-355X
  • Lenkiewicz J (University of California San Diego) - ORCID: https://orcid.org/0000-0001-7252-8638
  • Levinson MA (University of Virginia) - ORCID: https://orcid.org/0000-0003-0384-8499
  • Marquez C (University of California San Diego) - ORCID: https://orcid.org/0000-0003-3960-420X
  • Metallo C (University of California San Diego) - ORCID: https://orcid.org/0000-0003-2404-3040
  • Muralidharan M (University of California San Francisco)
  • Nourreddine S (University of California San Diego) - ORCID: https://orcid.org/0000-0003-3881-7588
  • Niestroy J (University of Virginia) - ORCID: https://orcid.org/0000-0002-1103-3882
  • Obernier K (University of California San Francisco) - ORCID: https://orcid.org/0000-0002-4025-1299
  • Pan E (University of California San Diego)
  • Polacco B (University of California San Francisco)
  • Pratt D (University of California San Diego) - ORCID: https://orcid.org/0000-0002-1471-9513
  • Qian G (University of California San Diego) - ORCID: https://orcid.org/0009-0005-4217-2745
  • Schaffer L (University of California San Diego) - ORCID: https://orcid.org/0000-0001-6339-9141
  • Sigaeva A (KTH Royal Institute of Technology) - ORCID: https://orcid.org/0000-0003-3361-3797
  • Thaker S (University of Alabama at Birmingham) - ORCID: https://orcid.org/0000-0001-6730-2773
  • Zhang Y (University of California San Diego)
  • Bélisle-Pipon JC (Simon Fraser University) - ORCID: https://orcid.org/0000-0002-8965-8153
  • Brandt C (Yale University) - ORCID: https://orcid.org/0000-0001-8179-1796
  • Chen JY (The University of Alabama at Birmingham) - ORCID: https://orcid.org/0000-0002-6112-415X
  • Ding Y (University of Texas at Austin) - ORCID: https://orcid.org/0000-0003-2567-2009
  • Fodeh S (Yale University) - ORCID: https://orcid.org/0000-0003-4664-3143
  • Krogan N (University of California San Francisco) - ORCID: https://orcid.org/0000-0003-4902-337X
  • Lundberg E (Stanford University) - ORCID: https://orcid.org/0000-0001-7034-0850
  • Mali P (University of California San Diego) - ORCID: https://orcid.org/0000-0002-3383-1287
  • Payne-Foster P (University of Alabama) - ORCID: https://orcid.org/0000-0002-3508-3577
  • Ratcliffe S (University of Virginia) - ORCID: https://orcid.org/0000-0002-6644-8284
  • Ravitsky V (University of Montreal) - ORCID: https://orcid.org/0000-0002-7080-8801
  • Sali A (University of California San Diego) - ORCID: https://orcid.org/0000-0003-0435-6197
  • Schulz W (Yale University) - ORCID: https://orcid.org/0000-0002-2048-4028
  • Ideker T (University of California San Diego) - ORCID: https://orcid.org/0000-0002-1708-8454
Role | Name | ORCID | Affiliation
Contributor | ro-crate-metadata.json | CRISPR Perturbation Cell Atlas/ro-crate-metadata.json | -
Contributor | ro-crate-metadata.json | CRISPR Perturbation RNA Sequences - Raw Sequences/ro-crate-metadata.json | -
Contributor | cm4ai-v0.6-beta-if-images-untreated.zip | Protein Localization Subcellular Images/cm4ai-v0.6-beta-if-images-untreated.zip | -
Contributor | cm4ai-v0.6-beta-if-images-paclitaxel.zip | Protein Localization Subcellular Images/cm4ai-v0.6-beta-if-images-paclitaxel.zip | -
Contributor | cm4ai-v0.6-beta-if-images-vorinostat.zip | Protein Localization Subcellular Images/cm4ai-v0.6-beta-if-images-vorinostat.zip | -
Contributor | ro-crate-metadata.json | Protein-protein Interaction SEC-MS/ro-crate-metadata.json | -
Description
  • Copyright (c) 2025 The Regents of the University of California except where otherwise noted.
  • Spatial proteomics raw image data: Copyright (c) 2025 The Board of Trustees of the Leland Stanford Junior University.
  • Description
    • Hosted/maintained by University of Virginia Dataverse (LibraData).
    • Point of Contact listed: Trey Ideker (University of California San Diego); contact via dataset page.

Uses

What (other) tasks could the dataset be used for?

Description
Clark T, et al. Cell Maps for Artificial Intelligence: AI-Ready Maps of Human Cell Architecture from...
Nourreddine S, et al. A PERTURBATION CELL ATLAS OF HUMAN INDUCED PLURIPOTENT STEM CELLS. bioRxiv. 20...
Description
  • Dataset is licensed under CC BY-NC-SA 4.0; attribution required to copyright holders and authors.
  • Cite the related publication(s) and this data collection.
  • Description
    Dataset available via University of Virginia Dataverse (public files; large dataset guidance provided).

Distribution

How will the dataset be distributed?

CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)

Maintenance

How will the dataset be maintained?

1.4
Description
  • Data will be augmented regularly through the end of the project.
Generated on 2025-11-16 17:37:50 using Bridge2AI Data Sheets Schema