6 Module 4: Data Sharing and Resource Management (40 minutes)
Facilitator: Yulia Levites Strekalova, Clinical Care (CHoRUS)
Materials:
- The FAIR Guiding Principles for scientific data management and stewardship (Paper)
- Messy Data Silos Compromise Patient Privacy (Audio)
6.1 Content Block: Team Data Sharing Agreement Negotiation (30 minutes)
6.1.1 Scenario: Project “DeepHealth”
The Objective: A multi-site study to develop an AI-driven tool that predicts health risks by merging genomic markers with social determinants of health (SDOH).
6.1.1.1 The Stakeholders
The University (Genomics Lab): Focused on high-impact publications and “Open Science.” They operate under a federal grant requiring data to be Findable and Accessible in public repositories within 12 months of collection.
The Community Clinic (Patient Advocacy): Provides access to 5,000 high-risk patients. Their priority is Security and community trust. They are wary of “data extraction” and fear that sharing sensitive histories could stigmatize their patients or lead to insurance discrimination.
The Tech Partner (AI Solutions Corp): Provides a proprietary machine-learning platform. They want the data to train their models, but they want to keep the “processed” data and the resulting algorithms Secure and proprietary for commercial use.
6.1.1.2 The “Friction Points” (To be resolved)
The Interoperability Gap: The Clinic’s data consists of narrative clinical notes stored as PDFs; the University’s consists of structured genomic files. The Tech Partner wants both converted to its proprietary format, which the others cannot access without a paid license.
The Timeline Conflict: The University wants to upload raw genomic data to a public server now to meet grant milestones. The Clinic wants a “Community Review Board” to approve every data release first.
The Reusability Dilemma: If the Tech Partner improves their AI model using this data, can the University use that improved model for their next project without paying?
6.2 Activity 4: Team Challenge – The “Team Data Agreement” (20 minutes)
6.2.1 Team Challenge
As a group, you must negotiate the “Team Data Agreement.” You have 20 minutes to agree on:
- Ownership: Who “owns” the merged dataset?
- Access: Does the Clinic have the right to “veto” a data release?
- FAIRness: How will you make the data Interoperable without forcing everyone to buy the Tech Partner’s software?
This cheat sheet is designed to bridge the gap between high-level data theory and the “on-the-ground” reality of a research team. It breaks down the FAIR+ acronym through the lens of Team Science.
6.3 🔬 The FAIR+ Cheat Sheet for Research Teams
6.3.1 A Guide to Collaborative Data Management
6.3.1.1 F — Findable
If you can’t find it, it doesn’t exist.
Metadata: Data about the data. Does the file name Dataset_V2_Final_ActualFinal.csv mean anything to a teammate six months from now?
Persistent Identifiers (PIDs): Assigning a DOI or a unique ID so the data can be cited and located regardless of where it’s moved.
Searchability: Is the data indexed in a place where your team (and the public, if required) can search for it?
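To make the Findable items concrete, here is a minimal Python sketch of a self-explanatory file name plus a metadata record that refuses to be created without its core fields. The naming pattern, the field list in `REQUIRED_FIELDS`, and the DOI string are hypothetical placeholders for illustration; a real repository such as Zenodo defines its own metadata schema and mints the DOI for you.

```python
import json
import re
from datetime import date

# Hypothetical minimum fields for a findable record (not a formal metadata schema).
REQUIRED_FIELDS = ("identifier", "title", "creators", "keywords", "description")


def slug(text: str) -> str:
    """Lowercase text and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def descriptive_filename(project: str, content: str, version: int,
                         created: date, ext: str = "csv") -> str:
    """Build a file name of the form project_content_vN_YYYY-MM-DD.ext."""
    return f"{slug(project)}_{slug(content)}_v{version}_{created.isoformat()}.{ext}"


def metadata_record(**fields) -> dict:
    """Return a metadata record, raising ValueError if a required field is empty."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    return dict(fields)


name = descriptive_filename("DeepHealth", "SDOH survey", 2, date(2024, 5, 1))
print(name)  # deephealth_sdoh-survey_v2_2024-05-01.csv

record = metadata_record(
    identifier="doi:10.XXXX/deephealth-example",  # hypothetical placeholder, not a real DOI
    title="DeepHealth SDOH survey responses",
    creators=["Community Clinic", "University Genomics Lab"],
    keywords=["social determinants of health", "genomics"],
    description="Survey responses collected for the DeepHealth study.",
    file_name=name,
)
print(json.dumps(record, indent=2))
```

Unlike `Dataset_V2_Final_ActualFinal.csv`, the generated name tells a teammate the project, the contents, the version, and the date at a glance.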
6.3.1.2 A — Accessible
Knowing it exists isn’t the same as being able to open it.
Authentication: Who has the “key”? Define the specific access mechanisms (e.g., VPN connection, login credentials) required to reach the data.
Long-term Availability: If the PI’s lab website goes down, where does the data live? (e.g., an institutional repository or a generalist repository such as Zenodo or Dryad).
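Open vs. Restricted: “Accessible” doesn’t always mean “Free for everyone.” FAIR explicitly allows authentication and authorization where necessary; what it requires is that the process for requesting access is clear, documented, and as standardized as possible.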
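One way to make an access process “clear” is to write the rules down in machine-readable form so that every denial states exactly what is still missing. The sketch below assumes three hypothetical tiers and two hypothetical approval bodies (`community_review_board`, `data_access_committee`); it shows how a negotiated rule, such as the Clinic reviewing releases, could be encoded, not how any particular repository implements access control.

```python
from dataclasses import dataclass, field

# Hypothetical access tiers; a real Team Data Agreement would define its own.
ACCESS_POLICY = {
    "open": {"requires_login": False, "required_approvals": set()},
    "registered": {"requires_login": True, "required_approvals": set()},
    "restricted": {
        "requires_login": True,
        # e.g., the Clinic's Community Review Board plus a data access committee
        "required_approvals": {"community_review_board", "data_access_committee"},
    },
}


@dataclass
class AccessRequest:
    requester: str
    tier: str
    authenticated: bool
    approvals: set = field(default_factory=set)


def evaluate(request: AccessRequest) -> tuple[bool, list[str]]:
    """Return (granted, reasons); reasons lists everything still missing."""
    policy = ACCESS_POLICY.get(request.tier)
    if policy is None:
        return False, [f"unknown tier: {request.tier}"]
    reasons = []
    if policy["requires_login"] and not request.authenticated:
        reasons.append("requester must log in")
    for approval in sorted(policy["required_approvals"] - request.approvals):
        reasons.append(f"missing approval: {approval}")
    return not reasons, reasons


request = AccessRequest("analyst", "restricted", authenticated=True,
                        approvals={"data_access_committee"})
print(evaluate(request))  # (False, ['missing approval: community_review_board'])
```

Because the rules are explicit, a requester on any team can see why access was refused and what to do next, which is the point of “Accessible” even for restricted data.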
6.3.1.3 I — Interoperable
Data should play well with others.
Machine-Readable: Can a computer program (like an AI tool) ingest the data without a human having to manually re-format 1,000 PDFs?
Shared Vocabulary: Does everyone on the team agree on what “Age” means? (e.g., Is it “Age at diagnosis,” “Age at enrollment,” or “Age in months”?)
Non-Proprietary Formats: Favoring .csv or .json over formats that require expensive, specialized software (like .sas7bdat or proprietary AI files).
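A shared data dictionary is the practical form of a shared vocabulary: each variable gets one agreed name, type, unit, and set of allowed values, and rows are checked against it before being exported to plain CSV. The variable names and allowed values below are hypothetical examples for the DeepHealth scenario; only Python’s standard `csv` and `io` modules are used, so nobody needs licensed software to read the output.

```python
import csv
import io

# Hypothetical shared data dictionary: one agreed meaning and unit per variable.
DATA_DICTIONARY = {
    "participant_id": {"type": str, "description": "Study-assigned ID (not an MRN)"},
    "age_at_enrollment_years": {
        "type": int, "min": 0, "max": 120,
        "description": "Completed years of age on the enrollment date",
    },
    "housing_status": {
        "type": str, "allowed": {"stable", "unstable", "unknown"},
        "description": "Self-reported housing status at enrollment",
    },
}


def validate_row(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row matches the dictionary."""
    problems = []
    for name in sorted(row.keys() - DATA_DICTIONARY.keys()):
        problems.append(f"undocumented variable: {name}")
    for name, spec in DATA_DICTIONARY.items():
        if name not in row:
            problems.append(f"missing variable: {name}")
            continue
        value = row[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
            continue
        if "min" in spec and not spec["min"] <= value <= spec["max"]:
            problems.append(f"{name}: {value} out of range")
        if "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{name}: {value!r} not in agreed vocabulary")
    return problems


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to plain CSV with columns in data-dictionary order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(DATA_DICTIONARY))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = {"participant_id": "P002", "age_at_enrollment_years": "42", "housing_status": "homeless"}
print(validate_row(row))  # flags the text-typed age and the unagreed housing value
```

The check catches exactly the kind of silent disagreement the “Age” example warns about: a value that looks right but was recorded with a different type or vocabulary.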
6.3.1.4 R — Reusable
Build for the future, not just the deadline.
Documentation (The “Readme”): Does the data include a “User Manual” that explains the variables, the methodology, and the limitations?
Licensing: Using clear licenses (like Creative Commons) so others know exactly how they are allowed to use, credit, or remix your work.
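A README can be generated from information the team has already agreed on, so the “User Manual” stays in sync with the data. This sketch assumes a simple plain-text layout and hypothetical variable descriptions; `CC-BY-4.0` is the SPDX identifier for the Creative Commons Attribution 4.0 license and is used here only as an example. Creative Commons itself advises against using its licenses for software, so trained models or code, like the Tech Partner’s, would need a separate licensing decision.

```python
# Hypothetical variable documentation; in practice it would come from the data dictionary.
VARIABLES = {
    "participant_id": "Study-assigned ID (not an MRN)",
    "age_at_enrollment_years": "Completed years of age on the enrollment date",
}


def render_readme(title: str, license_id: str, methodology: str,
                  limitations: list[str], variables: dict[str, str]) -> str:
    """Render a plain-text README covering license, methods, variables, and limitations."""
    lines = [title, "=" * len(title), "", f"License: {license_id}", "",
             "Methodology", methodology, "", "Variables"]
    lines += [f"- {name}: {desc}" for name, desc in variables.items()]
    lines += ["", "Known limitations"]
    lines += [f"- {item}" for item in limitations]
    return "\n".join(lines) + "\n"


print(render_readme(
    "DeepHealth SDOH extract",
    "CC-BY-4.0",
    "Survey administered at enrollment.",
    ["Housing status is self-reported."],
    VARIABLES,
))
```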
6.3.1.5 + — The “Plus” (Secure & Ethical)
FAIR is great; safe is better.
De-identification: Ensuring human subject data is stripped of identifiers to meet HIPAA’s de-identification standard (via the Safe Harbor or Expert Determination method).
Trust & Sovereignty: Respecting the rights of communities (like the Clinic in our scenario) to control how their data is used, even if the data is “technically” findable.
Sustainability: Ensuring the data remains secure and usable long after the grant funding ends.
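The transformations below illustrate a few Safe Harbor-style steps: dropping direct identifiers, keeping only the year of dates, and grouping ages over 89 into a single “90+” category. This is a deliberately partial sketch with hypothetical field names (`mrn`, `visit_date`, and so on). It does not cover all 18 Safe Harbor identifier categories, it is not a compliance tool, and it cannot de-identify genomic sequence data, which can itself identify a person.

```python
from datetime import date

# Partial, hypothetical list of direct identifiers; NOT the full Safe Harbor list.
DIRECT_IDENTIFIERS = {"name", "mrn", "phone", "email", "street_address"}


def deidentify(record: dict) -> dict:
    """Drop listed identifiers, reduce dates to years, and top-code ages over 89."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # remove the field entirely
        if isinstance(value, date):
            out[key] = value.year  # Safe Harbor generally keeps only the year of dates
        elif key == "age" and isinstance(value, int) and value > 89:
            out[key] = "90+"  # ages over 89 are aggregated into one category
        else:
            out[key] = value
    return out


patient = {"name": "Jane Doe", "mrn": "123456", "visit_date": date(2023, 3, 14),
           "age": 92, "housing_status": "unstable"}
print(deidentify(patient))  # {'visit_date': 2023, 'age': '90+', 'housing_status': 'unstable'}
```

Even a correct technical pass like this only addresses part of the “Plus”: the Clinic’s concerns about stigma and community control still have to be settled in the agreement itself.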
The Team Science Tip: Data sharing is 20% technical and 80% cultural. Use these principles to build psychological safety: when everyone knows the rules, they are more likely to share their best work.
6.4 Debrief (8 minutes)
Did your team uncover conflicts you didn’t realize existed? For example, who owns the data if the Tech Partner goes bankrupt?