6 Module 4: Data Sharing and Resource Management (40 minutes)
Facilitator: Yulia Levites Strekalova, Clinical Care (CHoRUS)
6.1 Learning Objectives
By the end of this module, participants will be able to:
- Describe the five FAIR+ principles and give a concrete example of each.
- Identify the primary stakeholder interests and potential friction points in a multi-party data sharing agreement.
- Negotiate the key elements of a basic Team Data Agreement (ownership, access, interoperability, reuse).
- Explain why data governance is primarily a social and cultural challenge, not just a technical one.
6.2 Module Overview
Data is the currency of modern research, and disputes over data — who owns it, who can access it, how it can be reused — are among the most common and consequential conflicts in large collaborations. These disputes are not usually caused by bad actors. They are caused by well-meaning partners who made incompatible assumptions early in a project and only discovered the conflict when the stakes were high. Consider the scenario at the heart of this module: the University team assumed “open science” meant everything goes public; the clinic assumed “partnership” meant they had veto power over releases. Neither assumption was wrong — they were just never reconciled.
The FAIR+ principles (Findable, Accessible, Interoperable, Reusable, plus Secure & Ethical) provide a shared vocabulary for negotiating these tensions before they become crises. But as the “Team Science Tip” in this module notes, data sharing is 20% technical and 80% cultural. Understanding FAIR is necessary but not sufficient. What matters is whether a team has had the explicit conversation about what each principle means for their specific data, stakeholders, and constraints — and documented the result in a shared agreement. This module uses a realistic multi-stakeholder scenario to practice exactly that conversation.
6.3 Participant Background Reading
Participants are encouraged to review the following before the session. The pre-read and pre-listen materials are already linked in the Module Materials section below.
Wilkinson, M. D., et al. (2016). “The FAIR Guiding Principles for scientific data management and stewardship.” Scientific Data, 3, 160018. The foundational paper that defined the FAIR framework. Before the session, read the abstract, the four principles section, and the discussion. Focus on understanding why each principle was considered necessary — the problems it was designed to solve — rather than memorizing the definitions. The PDF is linked in the Module Materials.
“Messy Data Silos Compromise Patient Privacy” (Audio). A short audio piece illustrating what goes wrong when data governance is left implicit. Listen before the session; it provides real-world texture for the DeepHealth scenario. Linked in the Module Materials.
Carroll, S. R., et al. (2020). “The CARE Principles for Indigenous Data Governance.” Data Science Journal, 19(1), 43. The CARE principles (Collective Benefit, Authority to Control, Responsibility, Ethics) extend the FAIR framework to address the rights of communities — particularly Indigenous communities — to govern how data about them is collected and used. Read the abstract and the principles section to understand the “Trust & Sovereignty” component of FAIR+. Freely available online.
6.4 Instructor Notes
6.4.1 Conceptual Background
Why FAIR alone is not enough. The FAIR principles were designed primarily to address technical interoperability — making it easier for machines to find, access, and use data across systems. They are essential but do not directly address the power dynamics of data ownership, the rights of data subjects (patients, communities), or long-term ethical obligations. The “+” in FAIR+ extends the framework to cover these dimensions. Instructors should be prepared to explain this distinction if participants ask why FAIR+ adds “Secure & Ethical” rather than treating security and ethics as implicit.
The CARE principles. CARE is a framework developed by Indigenous data governance scholars specifically to address the gap between FAIR’s technical orientation and the political reality of data sovereignty. It is relevant to any situation in which data is about a community rather than generated by that community — which includes health data from patient populations, as in the DeepHealth scenario. Instructors do not need deep expertise in CARE to reference it, but should be able to explain why the Clinic’s concern about “data extraction” in the scenario is a legitimate data sovereignty concern, not just bureaucratic obstruction.
Data Use Agreements and IRB. Some participants may be unfamiliar with the formal legal and ethical infrastructure around health data sharing:
- A Data Use Agreement (DUA) is a legal contract specifying the terms under which one party shares data with another, including permitted uses, security requirements, and return or destruction of data.
- An Institutional Review Board (IRB) is a committee that reviews research protocols involving human subjects to ensure ethical compliance. In multi-site research, each site typically has its own IRB and requires its own approval. The Clinic’s “Community Review Board” in the scenario is an analogous governance mechanism.
- HIPAA (Health Insurance Portability and Accountability Act) is the U.S. federal law governing the privacy and security of health information. De-identification refers to the process of removing identifying information from data to comply with HIPAA (or equivalent regulations in other jurisdictions).
Facilitating the negotiation activity. The DeepHealth scenario is designed to surface genuine tensions. Common dynamics to watch for:
- One stakeholder group may “win” the negotiation too easily — this often means the group isn’t taking the constraints seriously. Encourage each role to advocate for their actual interests.
- Participants may default to technical solutions (e.g., “we’ll just encrypt it”) when the real problem is a governance question (e.g., “who decides what counts as acceptable access?”). Redirect toward the governance question.
- The debrief question “Who owns the data if the Tech Partner goes bankrupt?” is deliberately provocative — it surfaces that ownership and access need to be specified contractually, not just assumed.
6.4.2 Key Concepts
- FAIR principles: A set of four data management principles — Findable, Accessible, Interoperable, Reusable — designed to make scientific data more useful across systems and over time.
- CARE principles: A complementary framework to FAIR focusing on data governance from the perspective of data subjects and communities: Collective Benefit, Authority to Control, Responsibility, Ethics.
- Data Use Agreement (DUA): A legal contract specifying the terms under which one party shares data with another.
- IRB (Institutional Review Board): A committee that reviews research protocols involving human subjects for ethical compliance. Multi-site studies require IRB approval at each participating institution.
- De-identification: The process of removing or obscuring identifying information from data to protect individual privacy (e.g., HIPAA compliance in the U.S.).
- Data sovereignty: The principle that communities or nations have rights to govern how data about their members is collected, stored, and used — particularly relevant for Indigenous communities and historically marginalized patient populations.
- Provenance: Documentation of where data came from, how it was collected, and what transformations it has undergone; essential for both reproducibility and accountability.
- Persistent identifier (PID): A long-term reference (such as a DOI) that uniquely identifies a dataset and remains stable even if the data’s location changes.
6.4.3 Recommended Instructor Reading
- Wilkinson, M. D., et al. (2016). “The FAIR Guiding Principles for scientific data management and stewardship.” Scientific Data, 3, 160018. The foundational paper — instructors should read in full.
- Carroll, S. R., et al. (2020). “The CARE Principles for Indigenous Data Governance.” Data Science Journal, 19(1), 43. Essential context for the “Trust & Sovereignty” dimension of FAIR+.
- Borgman, C. L. (2015). Big Data, Little Data, No Data: Scholarship in the Networked World. MIT Press. Chapters 1 and 6 are especially relevant — provides rich context for why data sharing is culturally difficult in academic science.
- NIH Data Management and Sharing Policy (2023). The current U.S. federal policy requiring data management plans for NIH-funded research. Instructors working with NIH-funded participants should be familiar with the basics.
Materials:
- The FAIR Guiding Principles for scientific data management and stewardship (Paper)
- Messy Data Silos Compromise Patient Privacy (Audio)
6.5 Content Block: Team Data Sharing Agreement Negotiation (30 minutes)
6.5.1 Scenario: Project “DeepHealth”
The Objective: A multi-site study to develop an AI-driven tool that predicts health risks by merging genomic markers with social determinants of health (SDOH).
6.5.1.1 The Stakeholders
The University (Genomics Lab): Focused on high-impact publications and “Open Science.” They operate under a federal grant requiring data to be Findable and Accessible in public repositories within 12 months of collection.
The Community Clinic (Patient Advocacy): Provides access to 5,000 high-risk patients. Their priority is Security and community trust. They are wary of “data extraction” and fear that sharing sensitive histories could stigmatize their patients or lead to insurance discrimination.
The Tech Partner (AI Solutions Corp): Provides a proprietary machine-learning platform. They want the data to train their models, but they want to keep the “processed” data and the resulting algorithms Secure and proprietary for commercial use.
6.5.1.2 The “Friction Points” (To be resolved)
The Interoperability Gap: The Clinic’s data is in narrative PDF format (notes); the University’s is in structured genomic files. The Tech Partner wants both converted to their proprietary format, which the others cannot access without a paid license.
The Timeline Conflict: The University wants to upload raw genomic data to a public server now to meet grant milestones. The Clinic wants a “Community Review Board” to approve every data release first.
The Reusability Dilemma: If the Tech Partner improves their AI model using this data, can the University use that improved model for their next project without paying?
6.6 Activity 4: Team Challenge – The “Team Data Agreement” (20 minutes)
6.6.1 Team Challenge
As a group, you must negotiate the “Team Data Agreement.” You have 20 minutes to agree on:
- Ownership: Who “owns” the merged dataset?
- Access: Does the Clinic have the right to “veto” a data release?
- FAIRness: How will you make the data Interoperable without forcing everyone to buy the Tech Partner’s software?
This cheat sheet is designed to bridge the gap between high-level data theory and the “on-the-ground” reality of a research team. It breaks down the FAIR+ acronym through the lens of Team Science.
6.7 🔬 The FAIR+ Cheat Sheet for Research Teams
6.7.1 A Guide to Collaborative Data Management
6.7.1.1 F — Findable
If you can’t find it, it doesn’t exist.
Metadata: Data about the data. Does the file name Dataset_V2_Final_ActualFinal.csv mean anything to a teammate six months from now?
Persistent Identifiers (PIDs): Assigning a DOI or a unique ID so the data can be cited and located regardless of where it’s moved.
Searchability: Is the data indexed in a place where your team (and the public, if required) can search for it?
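Findability in practice often comes down to shipping a small machine-readable metadata file alongside the data itself. The sketch below shows one way a team might write such a “sidecar” file; the field names, filenames, and the placeholder DOI are illustrative assumptions, not a formal metadata standard.

```python
import json

# Hypothetical example: a minimal metadata "sidecar" that travels with the
# dataset. Field names are illustrative, not a formal schema.
metadata = {
    "title": "DeepHealth merged cohort, wave 1",
    "version": "2.0",
    "created": "2024-03-01",
    "creators": ["University Genomics Lab", "Community Clinic"],
    "doi": "10.0000/placeholder",  # swap in the real PID once minted
    "description": "Genomic markers merged with SDOH variables.",
    "keywords": ["genomics", "SDOH", "risk prediction"],
}

# A JSON sidecar named after the dataset keeps metadata findable even when
# the data file itself moves between systems.
with open("deephealth_wave1.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

A structured sidecar like this is searchable by repository indexers and far more durable than a filename such as `Dataset_V2_Final_ActualFinal.csv`.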
6.7.1.2 A — Accessible
Knowing it exists isn’t the same as being able to open it.
Authentication: Who has the “key”? Define the specific protocols (e.g., VPN, login credentials) required to reach the data.
Long-term Availability: If the PI’s lab website goes down, where does the data live? (e.g., general-purpose repositories such as Zenodo or Dryad, or your institution’s own repository).
Open vs. Restricted: “Accessible” doesn’t always mean “Free for everyone.” It means the process for requesting access is clear and automated.
6.7.1.3 I — Interoperable
Data should play well with others.
Machine-Readable: Can a computer program (like an AI tool) ingest the data without a human having to manually re-format 1,000 PDFs?
Shared Vocabulary: Does everyone on the team agree on what “Age” means? (e.g., Is it “Age at birth,” “Age at enrollment,” or “Age in months”?)
Non-Proprietary Formats: Favoring .csv or .json over formats that require expensive, specialized software (like .sas7bdat or proprietary AI files).
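The “shared vocabulary” and “non-proprietary format” points can be combined into one small habit: agree on a codebook, rename columns to its explicit terms, and export to a format anyone can open. The sketch below illustrates this with Python’s standard library; the column names, the mapping, and the sample rows are invented for the example, not a real DeepHealth codebook.

```python
import csv

# Hypothetical codebook: the team resolves ambiguous terms explicitly.
# "Age" alone is ambiguous (at birth? at enrollment? in months?).
CODEBOOK = {
    "Age": "age_at_enrollment_years",
    "Sex": "sex_assigned_at_birth",
}

# Invented sample rows standing in for the Clinic's extracted records.
clinic_rows = [{"Age": "54", "Sex": "F"}, {"Age": "61", "Sex": "M"}]

# Rename every column to the agreed vocabulary before merging datasets.
harmonized = [
    {CODEBOOK.get(col, col): val for col, val in row.items()}
    for row in clinic_rows
]

# Export to CSV — a non-proprietary format no partner needs a license to read.
with open("clinic_harmonized.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(harmonized[0].keys()))
    writer.writeheader()
    writer.writerows(harmonized)
```

The design choice worth noticing: the codebook lives in one shared place, so a disagreement about what “Age” means becomes a one-line edit rather than a late-stage data crisis.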
6.7.1.4 R — Reusable
Build for the future, not just the deadline.
Documentation (The “Readme”): Does the data include a “User Manual” that explains the variables, the methodology, and the limitations?
Licensing: Using clear licenses (like Creative Commons) so others know exactly how they are allowed to use, credit, or remix your work.
6.7.1.5 + — The “Plus” (Secure & Ethical)
FAIR is great; safe is better.
De-identification: Ensuring human subject data is stripped of identifiers (HIPAA compliance).
Trust & Sovereignty: Respecting the rights of communities (like the Clinic in our scenario) to control how their data is used, even if the data is “technically” findable.
Sustainability: Ensuring the data remains secure and usable long after the grant funding ends.
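To make de-identification concrete, here is a deliberately naive sketch of stripping direct identifiers from tabular records. Real HIPAA compliance (Safe Harbor or expert determination) covers 18 identifier categories and assesses re-identification risk; this example only illustrates the basic idea, and all field names are assumptions invented for the sketch.

```python
# Hypothetical set of direct-identifier fields; a real project would derive
# this list from its DUA and IRB protocol, not hard-code five names.
DIRECT_IDENTIFIERS = {"name", "mrn", "address", "phone", "dob"}

def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record with direct-identifier fields removed.

    Note: this does NOT address quasi-identifiers (e.g., rare diagnoses plus
    ZIP code), which can still re-identify patients in combination.
    """
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

# Invented example record.
patient = {"name": "A. Example", "mrn": "12345", "zip3": "326", "a1c": 7.2}
deidentified = strip_identifiers(patient)
# deidentified now contains only {"zip3": "326", "a1c": 7.2}
```

Note the docstring caveat: dropping direct identifiers is the easy 20%; deciding what counts as an identifier for this population is the governance conversation this module is about.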
The Team Science Tip: Data sharing is 20% technical and 80% cultural. Use these principles to build psychological safety: when everyone knows the rules, they are more likely to share their best work.
6.8 Debrief (8 minutes)
What conflicts surfaced during the negotiation that your team didn’t realize existed? Who owns the data if the Tech Partner goes bankrupt?
Module Materials