All Use Cases

use_cases

use_case_category	known_limitations	relevance_to_dgps	data_topics	standards_and_tools_for_dgp_use	alternative_standards_and_tools	enables	involved_in_experimental_design	involved_in_metadata_management	involved_in_quality_control	xref	id	category	name	description	contributor_name	contributor_github_name	contributor_orcid
acquisition		aireadi chorus voice	B2AI_TOPIC:15 B2AI_TOPIC:4	B2AI_STANDARD:98 B2AI_STANDARD:243		B2AI_USECASE:5 B2AI_USECASE:13 B2AI_USECASE:17 B2AI_USECASE:19	True	True	False		B2AI_USECASE:1	B2AI_USECASE:UseCase	Obtain patient data from records of clinical visits.	Collecting clinical data from patient visits involves the process of gathering information about a patient's medical history, current symptoms, and other relevant information during a healthcare appointment. This typically includes taking a detailed medical history, conducting a physical examination, ordering and interpreting diagnostic tests, and documenting the findings in the patient's medical record. This may also include more focused evaluations, as with the AI-READI project’s assessments of cognitive function and visual acuity. Medical records may include structured/unstructured text, values for lab results, and/or images.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		aireadi	B2AI_TOPIC:4 B2AI_TOPIC:22	B2AI_STANDARD:33		B2AI_USECASE:19	True	True	False		B2AI_USECASE:2	B2AI_USECASE:UseCase	Obtain image data from brain magnetic resonance imaging.	Magnetic resonance imaging (MRI) is a medical imaging technique that produces detailed images of the body's internal structures, including the brain. These images can be used to diagnose a variety of medical conditions and to evaluate the health of the brain. Brain MRI image data refers to the detailed images of the brain that are produced by the MRI machine.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		aireadi	B2AI_TOPIC:9 B2AI_TOPIC:10	B2AI_STANDARD:202		B2AI_USECASE:18	True	True	False		B2AI_USECASE:3	B2AI_USECASE:UseCase	Obtain clinical waveform data from patients.	Clinical waveform data from an electrocardiogram (EKG or ECG) is a representation of the electrical activity of the heart. The EKG measures the voltage between different points on the body and records the resulting waveform. This waveform can be used to diagnose a variety of heart conditions, including arrhythmias and heart attacks. It is typically recorded using a machine that is attached to the patient via electrodes.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		aireadi	B2AI_TOPIC:4 B2AI_TOPIC:24	B2AI_STANDARD:98		B2AI_USECASE:19 B2AI_USECASE:26	True	True	False		B2AI_USECASE:4	B2AI_USECASE:UseCase	Obtain image data from retinal and other ophthalmic imaging.	Ophthalmic image data is data that is collected from images of the eye. This type of data is typically used in the field of ophthalmology, which is the branch of medicine that deals with the diagnosis and treatment of eye diseases and disorders. Ophthalmic images can provide valuable information about the health of the eye, including the structure and function of the various parts of the eye, such as the retina, cornea, and lens.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		aireadi	B2AI_TOPIC:4 B2AI_TOPIC:9 B2AI_TOPIC:18	B2AI_STANDARD:243		B2AI_USECASE:17	True	True	False		B2AI_USECASE:5	B2AI_USECASE:UseCase	Obtain patient data from laboratory analysis, including serological testing and urinalysis.	Patient data from laboratory analysis typically includes results from tests that have been performed on samples taken from the patient, such as blood, urine, or other bodily fluids. Serological testing is a type of laboratory analysis that involves testing blood serum (the liquid part of blood) for the presence of various indicators of disease or health. Urinalysis is another common type of laboratory analysis that involves testing urine samples for various factors, such as the presence of bacteria, glucose, or other substances. Test results may or may not be derived from EHR data. Here, data includes records from mobile devices used to complement laboratory diagnostics, including continuous glucose monitoring.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		aireadi chorus	B2AI_TOPIC:18	B2AI_STANDARD:246		B2AI_USECASE:17 B2AI_USECASE:26 B2AI_USECASE:28	True	True	False		B2AI_USECASE:6	B2AI_USECASE:UseCase	Obtain patient data from wearable devices.	Wearable devices are small electronic devices that can be worn on the body to collect data about the user's activity, movements, and other physiological information. This data can include things like steps taken, heart rate, sleep patterns, and other metrics that can be used to track health and fitness. Activity data may be viewed in aggregate (e.g., number of steps per day above a threshold rather than exact counts or geolocation data) to serve as approximations of physical fitness assessments otherwise performed by clinical personnel.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		aireadi voice	B2AI_TOPIC:13 B2AI_TOPIC:35	B2AI_STANDARD:36 B2AI_STANDARD:154 B2AI_STANDARD:819 B2AI_STANDARD:278 B2AI_STANDARD:299 B2AI_STANDARD:301		B2AI_USECASE:20 B2AI_USECASE:26 B2AI_USECASE:28 B2AI_USECASE:29	True	True	False	EDAM:topic_3673	B2AI_USECASE:7	B2AI_USECASE:UseCase	Obtain genomics data from patients.	Clinical genomics data refers to the genetic information collected from individuals as part of their medical care or clinical research. This data may include information about an individual's DNA sequence, as well as any genetic variations or mutations that may be associated with disease phenotypes.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		voice	B2AI_TOPIC:31 B2AI_TOPIC:36	B2AI_STANDARD:732 B2AI_STANDARD:723 B2AI_STANDARD:821 B2AI_STANDARD:839		B2AI_USECASE:13 B2AI_USECASE:22 B2AI_USECASE:27 B2AI_USECASE:31	True	True	False		B2AI_USECASE:8	B2AI_USECASE:UseCase	Obtain voice data from patients.	Perform voice data collection, either in a clinical setting or through a mobile device. Includes a process for patients to consent to voice data collection, voice data sharing and utilization as part of voice AI technology.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		aireadi chorus	B2AI_TOPIC:29 B2AI_TOPIC:31	B2AI_STANDARD:243		B2AI_USECASE:17 B2AI_USECASE:26 B2AI_USECASE:28 B2AI_USECASE:29	True	True	False		B2AI_USECASE:9	B2AI_USECASE:UseCase	Obtain social determinants of health data from patients.	Social determinants of health (SDoH) are the conditions in which people are born, grow, live, work, and age. These conditions are shaped by the distribution of money, power, and resources at global, national, and local levels. SDoH are largely responsible for health inequities - unfair and avoidable differences in health status from person to person. SDoH data may be collected directly from individuals or based on integration with other data, but it generally includes at least one of the following factors: poverty and income inequality, education and literacy, employment and working conditions, gender and gender equality, social exclusion and discrimination, housing and living conditions, or access to healthcare.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		cm4ai	B2AI_TOPIC:19	B2AI_STANDARD:764		B2AI_USECASE:16	True	True	False		B2AI_USECASE:10	B2AI_USECASE:UseCase	Obtain molecular proximity observations from microscopy images of human cells.	Images of objects at the microscale (i.e., those at 0.1–100μm) are obtained through a variety of microscopy approaches. In the CM4AI project, images of cell structures are obtained through confocal immunofluorescence microscopy.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		cm4ai	B2AI_TOPIC:28	B2AI_STANDARD:764		B2AI_USECASE:16	True	True	False	EDAM:topic_0121	B2AI_USECASE:11	B2AI_USECASE:UseCase	Obtain proteome data from human cell samples.	The proteome is the complete set of proteins that is expressed by a genome, cell, tissue, or organism at a given time and set of conditions. Proteome data refers to the information that is generated from studies of the proteome, such as the identification and characterization of the proteins that are expressed, their relative abundance, and any modifications that they may undergo. In the CM4AI project, proteome data is obtained through affinity purification coupled with tandem mass spectrometry (AP-MS/MS).	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
acquisition		cm4ai	B2AI_TOPIC:34	B2AI_STANDARD:764		B2AI_USECASE:16	True	True	False	EDAM:topic_3170	B2AI_USECASE:12	B2AI_USECASE:UseCase	Obtain transcriptome data from human cell populations perturbed through CRISPR-driven mutagenesis.	In the CM4AI project, transcriptome data is collected through single-cell RNA sequencing from cells subjected to CRISPR-driven mutagenesis.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
integration		voice	B2AI_TOPIC:9 B2AI_TOPIC:36	B2AI_STANDARD:732 B2AI_STANDARD:109 B2AI_STANDARD:271		B2AI_USECASE:17	True	True	False		B2AI_USECASE:13	B2AI_USECASE:UseCase	Integrate clinical record data with voice data.	Clinical records generally do not include mechanisms for accessing voice recordings. Data records must therefore be linked to associate voice data samples with their source patients.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
integration		chorus	B2AI_TOPIC:4	B2AI_STANDARD:775 B2AI_STANDARD:243			True	True	False		B2AI_USECASE:14	B2AI_USECASE:UseCase	Transform data from OMOP to the i2b2 standard.	Transforming data from OMOP to the i2b2 standard involves converting the data from OMOP's schema to the i2b2 schema, which allows for the data to be more easily queried and analyzed using i2b2's tools and platforms. This involves mapping the data to equivalent concepts in the i2b2 schema, and may also involve cleaning and preprocessing the data to ensure that it is in the correct format for use with i2b2.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
integration		chorus	B2AI_TOPIC:5	B2AI_STANDARD:378			True	True	False		B2AI_USECASE:15	B2AI_USECASE:UseCase	Produce artifacts that map identifiers between source and standardized data representations.	The sets of identifiers shared between two or more data products serve as points of commonality between observations, but in practice, a desired level of interoperability may not be achievable without mapping some identifiers to equivalent terms. This may be necessary for entire namespaces (e.g., mapping all NCBI Gene identifiers to their corresponding UniProtKB protein accessions) or for a subset (e.g., mapping ChEBI entries for drugs to their identifiers in a drug-centric knowledge base). There may also be a need to define inexact matches: an identifier’s best mapping in another resource may be to a more broadly-defined concept.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
integration		cm4ai	B2AI_TOPIC:19 B2AI_TOPIC:28 B2AI_TOPIC:27 B2AI_TOPIC:34	B2AI_STANDARD:764		B2AI_USECASE:24	True	True	True		B2AI_USECASE:16	B2AI_USECASE:UseCase	Link cellular objects to functions through associations between proteins, cell structure proximity, and transcriptomics.	As per Qin et al. (2021) Nature (https://doi.org/10.1038/s41586-021-04115-9), imaging data and biophysical association data may be combined to develop measurements of protein distance within subcellular systems. This use case builds on that strategy by adding a third component: measurement of transcript changes under perturbation conditions for each protein. For the CM4AI DGP, this process involves evidence graphs. The result here is not a full subcellular map, but rather the integrated data necessary to assemble such a map.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
standardization		aireadi chorus voice	Demographic B2AI_TOPIC:4 B2AI_TOPIC:9 B2AI_TOPIC:18 B2AI_TOPIC:29 B2AI_TOPIC:31	B2AI_STANDARD:71 B2AI_STANDARD:187 B2AI_STANDARD:788 B2AI_STANDARD:243 B2AI_STANDARD:271 B2AI_STANDARD:727		B2AI_USECASE:26 B2AI_USECASE:28 B2AI_USECASE:29	False	True	True		B2AI_USECASE:17	B2AI_USECASE:UseCase	Standardize clinical record data collected from multiple sites and sources.	Standardizing clinical record data across multiple sites and sources involves several steps, all with the goal of rendering it more usable in subsequent analyses. Data is first collected from electronic health records, clinical databases, surveys, and potentially other sources. Next, the data is cleaned and transformed to a consistent format. This may include removing duplicate records, filling in missing data, and standardizing field names and values. The data is then validated to ensure that it is accurate and complete. Standardized data is then integrated into a central repository.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
standardization		aireadi chorus	B2AI_TOPIC:9 B2AI_TOPIC:10	B2AI_STANDARD:788 B2AI_STANDARD:202			False	True	True		B2AI_USECASE:18	B2AI_USECASE:UseCase	Standardize clinical waveform data collected from multiple sites and sources.	As with other clinical observations and records, standardizing waveform data is largely a process of collection, cleaning, transformation, validation, and storage. The features of waveforms, whether audio or cardiac in origin, require specific handling to ensure physiologically-relevant details are retained. Raw waveform data is quite large and therefore consumes more disk space than most database architectures are prepared to operate with. This data may be collected continuously, leading to accumulation of large quantities of incoming observations to store and analyze. Standardization must therefore be sensitive to the size and resolution of waveform data.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
standardization		aireadi chorus voice	B2AI_TOPIC:15 B2AI_TOPIC:24	B2AI_STANDARD:71 B2AI_STANDARD:98 B2AI_STANDARD:788		B2AI_USECASE:25 B2AI_USECASE:30	False	True	True		B2AI_USECASE:19	B2AI_USECASE:UseCase	Standardize clinical image data collected from multiple sites and sources.	Standardizing clinical images across sites and sources involves ensuring that images are captured and stored in a consistent manner, so that they can be easily compared and analyzed. This may include following guidelines for image acquisition, e.g., recommended image resolution, contrast, and lighting conditions. It also includes image metadata properties such as consistent labeling and format. Depending on subsequent applications, it may require image processing, such as normalization, to correct for variations in appearance due to differences in equipment or patient positioning.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
standardization		aireadi voice	B2AI_TOPIC:13	B2AI_STANDARD:109			False	True	True		B2AI_USECASE:20	B2AI_USECASE:UseCase	Standardize clinical omics data collected from multiple sites and sources.	Standardizing clinical omics data involves methods for ensuring consistency and comparability across different sources. This can be achieved through common data formats, controlled vocabularies, and ontologies. It may also involve some degree of quality control, as data from multiple sources may not be subject to identical validation or filtering procedures.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
standardization		cm4ai	B2AI_TOPIC:19 B2AI_TOPIC:27 B2AI_TOPIC:28 B2AI_TOPIC:34				False	True	True		B2AI_USECASE:21	B2AI_USECASE:UseCase	Assemble standards for integrated maps of human cell architecture.	As per Qin et al. (2021) Nature (https://doi.org/10.1038/s41586-021-04115-9), imaging data and biophysical association data may be combined to develop measurements of protein distance within subcellular systems. When combined with other data (see B2AI_USECASE:16) the result is a map of cell architecture. Some degree of standardization will be necessary among these maps, such that they may be combined while retaining consistent biologically-relevant observations.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
standardization		voice	B2AI_TOPIC:36			B2AI_USECASE:31 B2AI_USECASE:32	False	True	True		B2AI_USECASE:22	B2AI_USECASE:UseCase	Assemble standards for voice data.	Standardizing voice recordings involves ensuring that all recordings have consistent properties, including volume, equalization, and noise reduction. These standards also incorporate processes for storing metadata about recorded voice samples.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
standardization		cm4ai	B2AI_TOPIC:5	B2AI_STANDARD:444			False	True	True		B2AI_USECASE:23	B2AI_USECASE:UseCase	Construct standards for computational provenance.	Computational provenance is a record of the processes and data used to produce a computational result. Constructing standards for computational provenance involves establishing protocols for how this information should be recorded, stored, and shared. This can include what information provenance records must contain, their format(s), and any connections to the resulting computation. Additionally, standards may be established for how provenance should be validated and authenticated to ensure its accuracy and trustworthiness.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
modeling		cm4ai	B2AI_TOPIC:19 B2AI_TOPIC:27 B2AI_TOPIC:28 B2AI_TOPIC:34				True	True	False		B2AI_USECASE:24	B2AI_USECASE:UseCase	Develop multi-scale maps of human cell architecture.	Given the availability of integrated imaging, biophysical, and transcriptome data centered on a specific set of proteins (see B2AI_USECASE:16), we may then use these results to assemble maps of the physical proximities and relationships among those proteins.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
modeling		chorus	B2AI_TOPIC:15	B2AI_STANDARD:788		B2AI_USECASE:30	True	True	False		B2AI_USECASE:25	B2AI_USECASE:UseCase	Develop models of clinical image data.	Developing models of clinical image data may involve annotation, preprocessing, and model training. Generally, annotation requires labeling images with disease or clinical phenotype-relevant information such as labels, bounding boxes, and segmentation masks. The annotation process may be assisted by automated methods, particularly in cases where patient features are already known. Preprocessing such as resizing, normalization, and data augmentation then prepares the labeled images for model training. The training process applies machine learning algorithms to learn patterns from the data and make predictions on new images.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
modeling		aireadi	B2AI_TOPIC:4 B2AI_TOPIC:10 B2AI_TOPIC:13 B2AI_TOPIC:18 B2AI_TOPIC:24 B2AI_TOPIC:29				True	True	False		B2AI_USECASE:26	B2AI_USECASE:UseCase	Develop pseudotime patient models of health and salutogenesis.	As presented by the AI-READI DGP, developing pseudotime patient models of health and salutogenesis hinges on the idea that health is a time-sensitive process, with various events contributing to a progression towards clinical outcomes in a chronological fashion. The exact amount of time between events may not be as important as their order and may therefore be abstracted.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
application		voice	B2AI_TOPIC:4 B2AI_TOPIC:31 B2AI_TOPIC:36	B2AI_STANDARD:723 B2AI_STANDARD:839			True	True	True		B2AI_USECASE:27	B2AI_USECASE:UseCase	Deploy a Federated Learning System for analysis of voice data.	As presented by the Voice DGP, a set of patient voice recordings may be analyzed in an automated manner through the machine learning approach of federated learning. In federated learning, a shared model is trained on data that remains distributed among many devices or clients. Individual clients train their own copies of the model on their own data, then send updated model parameters to the central server. The server then averages the updated parameters from all clients to produce a new global model. It returns the new model to the clients and the process repeats.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
application		aireadi	B2AI_TOPIC:4 B2AI_TOPIC:10 B2AI_TOPIC:13 B2AI_TOPIC:18 B2AI_TOPIC:24 B2AI_TOPIC:29				True	True	True		B2AI_USECASE:28	B2AI_USECASE:UseCase	Develop cross-sectional AI models of relationships between diabetes severity, cognitive function, and presence of biomarkers.	As presented by the AI-READI DGP, this use case develops models capable of interpreting relationships between clinical observations of diabetes patients and their features, with a focus on cognitive function. This case does not depend upon availability of pseudotime models, unlike B2AI_USECASE:29.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
application	Depends upon availability of pseudotime models.	aireadi	B2AI_TOPIC:4 B2AI_TOPIC:10 B2AI_TOPIC:13 B2AI_TOPIC:18 B2AI_TOPIC:24 B2AI_TOPIC:29				True	True	True		B2AI_USECASE:29	B2AI_USECASE:UseCase	Develop predictive models of insulin dependence and salutogenesis.	As presented by the AI-READI DGP, this use case develops models capable of interpreting relationships between clinical observations of diabetes patients and their features, with a focus on insulin dependence. Its goal is to produce a model capable of yielding predictions about a given patient’s progression towards a health or disease state. This case depends upon availability of pseudotime models, unlike B2AI_USECASE:28.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
application		chorus	B2AI_TOPIC:15	B2AI_STANDARD:788			True	True	True		B2AI_USECASE:30	B2AI_USECASE:UseCase	Test and deploy analytical models of clinical image data.	Given the availability of a model of clinical image data (as produced by B2AI_USECASE:25), testing and deploying the model generally involves creating a test data set, preprocessing the data by normalizing and converting it into a format that can be used by the model, evaluating the model's performance on the test set, then optimizing the model by re-training under different parameters or input data until the desired level of performance is achieved. There may be a need for converting the model to a format that can be used in a specific runtime environment or monitoring the model's performance in a production environment and making adjustments as needed. Deploying the analytical model may require consideration of validation, regulatory approval, ethics, and compliance with patient privacy laws.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
application		voice	B2AI_TOPIC:31 B2AI_TOPIC:36		B2AI_STANDARD:723 B2AI_STANDARD:790 B2AI_STANDARD:758 B2AI_STANDARD:767 B2AI_STANDARD:785 B2AI_STANDARD:791 B2AI_STANDARD:839		False	True	True		B2AI_USECASE:31	B2AI_USECASE:UseCase	Develop software and cloud infrastructure for automated voice data collection through a smartphone application.	Developing software and cloud infrastructure for automated voice data collection in this use case first requires development of a smartphone application. The application would need to be able to record audio, allow user logins, upload the recorded audio to cloud storage, and permit users to view and manage recorded audio data. The cloud infrastructure would need to store and process the audio data, and could potentially apply machine learning algorithms to analyze it. Additionally, the infrastructure would need to have a secure and reliable means of transmitting data between the smartphone application and the cloud.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
application		voice	B2AI_TOPIC:4 B2AI_TOPIC:35 B2AI_TOPIC:36	B2AI_STANDARD:732	B2AI_STANDARD:723 B2AI_STANDARD:790 B2AI_STANDARD:758 B2AI_STANDARD:767 B2AI_STANDARD:785 B2AI_STANDARD:791 B2AI_STANDARD:839		True	True	False		B2AI_USECASE:32	B2AI_USECASE:UseCase	Build a database of human voice samples and associations with biomarkers of health.	Building a database of human voice samples and associations with biomarkers of health may begin with a secure database, either on-premise or based in cloud infrastructure. Data organization and labeling should support retrieval and analysis. Machine learning algorithms can be applied to the data to identify patterns and associations between the voice samples and the biomarkers of health. Regularly updating the database with new data and re-analyzing the data could improve the accuracy and resolution of the predicted associations.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
application			B2AI_TOPIC:5				True	True	False		B2AI_USECASE:33	B2AI_USECASE:UseCase	Build a relational database of arbitrary data types.	To set up a relational database, first choose a relational database management system (RDBMS). Create a schema to define the database structure, including the tables and fields. Build tables within the schema and define the fields and data types for each table, then populate them with data by inserting rows. It's also crucial to set up relationships between tables, such as linking a primary key in one table to a foreign key in another table. This ensures data integrity and simplifies querying.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
application			B2AI_TOPIC:5		B2AI_STANDARD:802		True	True	False		B2AI_USECASE:34	B2AI_USECASE:UseCase	Query a relational database of arbitrary data types.	Querying a relational database covers a variety of actions to retrieve subsets of its contents.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
application			B2AI_TOPIC:5				True	True	False		B2AI_USECASE:35	B2AI_USECASE:UseCase	Build a graph database of arbitrary data types.	To set up a graph database, first choose a graph database management system. Create a graph data model to define the nodes, edges, and properties of the graph. Once the data model is in place, create nodes, edges, and properties in the graph corresponding to input data. Depending on the graph database platform, there may be functionality to set up indexes on certain properties to optimize query performance. Setting up constraints on the data can help to ensure data integrity.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
application			B2AI_TOPIC:5				True	True	False		B2AI_USECASE:36	B2AI_USECASE:UseCase	Query a graph database of arbitrary data types.	Querying a graph database covers a variety of actions to retrieve subsets of its contents, often by yielding subsets of the graph (i.e., subgraphs).	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
modeling			B2AI_TOPIC:5				True	False	True		B2AI_USECASE:37	B2AI_USECASE:UseCase	Train a linear regression model on data in an R tibble.	Training a linear regression model on data in the R tibble data structure generally involves R’s lm() function. To see the summary of the resulting model, use the summary() function on the model object. To make predictions using the model, use the predict() function on the model object and provide new data as the argument.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
modeling			B2AI_TOPIC:5				True	False	False		B2AI_USECASE:38	B2AI_USECASE:UseCase	Train a binary classification model on data in one or more Bioconductor objects.	Training a binary classification model on Bioconductor objects can be a convenient way to work with R statistical functions on large quantities of heterogeneous data. After any necessary preprocessing of the data, such as normalizing or filtering, split it into a training and test set. Then select a classification algorithm and use the training data to train a model. Test data may be used to evaluate the performance of the model and adjust any parameters as necessary.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
modeling			B2AI_TOPIC:5				True	False	False		B2AI_USECASE:39	B2AI_USECASE:UseCase	Train a neural network model on tensor data.	Training a neural network model on tensor data is a frequent use case for developing data analysis and prediction methods.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
integration			B2AI_TOPIC:5	B2AI_STANDARD:109			True	True	False		B2AI_USECASE:40	B2AI_USECASE:UseCase	Transform FHIR data to TSV.	Data described through the HL7 FHIR standard may take a variety of forms, owing to the standard’s intentional flexibility. A highly interpretable, easily parsed format such as TSV may be desirable as part of transformation or subsequent analysis.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
assessment			B2AI_TOPIC:5				True	False	True		B2AI_USECASE:41	B2AI_USECASE:UseCase	Determine whether enough data is available to train a computational model of interest.	Computational models may require a certain amount or complexity of data for their training to be effective. It may be useful to define these properties before beginning modeling.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
assessment			B2AI_TOPIC:5				True	False	True		B2AI_USECASE:42	B2AI_USECASE:UseCase	Assess the quality of a computational model in terms of its ability to complete a specific task.	Independent of their application, computational models may be evaluated using a core set of metrics, including accuracy, precision, and recall.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
assessment			B2AI_TOPIC:5				False	False	True		B2AI_USECASE:43	B2AI_USECASE:UseCase	Assess the potential bias in a computational model.	Even a high-performing computational model may be subject to bias, both explicitly and implicitly. The model may not accurately represent the population it was trained on. It may be algorithmically biased, with some features gaining weight over others in unexpected ways. There may be biases resulting from human preconceptions, e.g., human curators may have already made assumptions about disease status of patients contributing data to the model’s training set. There may also be unexpected confounders, such as social or economic factors contributing to clinical outcomes.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831
assessment			B2AI_TOPIC:5				False	False	True		B2AI_USECASE:44	B2AI_USECASE:UseCase	Assess the explainability of a computational model.	Computational models are not equivalently explainable. A model’s operations may appear correct, but without the ability to justify its responses based on a particular reasoning process, it may remain challenging to identify its weaknesses.	Harry Caufield	caufieldjh	ORCID:0000-0001-5705-7831