edam.data:2082 |
|
ncit:C26358 |
|
|
|
B2AI_SUBSTRATE:1 |
DataSubstrate |
Array |
A data type that represents a collection of elements (values or variables), each selected by one or more indices. |
B2AI_SUBSTRATE:7 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:2 |
DataSubstrate |
Associative Array |
A data structure that stores a collection of key-value pairs, where each key is associated with a value. It allows for fast and efficient lookups by using the keys as indices to access the corresponding values. |
B2AI_SUBSTRATE:1 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
JSON |
|
|
B2AI_SUBSTRATE:3 |
DataSubstrate |
BIDS |
Data conforming to the Brain Imaging Data Structure (BIDS). |
B2AI_SUBSTRATE:19 B2AI_SUBSTRATE:49 |
B2AI_STANDARD:33 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:4 |
DataSubstrate |
BigQuery |
A fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service (PaaS) that supports querying using ANSI SQL. |
B2AI_SUBSTRATE:5 |
B2AI_STANDARD:735 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:5 |
DataSubstrate |
Column Store |
A database that stores data tables by column rather than by row. |
B2AI_SUBSTRATE:9 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.format:3752 |
|
ncit:C182456 |
|
csv |
Differences in newline characters can cause inconsistency across operating systems. |
B2AI_SUBSTRATE:6 |
DataSubstrate |
Comma-separated values |
Any text or mixed data with distinct records in columns separated by commas and rows separated by newlines. |
B2AI_SUBSTRATE:10 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.data:0006 |
mesh:D064886 |
ncit:C25474 |
|
|
|
B2AI_SUBSTRATE:7 |
DataSubstrate |
Data |
Any collection of discrete values conveying information. |
|
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:8 |
DataSubstrate |
Data Frame |
A data structure that organizes data into a 2-dimensional table of rows and columns. |
B2AI_SUBSTRATE:7 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
mesh:D019991 |
ncit:C15426 |
|
|
|
B2AI_SUBSTRATE:9 |
DataSubstrate |
Database |
An organized collection of structured information, stored electronically and organized for rapid search and retrieval. |
B2AI_SUBSTRATE:7 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.format:3751 |
|
|
|
txt |
|
B2AI_SUBSTRATE:10 |
DataSubstrate |
Delimited Text |
Any data with distinct records separated or delimited by a specific character pattern. |
B2AI_SUBSTRATE:43 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.format:3548 |
|
ncit:C63537 |
File headers |
dicom dcm |
Files are generally named using unique identifiers that may not be compatible across all operating systems (i.e., they may be too long). Patient data is included in each image file header, so all files must be processed in order to anonymize them. |
B2AI_SUBSTRATE:11 |
DataSubstrate |
DICOM |
An image and metadata format for radiology imaging. |
B2AI_SUBSTRATE:36 |
B2AI_STANDARD:98 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
ncit:C45803 |
|
|
|
B2AI_SUBSTRATE:12 |
DataSubstrate |
Directed acyclic graph |
A directed graph with no directed cycles. |
B2AI_SUBSTRATE:14 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:13 |
DataSubstrate |
Document Database |
A database that stores and retrieves information in documents. |
B2AI_SUBSTRATE:9 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.format:3617 |
|
ncit:C75914 |
|
|
|
B2AI_SUBSTRATE:14 |
DataSubstrate |
Graph |
A structure of nodes (sometimes called vertices) and edges between them. |
B2AI_SUBSTRATE:7 |
B2AI_STANDARD:768 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:15 |
DataSubstrate |
Graph Database |
A type of database that stores nodes and relationships instead of tables or documents. |
B2AI_SUBSTRATE:9 B2AI_SUBSTRATE:14 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.format:3590 |
|
ncit:C184763 |
|
h5 hdf5 |
Structure is not optimized for data access through cloud storage infrastructure. |
B2AI_SUBSTRATE:16 |
DataSubstrate |
HDF5 |
A data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. |
B2AI_SUBSTRATE:18 |
B2AI_STANDARD:339 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:17 |
DataSubstrate |
Heap |
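A tree-based data structure that satisfies the heap property: each node's key is ordered consistently with respect to its children's keys (greater than or equal to in a max-heap, less than or equal to in a min-heap). A binary heap is a complete binary tree in which each node has no more than two children. |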
B2AI_SUBSTRATE:44 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:18 |
DataSubstrate |
Hierarchical Array |
A list data structure in which elements may be nested within, or be subsets of, other elements. |
B2AI_SUBSTRATE:1 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.data:2968 |
|
ncit:C48179 |
|
|
|
B2AI_SUBSTRATE:19 |
DataSubstrate |
Image |
Any visual representation of something. |
B2AI_SUBSTRATE:7 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.format:3464 |
|
ncit:C184769 |
|
json |
|
B2AI_SUBSTRATE:20 |
DataSubstrate |
JSON |
JavaScript Object Notation (JSON) is a lightweight format for storing and transporting data. |
B2AI_SUBSTRATE:2 B2AI_SUBSTRATE:18 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
tsv |
|
B2AI_SUBSTRATE:21 |
DataSubstrate |
KGX TSV |
A tab-delimited data format for exchanging property graph data. |
B2AI_SUBSTRATE:32 B2AI_SUBSTRATE:41 |
B2AI_STANDARD:346 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
mongo |
The maximum size of an individual document in MongoDB is 16 MB, with a maximum nesting depth of 100 levels. |
B2AI_SUBSTRATE:22 |
DataSubstrate |
MongoDB |
A non-relational document database that provides support for JSON-like storage. |
B2AI_SUBSTRATE:13 |
B2AI_STANDARD:797 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
mysql sql |
|
B2AI_SUBSTRATE:23 |
DataSubstrate |
MySQL |
A relational database management system developed by Oracle that is based on Structured Query Language (SQL). |
B2AI_SUBSTRATE:37 |
B2AI_STANDARD:801 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:24 |
DataSubstrate |
N-Dimensional Array |
A data structure that can store a collection of items, where each item is identified by a set of indices. The number of indices required to identify an item is referred to as the dimension of the array, hence the name N-dimensional array. |
B2AI_SUBSTRATE:1 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
All data is stored locally; this can cause slowdowns when data exceeds available memory. |
B2AI_SUBSTRATE:25 |
DataSubstrate |
Neo4j |
A popular graph database platform. |
B2AI_SUBSTRATE:15 |
B2AI_STANDARD:802 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
mesh:D016571 |
ncit:C17429 |
|
|
|
B2AI_SUBSTRATE:26 |
DataSubstrate |
Neural Network Model |
The result of training a neural network on a certain set of inputs. |
B2AI_SUBSTRATE:7 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
nnef |
|
B2AI_SUBSTRATE:27 |
DataSubstrate |
NNEF |
An exchange format for neural network models produced using Torch, Caffe, TensorFlow, Theano, Chainer, Caffe2, PyTorch, or MXNet. |
B2AI_SUBSTRATE:26 |
B2AI_STANDARD:354 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
onnx |
|
B2AI_SUBSTRATE:28 |
DataSubstrate |
ONNX |
An open format built to represent machine learning models. |
B2AI_SUBSTRATE:26 |
B2AI_STANDARD:357 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:29 |
DataSubstrate |
Pandas DataFrame |
A two-dimensional, size-mutable, potentially heterogeneous tabular data object. |
B2AI_SUBSTRATE:8 |
B2AI_STANDARD:813 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
parquet pqt |
|
B2AI_SUBSTRATE:30 |
DataSubstrate |
Parquet |
Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. |
B2AI_SUBSTRATE:5 |
B2AI_STANDARD:359 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
sql |
|
B2AI_SUBSTRATE:31 |
DataSubstrate |
PostgreSQL |
An open-source relational database management system emphasizing extensibility and SQL compliance. |
B2AI_SUBSTRATE:37 |
B2AI_STANDARD:815 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:32 |
DataSubstrate |
Property graph |
A graph model in which nodes and edges may be assigned properties (i.e., values or key-value pairs). |
B2AI_SUBSTRATE:14 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:33 |
DataSubstrate |
PyTorch Tensor |
In PyTorch, a torch.Tensor is a multi-dimensional matrix containing elements of a single data type. |
B2AI_SUBSTRATE:42 |
B2AI_STANDARD:816 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
Memory-limited. |
B2AI_SUBSTRATE:34 |
DataSubstrate |
R data.frame |
A tightly coupled collection of variables that shares many of the properties of matrices and of lists. |
B2AI_SUBSTRATE:8 |
B2AI_STANDARD:833 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:35 |
DataSubstrate |
R tibble |
A redesigned version of an R data frame. A tibble never changes the input type, can have columns that are lists, can have non-standard variable names (names may start with a number or contain spaces), only recycles vectors of length 1, and never creates row names. |
B2AI_SUBSTRATE:8 |
B2AI_STANDARD:833 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:36 |
DataSubstrate |
Raster Image |
Any visual representation of something represented as a two-dimensional matrix of pixel values denoting intensity, potentially accompanied by other values for colors or other image properties (e.g., compression). |
B2AI_SUBSTRATE:19 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:37 |
DataSubstrate |
Relational Database |
A database that stores and provides access to data points related to one another. |
B2AI_SUBSTRATE:9 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:38 |
DataSubstrate |
Set |
A data structure of unique elements, with no defined order in the general case. |
B2AI_SUBSTRATE:7 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
ncit:C45253 |
|
|
|
B2AI_SUBSTRATE:39 |
DataSubstrate |
String |
An array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. |
B2AI_SUBSTRATE:7 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:40 |
DataSubstrate |
SummarizedExperiment |
The SummarizedExperiment Bioconductor container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples. |
B2AI_SUBSTRATE:18 |
B2AI_STANDARD:705 B2AI_STANDARD:833 B2AI_STANDARD:286 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.format:3475 |
|
ncit:C164049 |
|
tsv |
Differences in newline characters can cause inconsistency across operating systems. |
B2AI_SUBSTRATE:41 |
DataSubstrate |
Tab-separated values |
Any text or mixed data with distinct records in columns separated by tab characters and rows separated by newlines. |
B2AI_SUBSTRATE:10 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:42 |
DataSubstrate |
Tensor |
An algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. |
B2AI_SUBSTRATE:7 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.data:2526 |
|
ncit:C25704 |
|
txt |
|
B2AI_SUBSTRATE:43 |
DataSubstrate |
Text |
Any form of written information that is composed of letters, words, and sentences. This may include anything from written documents, articles, or books, to emails, social media posts, and transcribed speech. It may also include unstructured, human-readable fields of documents containing other data. |
B2AI_SUBSTRATE:39 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
ncit:C45418 |
|
|
|
B2AI_SUBSTRATE:44 |
DataSubstrate |
Tree |
An undirected graph in which each pair of vertices is connected by exactly one path. Also known as a connected acyclic undirected graph. |
B2AI_SUBSTRATE:14 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:45 |
DataSubstrate |
Trie |
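An ordered, associative tree in which each node's position encodes the key prefix it shares with its descendants. Also known as a prefix tree or digital tree; a radix tree is a space-optimized variant. |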
B2AI_SUBSTRATE:44 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
ncit:C54169 |
|
|
|
B2AI_SUBSTRATE:46 |
DataSubstrate |
Vector |
A mathematical object that has magnitude and direction. A vector is often represented as a one-dimensional array or list with numerical elements. |
B2AI_SUBSTRATE:7 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:47 |
DataSubstrate |
Vector Image |
Any visual representation of something represented as a set of geometric shapes defined on a Cartesian plane. |
B2AI_SUBSTRATE:19 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
File headers |
wav |
|
B2AI_SUBSTRATE:48 |
DataSubstrate |
Waveform Audio File Format |
An audio file format standard. Generally supported by digital audio software. |
B2AI_SUBSTRATE:49 |
B2AI_STANDARD:387 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:49 |
DataSubstrate |
Waveform Data |
The two-dimensional representation of a signal as a function of time. |
B2AI_SUBSTRATE:7 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:50 |
DataSubstrate |
xarray |
A format for defining arrays with labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays, which allows for a more intuitive, more concise, and less error-prone user experience. |
B2AI_SUBSTRATE:24 |
B2AI_STANDARD:392 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.format:3915 |
|
|
|
zarr |
|
B2AI_SUBSTRATE:51 |
DataSubstrate |
Zarr |
A format for storage of large N-dimensional typed arrays. Has implementations in multiple programming languages. |
B2AI_SUBSTRATE:24 |
B2AI_STANDARD:394 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
ncit:C190416 |
|
tar zip |
Must be decompressed before reading. Compression may be lossy, i.e., it discards information in the process of encoding. |
B2AI_SUBSTRATE:52 |
DataSubstrate |
Compressed Data |
Data in which information is represented with fewer bits than the original, uncompressed representation. |
B2AI_SUBSTRATE:7 |
B2AI_STANDARD:384 B2AI_STANDARD:395 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
edam.format:3003 |
|
ncit:C153367 |
File headers |
txt bed |
|
B2AI_SUBSTRATE:53 |
DataSubstrate |
BED |
BED (Browser Extensible Data) format provides a flexible way to define the data lines that are displayed in a genome annotation track. |
B2AI_SUBSTRATE:10 |
B2AI_STANDARD:36 |
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
|
|
|
|
|
|
|
B2AI_SUBSTRATE:54 |
DataSubstrate |
Vector Database |
A database that stores and retrieves information represented as high-dimensional vectors. The original data may be very unstructured. |
B2AI_SUBSTRATE:9 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
2023-05-23 |
|
|
|
|
|
|
B2AI_SUBSTRATE:55 |
DataSubstrate |
Pinecone |
A vector database. Includes a single-stage filtering function allowing complex searches in single queries. |
B2AI_SUBSTRATE:54 |
|
Harry Caufield |
caufieldjh |
ORCID:0000-0001-5705-7831 |
2023-05-23 |