Substrates

data_substrates_collection

edam_id mesh_id ncit_id metadata_storage file_extensions limitations id category name description subclass_of related_to contributor_name contributor_github_name contributor_orcid contribution_date
edam.data:2082 ncit:C26358 B2AI_SUBSTRATE:1 DataSubstrate Array A data type that represents a collection of elements (values or variables), each selected by one or more indices. B2AI_SUBSTRATE:7 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:2 DataSubstrate Associative Array A data structure that stores a collection of key-value pairs, where each key is associated with a value. It allows for fast and efficient lookups by using the keys as indices to access the corresponding values. B2AI_SUBSTRATE:1 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
JSON B2AI_SUBSTRATE:3 DataSubstrate BIDS Data conforming to the Brain Imaging Data Structure (BIDS). B2AI_SUBSTRATE:19 B2AI_SUBSTRATE:49 B2AI_STANDARD:33 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:4 DataSubstrate BigQuery A fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service (PaaS) that supports querying using ANSI SQL. B2AI_SUBSTRATE:5 B2AI_STANDARD:735 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:5 DataSubstrate Column Store A database that stores data tables by column rather than by row. B2AI_SUBSTRATE:9 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.format:3752 ncit:C182456 csv Differences in newline characters can cause inconsistency across operating systems. B2AI_SUBSTRATE:6 DataSubstrate Comma-separated values Any text or mixed data with distinct records in columns separated by commas and rows separated by newlines. B2AI_SUBSTRATE:10 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.data:0006 mesh:D064886 ncit:C25474 B2AI_SUBSTRATE:7 DataSubstrate Data Any collection of discrete values conveying information. Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:8 DataSubstrate Data Frame A data structure that organizes data into a 2-dimensional table of rows and columns. B2AI_SUBSTRATE:7 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
mesh:D019991 ncit:C15426 B2AI_SUBSTRATE:9 DataSubstrate Database An organized collection of structured information, stored electronically and organized for rapid search and retrieval. B2AI_SUBSTRATE:7 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.format:3751 txt B2AI_SUBSTRATE:10 DataSubstrate Delimited Text Any data with distinct records separated or delimited by a specific character pattern. B2AI_SUBSTRATE:43 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.format:3548 ncit:C63537 File headers dicom dcm Files are generally named using unique identifiers that may not be compatible across all operating systems (i.e., they may be too long). Patient data is included in each image file header so all files must be processed in order to anonymize them. B2AI_SUBSTRATE:11 DataSubstrate DICOM An image and metadata format for radiology imaging. B2AI_SUBSTRATE:36 B2AI_STANDARD:98 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
ncit:C45803 B2AI_SUBSTRATE:12 DataSubstrate Directed acyclic graph A directed graph with no directed cycles. B2AI_SUBSTRATE:14 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:13 DataSubstrate Document Database A database that stores and retrieves information in documents. B2AI_SUBSTRATE:9 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.format:3617 ncit:C75914 B2AI_SUBSTRATE:14 DataSubstrate Graph A structure of nodes (sometimes called vertices) and edges between them. B2AI_SUBSTRATE:7 B2AI_STANDARD:768 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:15 DataSubstrate Graph Database A type of database that stores nodes and relationships instead of tables or documents. B2AI_SUBSTRATE:9 B2AI_SUBSTRATE:14 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.format:3590 ncit:C184763 h5 hdf5 Structure is not optimized for data access through cloud storage infrastructure. B2AI_SUBSTRATE:16 DataSubstrate HDF5 A data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. B2AI_SUBSTRATE:18 B2AI_STANDARD:339 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:17 DataSubstrate Heap A tree-based data structure, typically a complete binary tree, in which each node's key is ordered relative to its children's keys (the heap property). B2AI_SUBSTRATE:44 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:18 DataSubstrate Hierarchical Array A data structure of a list, such that list elements may be subsets of other elements. B2AI_SUBSTRATE:1 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.data:2968 ncit:C48179 B2AI_SUBSTRATE:19 DataSubstrate Image Any visual representation of something. B2AI_SUBSTRATE:7 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.format:3464 ncit:C184769 json B2AI_SUBSTRATE:20 DataSubstrate JSON JavaScript Object Notation (JSON) is a lightweight format for storing and transporting data. B2AI_SUBSTRATE:2 B2AI_SUBSTRATE:18 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
tsv B2AI_SUBSTRATE:21 DataSubstrate KGX TSV A tab-delimited data format for exchanging property graph data. B2AI_SUBSTRATE:32 B2AI_SUBSTRATE:41 B2AI_STANDARD:346 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
mongo The maximum size of an individual document in MongoDB is 16 MB, and documents may be nested to a maximum depth of 100 levels. B2AI_SUBSTRATE:22 DataSubstrate MongoDB A non-relational document database that provides support for JSON-like storage. B2AI_SUBSTRATE:13 B2AI_STANDARD:797 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
mysql sql B2AI_SUBSTRATE:23 DataSubstrate MySQL A relational database management system developed by Oracle that is based on structured query language (SQL). B2AI_SUBSTRATE:37 B2AI_STANDARD:801 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:24 DataSubstrate N-Dimensional Array A data structure that can store a collection of items, where each item is identified by a set of indices. The number of indices required to identify an item is referred to as the dimension of the array, hence the name N-dimensional array. B2AI_SUBSTRATE:1 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
All data is stored locally; this can cause slowdowns when data exceeds available memory. B2AI_SUBSTRATE:25 DataSubstrate Neo4j A popular graph database platform. B2AI_SUBSTRATE:15 B2AI_STANDARD:802 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
mesh:D016571 ncit:C17429 B2AI_SUBSTRATE:26 DataSubstrate Neural Network Model The result of training a neural network on a certain set of inputs. B2AI_SUBSTRATE:7 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
nnef B2AI_SUBSTRATE:27 DataSubstrate NNEF An exchange format for neural network models produced using Torch, Caffe, TensorFlow, Theano, Chainer, Caffe2, PyTorch, or MXNet. B2AI_SUBSTRATE:26 B2AI_STANDARD:354 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
onnx B2AI_SUBSTRATE:28 DataSubstrate ONNX An open format built to represent machine learning models. B2AI_SUBSTRATE:26 B2AI_STANDARD:357 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:29 DataSubstrate Pandas DataFrame A two-dimensional, size-mutable, potentially heterogeneous tabular data object. B2AI_SUBSTRATE:8 B2AI_STANDARD:813 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
parquet pqt B2AI_SUBSTRATE:30 DataSubstrate Parquet Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. B2AI_SUBSTRATE:5 B2AI_STANDARD:359 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
sql B2AI_SUBSTRATE:31 DataSubstrate PostgreSQL An open-source relational database management system emphasizing extensibility and SQL compliance. B2AI_SUBSTRATE:37 B2AI_STANDARD:815 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:32 DataSubstrate Property graph A graph model in which nodes and edges may be assigned properties (i.e., values or key-value pairs). B2AI_SUBSTRATE:14 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:33 DataSubstrate PyTorch Tensor In PyTorch, a torch.Tensor is a multi-dimensional matrix containing elements of a single data type. B2AI_SUBSTRATE:42 B2AI_STANDARD:816 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
Memory-limited. B2AI_SUBSTRATE:34 DataSubstrate R data.frame A tightly coupled collection of variables that shares many of the properties of matrices and of lists. B2AI_SUBSTRATE:8 B2AI_STANDARD:833 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:35 DataSubstrate R tibble A redesigned version of an R data frame. It never changes the input type, can have list columns, allows non-standard variable names (e.g., names that start with a number or contain spaces), only recycles vectors of length 1, and never creates row names. B2AI_SUBSTRATE:8 B2AI_STANDARD:833 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:36 DataSubstrate Raster Image Any visual representation of something represented as a two-dimensional matrix of pixel values denoting intensity, potentially accompanied by other values for colors or other image properties (e.g., compression). B2AI_SUBSTRATE:19 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:37 DataSubstrate Relational Database A database that stores and provides access to data points related to one another. B2AI_SUBSTRATE:9 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:38 DataSubstrate Set A data structure that stores unique elements, typically of the same type; ordering of elements is not guaranteed in general. B2AI_SUBSTRATE:7 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
ncit:C45253 B2AI_SUBSTRATE:39 DataSubstrate String An array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. B2AI_SUBSTRATE:7 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:40 DataSubstrate SummarizedExperiment The SummarizedExperiment Bioconductor container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples. B2AI_SUBSTRATE:18 B2AI_STANDARD:705 B2AI_STANDARD:833 B2AI_STANDARD:286 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.format:3475 ncit:C164049 tsv Differences in newline characters can cause inconsistency across operating systems. B2AI_SUBSTRATE:41 DataSubstrate Tab-separated values Any text or mixed data with distinct records in columns separated by tab characters and rows separated by newlines. B2AI_SUBSTRATE:10 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:42 DataSubstrate Tensor An algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. B2AI_SUBSTRATE:7 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.data:2526 ncit:C25704 txt B2AI_SUBSTRATE:43 DataSubstrate Text Any form of written information that is composed of letters, words, and sentences. This may include anything from written documents, articles, or books, to emails, social media posts, and transcribed speech. It may also include unstructured, human-readable fields of documents containing other data. B2AI_SUBSTRATE:39 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
ncit:C45418 B2AI_SUBSTRATE:44 DataSubstrate Tree An undirected graph with each pair of vertices connected by no more than one path. Also known as a connected acyclic undirected graph. B2AI_SUBSTRATE:14 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
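B2AI_SUBSTRATE:45 DataSubstrate Trie An ordered, associative tree in which each key (usually a string) is determined by the path from the root to a node. Also known as a prefix tree; a radix tree is a space-optimized variant. B2AI_SUBSTRATE:44 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831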
ncit:C54169 B2AI_SUBSTRATE:46 DataSubstrate Vector A mathematical object that has magnitude and direction. A vector is often represented as a one-dimensional array or list with numerical elements. B2AI_SUBSTRATE:7 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:47 DataSubstrate Vector Image Any visual representation of something represented as a set of geometric shapes defined on a Cartesian plane. B2AI_SUBSTRATE:19 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
File headers wav B2AI_SUBSTRATE:48 DataSubstrate Waveform Audio File Format An audio file format standard. Generally supported by digital audio software. B2AI_SUBSTRATE:49 B2AI_STANDARD:387 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:49 DataSubstrate Waveform Data The two-dimensional representation of a signal as a function of time. B2AI_SUBSTRATE:7 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:50 DataSubstrate xarray A format for defining arrays with labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays, which allows for more intuitive, more concise, and less error-prone user experience. B2AI_SUBSTRATE:24 B2AI_STANDARD:392 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.format:3915 zarr B2AI_SUBSTRATE:51 DataSubstrate Zarr A format for storage of large N-dimensional typed arrays. Has implementations in multiple programming languages. B2AI_SUBSTRATE:24 B2AI_STANDARD:394 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
ncit:C190416 tar zip Must be decompressed before reading. Compression may be lossy, i.e., it discards information in the process of encoding. B2AI_SUBSTRATE:52 DataSubstrate Compressed Data Data in which information is represented with fewer bits than the original, uncompressed representation. B2AI_SUBSTRATE:7 B2AI_STANDARD:384 B2AI_STANDARD:395 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
edam.format:3003 ncit:C153367 File headers txt bed B2AI_SUBSTRATE:53 DataSubstrate BED BED (Browser Extensible Data) format provides a flexible way to define the data lines that are displayed in a genome annotation track. B2AI_SUBSTRATE:10 B2AI_STANDARD:36 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831
B2AI_SUBSTRATE:54 DataSubstrate Vector Database A database that stores and retrieves information represented as high-dimensional vectors. The original data may be very unstructured. B2AI_SUBSTRATE:9 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831 2023-05-23
B2AI_SUBSTRATE:55 DataSubstrate Pinecone A vector database. Includes a single-stage filtering function allowing complex searches in single queries. B2AI_SUBSTRATE:54 Harry Caufield caufieldjh ORCID:0000-0001-5705-7831 2023-05-23
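
The subclass_of column links each substrate to a broader one, so any entry can be traced back to the root entry, Data (B2AI_SUBSTRATE:7). The following is a minimal sketch of that traversal in Python, using a handful of rows copied from the table above. The dictionary layout and the `lineage` function are illustrative assumptions rather than part of any published tooling, and the sketch assumes that the single substrate identifier following each description is the subclass_of value.

```python
# A small subset of the table: id -> (name, subclass_of).
# Assumption: the substrate identifier listed after each description is
# the subclass_of value; None marks the root entry ("Data").
SUBSTRATES = {
    "B2AI_SUBSTRATE:7": ("Data", None),
    "B2AI_SUBSTRATE:9": ("Database", "B2AI_SUBSTRATE:7"),
    "B2AI_SUBSTRATE:37": ("Relational Database", "B2AI_SUBSTRATE:9"),
    "B2AI_SUBSTRATE:23": ("MySQL", "B2AI_SUBSTRATE:37"),
    "B2AI_SUBSTRATE:54": ("Vector Database", "B2AI_SUBSTRATE:9"),
    "B2AI_SUBSTRATE:55": ("Pinecone", "B2AI_SUBSTRATE:54"),
}


def lineage(substrate_id, table=SUBSTRATES):
    """Return substrate names from the given id up to the root.

    `lineage` is a hypothetical helper written for this example.
    """
    names = []
    seen = set()
    current = substrate_id
    while current is not None:
        if current in seen:
            # Guard against a malformed table with a subclass_of cycle.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        name, parent = table[current]
        names.append(name)
        current = parent
    return names


if __name__ == "__main__":
    print(" -> ".join(lineage("B2AI_SUBSTRATE:55")))
    print(" -> ".join(lineage("B2AI_SUBSTRATE:23")))
```

Following the chain for Pinecone visits Vector Database, then Database, then Data. The same walk works for any row whose ancestors are present in the dictionary; a missing parent identifier raises a `KeyError`, which is a reasonable signal that the subset needs more rows.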