OSDR Biological Data API

Overview

The OSDR biological data API is an application programming interface enabling granular access to GeneLab and ALSDA data contained in the NASA Open Science Data Repository (NASA OSDR). It provides two modes of (meta)data interrogation:

- a REST interface, which exposes a higher-level, structured view of the objects in the database, with JSON being the primary output format;
- a query interface, which allows for complex querying and filtering of both the metadata and the data; outputs are provided as tables (CSV/TSV and tables converted to JSON).

The REST interface can be seen as a means for database traversal, while the query interface is a means for metadata and data interrogation.

TL;DR

Organization of datasets, assays, samples, and their metadata as presented via the REST interface: [figure: screenshots of implied metadata structure]

Sample-level metadata addressable in a flatter format as used in the query interfaces:

id.accession
id.sample%20name=Mmus_C57-6J_LVR_GC_I_Rep1_M31
investigation.ontology%20source%20reference
study.characteristics
study.characteristics.strain
study.factor%20value.spaceflight!=basal%20control
file.data%20type=pca
etc.

Data-level columns as addressable in the data query interface:

column.*
column.ENTREZID
column.Mmus_C57-6J_LVR_GC_I_Rep1_M31!=0
etc.

TL;DR: examples


Hierarchical metadata

The database consists of datasets uniquely identifiable by their accession numbers, e.g. OSD-48.
Each dataset contains assays and samples whose metadata are organized according to the ISA model.
The API represents these metadata in a hierarchical fashion, extracting the relevant ISA sections for each sample upon request.

For example, if a sample has been assayed with both microscopy and RNA-Seq, the metadata describing it can be found under each of the respective assays (note that any given sample may be associated with some, but not all, assays in a dataset). To request the ISA (investigation, study, assay) metadata of the OSD-48 sample Mmus_C57-6J_LVR_GC_I_Rep1_M31 as an analysis object in a microscopy assay, the following hierarchy applies:

OSD-48 →
OSD-48_molecular-cellular-imaging_microscopy_pannoramic scan (3d histech) →
Mmus_C57-6J_LVR_GC_I_Rep1_M31

and, respectively, for the same sample, but in an RNA-Seq assay:

OSD-48 →
OSD-48_transcription-profiling_rna-sequencing-(rna-seq)_illumina →
Mmus_C57-6J_LVR_GC_I_Rep1_M31.

Note: the syntax of the REST URLs corresponding to these two hierarchies is discussed further down in the REST interface section.
The REST interface also serves as a means to identify exact assay and sample names of interest.

On-demand metadata combinations

Given a scope of dataset × assay × sample, the API output is a representation of all relevant ISA metadata for a sample, combined on-demand from ISA entries (investigation, assay, study tabs) that are associated with this sample, rather than simply a reflection of a single record in the database.

This is exemplified by the two links above producing JSONs that share identical sections (investigation, study) and only differ in the assay section: microscopy vs. RNA-Seq.

This ensures that each such representation synthesizes complete investigation, assay, and study metadata for a given sample within the scope of a given dataset and assay.

Indeed, the same section may be repeated in full across outputs for multiple samples; however, this also means that an output for one sample in one assay of one dataset will always contain complete relevant metadata.
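
The on-demand combination described above can be pictured with a short Python sketch. It is not the API's implementation: the function name, the input layout, and every value below are hypothetical placeholders, chosen only to show why outputs for one sample in two assays share the investigation and study sections and differ in the assay section.

```python
def combine_sample_metadata(investigation, study_by_sample, assay_by_key,
                            accession, assay_name, sample_name):
    """Assemble one per-sample representation from separately stored ISA parts."""
    return {
        "id": {
            "accession": accession,
            "assay name": assay_name,
            "sample name": sample_name,
        },
        "investigation": investigation,
        "study": study_by_sample[sample_name],
        "assay": assay_by_key[(assay_name, sample_name)],
    }


# Mock ISA fragments (placeholders, not real OSDR records).
investigation = {"title": "placeholder investigation"}
study_by_sample = {"sample-1": {"characteristics": {"strain": "placeholder"}}}
assay_by_key = {
    ("microscopy-assay", "sample-1"): {"technology type": "microscopy"},
    ("rna-seq-assay", "sample-1"): {"technology type": "RNA-Seq"},
}

microscopy = combine_sample_metadata(
    investigation, study_by_sample, assay_by_key,
    "OSD-X", "microscopy-assay", "sample-1",
)
rna_seq = combine_sample_metadata(
    investigation, study_by_sample, assay_by_key,
    "OSD-X", "rna-seq-assay", "sample-1",
)

# Only the "assay" and "id" branches differ between the two outputs.
differing = [key for key in microscopy if microscopy[key] != rna_seq[key]]
print(differing)
```

Each output is complete on its own, at the cost of repeating the shared sections across outputs.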

The REST interface

The REST interface employs progressive disclosure: by starting from the listing of all datasets (/v2/datasets/) and following the REST URLs provided within each output, one can explore the entire scope of the REST interface without reading the next few subsections of this documentation. If you choose to do so, skip to the section REST syntax extensions.

The syntax of REST URLs follows the hierarchy outlined above.
The returned JSON always consists of branches starting at the top-level object (dataset).
All REST endpoints accept an optional parameter format (see Output formats). If omitted, the format defaults to plain JSON.

Metadata REST endpoints


- A listing of all datasets: /v2/datasets/
  Includes:
  - REST URLs for each individual dataset.
- Metadata for a single dataset (via its accession number, e.g. OSD-48): /v2/dataset/OSD-48/
  Includes:
  - A REST URL for the listing of assays associated with the dataset;
  - A REST URL for the listing of files associated with the dataset;
  - High-level, aggregate dataset-level metadata.
- A listing of assays associated with a dataset (via its accession number, e.g. OSD-48): /v2/dataset/OSD-48/assays/
  Includes:
  - REST URLs for each individual assay.
- Metadata for a single assay (via an accession number, e.g. OSD-48, and an assay name, e.g. OSD-48_transcription-profiling_rna-sequencing-(rna-seq)_illumina): /v2/dataset/OSD-48/assay/OSD-48_transcription-profiling_rna-sequencing-%28rna-seq%29_illumina/
  Includes:
  - A REST URL for the listing of samples associated with the assay;
  - A REST URL for the listing of files associated with the assay;
  - Basic assay-level metadata.
- A listing of samples associated with an assay (via an accession number, e.g. OSD-48, and an assay name, e.g. OSD-48_transcription-profiling_rna-sequencing-(rna-seq)_illumina): /v2/dataset/OSD-48/assay/OSD-48_transcription-profiling_rna-sequencing-%28rna-seq%29_illumina/samples/
  Includes:
  - REST URLs for each individual sample.
- Metadata for a single sample (via an accession number, e.g. OSD-48, an assay name, e.g. OSD-48_transcription-profiling_rna-sequencing-(rna-seq)_illumina, and a sample name, e.g. Mmus_C57-6J_LVR_GC_I_Rep1_M31): /v2/dataset/OSD-48/assay/OSD-48_transcription-profiling_rna-sequencing-%28rna-seq%29_illumina/sample/Mmus_C57-6J_LVR_GC_I_Rep1_M31/
  Includes:
  - A REST URL for the listing of files associated with the sample;
  - Full sample-level metadata derived from ISA tabs:
    - investigation (describing the investigation corresponding to the dataset),
    - assay (describing the metadata of the sample in the context of the assay),
    - study (describing the study metadata attributed to the sample);
  - An auxiliary branch id, which duplicates the dataset accession number, the assay name, and the sample name in order to make them available on the same level of organization as the ISA-derived metadata.
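
The endpoint paths listed above follow one pattern, which the following Python sketch assembles from an accession number, an assay name, and a sample name. The helper names are hypothetical, and the sketch only builds paths relative to the API root (the host is not stated in this document, so none is assumed). Percent-encoding with `urllib.parse.quote` reproduces the `%28`/`%29` escapes seen in the examples.

```python
from urllib.parse import quote


def _segment(name):
    # Percent-encode everything outside letters, digits, and "_.-~",
    # so "(" becomes %28, ")" becomes %29, and a space becomes %20.
    return quote(name, safe="")


def dataset_path(accession):
    return f"/v2/dataset/{_segment(accession)}/"


def assay_path(accession, assay_name):
    return f"{dataset_path(accession)}assay/{_segment(assay_name)}/"


def sample_path(accession, assay_name, sample_name):
    return f"{assay_path(accession, assay_name)}sample/{_segment(sample_name)}/"


def with_format(path, fmt=None):
    # The optional "format" parameter; omitted means the endpoint default.
    return path if fmt is None else f"{path}?format={_segment(fmt)}"


assay = "OSD-48_transcription-profiling_rna-sequencing-(rna-seq)_illumina"
sample = "Mmus_C57-6J_LVR_GC_I_Rep1_M31"

print(dataset_path("OSD-48") + "assays/")        # listing of assays
print(assay_path("OSD-48", assay) + "samples/")  # listing of samples
print(with_format(sample_path("OSD-48", assay, sample), "html"))
```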

File records REST endpoints


REST syntax extensions


The query interface

The query interface facilitates the process of filtering by values of multiple metadata fields and/or data columns at once.
Field selectors and filters are accepted as key-value GET parameters (e.g. study.characteristics.strain=S288C) to the respective endpoints: /v2/query/metadata/ and /v2/query/data/.

All query endpoints accept an optional parameter format (see Output formats). If omitted, the format defaults to CSV.

Sample-level metadata query endpoints

This is the primary entry point into the query interface.
The root endpoint for the sample-level metadata query interface is /v2/query/metadata/.
- A synonym for this endpoint is /v2/query/samples/.
id.accession, id.assay name, and id.sample name are always returned as the first three fields, whether or not they were requested.
- For example, in RNA-Seq analysis, this ensures that these three columns can be used as row names that parallel the data query column names, if the latter are requested with the format.header.multi modifier.

The query interface operates on the same metadata combinations as the REST interface, providing access to metadata fields and values under branches:
- investigation (part of ISA)
- study (part of ISA)
- assay (part of ISA)
- id (accession number, assay name, sample name, aggregated into a branch)
- file (metadata for files associated with the sample, aggregated into a branch)
The metadata structure is flattened further; for example, a branch:
- study → characteristics → strain → "S288C"
becomes a field:
- study.characteristics.strain
with a value:
- "S288C"
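
The flattening described above can be sketched in a few lines of Python. This is an illustration of the naming convention only, under the assumption that a branch is a plain nested mapping; the function name is hypothetical and the actual JSON returned by the REST interface may carry additional structure.

```python
def flatten(branch, prefix=""):
    """Turn nested branches into period-separated field names."""
    flat = {}
    for key, value in branch.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into sub-branches, extending the field name.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


nested = {"study": {"characteristics": {"strain": "S288C"}}}
flat = flatten(nested)
print(flat)  # {'study.characteristics.strain': 'S288C'}
```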

Providing fields as such period-separated keys, optionally paired with values, as components of a GET request (e.g. study.characteristics.strain=S288C), allows for filtering by values of multiple metadata fields at once. The GET request syntax understands the following conventions:

Format example	Meaning
study.characteristics.strain	Include this field in the output, regardless of its value, for all implicated samples.
=study.characteristics.strain	Only include samples that have this field annotated with a non-null (non-NaN) value; also include the field itself in the output.
study.characteristics.strain=S288C	Only include samples whose value of this field is equal to the provided value; also include the field itself in the output.
study.characteristics.strain=S288C|BY4743	Only include samples whose value of this field is equal to either of the provided values (separated by a vertical pipe, i.e. a logical "OR"); also include the field itself in the output.
study.characteristics.strain!=S288C	Only include samples whose value of this field is not equal to the provided value; still include the field itself in the output. Note that this excludes null (NaN) values, since NaNs are not equal to anything (not even to themselves) by definition.
study.characteristics.strain=/^BY\d+$/, study.characteristics.strain=/^BY\d+$/i, study.characteristics.strain=/^BY\d+$/c	Only include samples whose value of this field matches the provided regular expression (in this case: ^BY\d+$, i.e. "BY" followed by one or more digits); also include the field itself in the output. The /i flag invokes case-insensitive matching, while the /c flag enforces case sensitivity. Note: the flags override the behavior.matchcase modifier.
study.characteristics	Include all fields in the given section that are present for any of the samples in the request. This wildcard syntax is applicable to any ISA field starting from the 2nd level (e.g., assay.parameter value, study.factor value, investigation.study assays, etc.)
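
The conventions in the table above can be composed programmatically. The Python sketch below builds the query string for /v2/query/metadata/; all helper names are hypothetical. It assumes that the server decodes percent-encoded characters in the usual way, so spaces are sent as %20 and the "OR" pipe as %7C, while the leading `=` and the `!=` operator are kept literal as in the examples.

```python
from urllib.parse import quote


def _enc(text):
    # Keep "/" (regex delimiters) and "*" (wildcards) literal; encode the rest.
    return quote(text, safe="/*")


def select(field):
    """Include the field in the output, regardless of its value."""
    return _enc(field)


def present(field):
    """Only samples with a non-null value for the field."""
    return "=" + _enc(field)


def equals(field, *values):
    """Value equals any of the given values (logical OR)."""
    return _enc(field) + "=" + _enc("|".join(values))


def not_equals(field, value):
    return _enc(field) + "!=" + _enc(value)


def matches(field, pattern, flag=""):
    """Regular-expression match; flag may be "", "i", or "c"."""
    return _enc(field) + "=" + _enc(f"/{pattern}/{flag}")


def metadata_query(*terms):
    return "/v2/query/metadata/?" + "&".join(terms)


url = metadata_query(
    select("study.characteristics"),
    equals("study.characteristics.strain", "S288C", "BY4743"),
    not_equals("study.factor value.spaceflight", "basal control"),
)
print(url)
```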

Example usage


Assay-grouped metadata query endpoints

The root endpoint for the assay-grouped metadata query interface is /v2/query/assays/.
These endpoints provide a condensed representation of sample-level metadata, grouped by assay.
id.accession and id.assay name are always returned as the first two fields, whether or not they were requested.

The same query syntax as for sample-level metadata applies; however, the id.sample name column is omitted, and the duplicate rows that result from this omission are collapsed.
In essence, this lists each combination of metadata values that appears per assay exactly once, which can help assess the usability of a given assay for an analysis a user may have in mind.
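
The relationship between the sample-level and the assay-grouped outputs can be illustrated with a short Python sketch. It mimics the described behavior on the client side and is not the API's implementation; the function name and the mock rows are hypothetical.

```python
def group_by_assay(sample_rows):
    """Drop "id.sample name" and collapse the resulting duplicate rows."""
    seen = set()
    collapsed = []
    for row in sample_rows:
        reduced = {k: v for k, v in row.items() if k != "id.sample name"}
        key = tuple(reduced.items())
        if key not in seen:  # keep the first occurrence of each combination
            seen.add(key)
            collapsed.append(reduced)
    return collapsed


# Mock sample-level rows (placeholders, not real OSDR records).
sample_rows = [
    {"id.accession": "OSD-X", "id.assay name": "assay-1",
     "id.sample name": "s1", "study.characteristics.strain": "S288C"},
    {"id.accession": "OSD-X", "id.assay name": "assay-1",
     "id.sample name": "s2", "study.characteristics.strain": "S288C"},
    {"id.accession": "OSD-X", "id.assay name": "assay-1",
     "id.sample name": "s3", "study.characteristics.strain": "BY4743"},
]

assay_rows = group_by_assay(sample_rows)
print(len(assay_rows))  # 2: one row per distinct strain within the assay
```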

Example usage


Data query endpoints

The root endpoint for the data query interface is /v2/query/data/.
This endpoint provides direct access to data in the OSDR repository:
- If a metadata query resolves to a single file or a subset of mergeable files, the matching data query will resolve to the representation of the underlying data.
- For tabular data, querying and filtering by columns and column values is also implemented;
- Non-tabular data are provided as a direct download.
For tabular data, accession, assay name, and sample name are included as column names.

The same query syntax as for sample-level metadata applies, with the following additions:
- One of file.data type or file.file name must be included in order to narrow down the search to specific file data;
- If a matching sample-level metadata query resolves to a single file, its data will be returned;
- If a matching sample-level metadata query resolves to several files that can be unambiguously merged on the fly (currently, this only applies to file.data type=unnormalized counts), their data will be returned as a merged table.

If the data are in an arbitrary (e.g. binary) format and/or they have been requested with format=raw, they will be retrieved directly as a file.
If the data type is understood by the API as being tabular, additional GET key-value pairs can be provided, addressing columns as column.COLUMN_NAME and otherwise following the same conventions as those outlined for sample-level metadata. Note: by default, providing any column query component constrains the output to only the requested column(s); to display all columns regardless, use the wildcard column.*.
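
As a sketch of how these pieces fit together, the following Python snippet assembles a data query path from metadata filters, a file selector, and column components. The helper name is hypothetical, and the particular combination of values is illustrative only; it has not been verified against the contents of the repository.

```python
from urllib.parse import quote


def data_query_path(terms):
    """Build a /v2/query/data/ path from (field, operator, value) triples.

    operator is "=", "!=", or None for a bare field selector.
    """
    parts = []
    for field, operator, value in terms:
        part = quote(field, safe="*")  # keep the column.* wildcard literal
        if operator is not None:
            part += operator + quote(str(value), safe="")
        parts.append(part)
    return "/v2/query/data/?" + "&".join(parts)


path = data_query_path([
    ("id.accession", "=", "OSD-48"),
    # One of file.data type or file.file name is required.
    ("file.data type", "=", "unnormalized counts"),
    # Show all columns, even though a column filter follows.
    ("column.*", None, None),
    ("column.Mmus_C57-6J_LVR_GC_I_Rep1_M31", "!=", 0),
])
print(path)
```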

Example usage


Query interface modifiers


Output formats

	REST endpoints	Query endpoints	Notes	Examples
default	json	csv	see conventions below	REST | query
interactive	html	html	see conventions below	REST | query
raw		raw	only for query data endpoints: retrieves the original data file	query
alternative		tsv	see conventions below	query
alternative		json.split	format: {"columns": […], "data": [[…], …]}	query
alternative		json.records	format: [{"field": value, …}, …]	query
alternative		json.table	format: {"schema": …, "data": [{"field": value, …}, …]}	query
auxiliary	browser	browser	resolves to html if possible to visualize; to raw otherwise
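
The json.split and json.records layouts carry the same information, as the following Python sketch shows by converting one into the other. The function name and the payload are hypothetical; only the two layouts themselves are taken from the table above.

```python
def split_to_records(payload):
    """Convert {"columns": [...], "data": [[...], ...]} to a list of records."""
    columns = payload["columns"]
    return [dict(zip(columns, row)) for row in payload["data"]]


# Mock json.split payload (placeholder values, not real OSDR output).
payload = {
    "columns": ["id.accession", "study.characteristics.strain"],
    "data": [["OSD-X", "S288C"], ["OSD-X", None]],
}

records = split_to_records(payload)
print(records[0])
```

The split layout avoids repeating field names in every row, which matters for wide tables; the records layout is easier to iterate over.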

Output format conventions

JSON outputs are produced in accordance with the strict JSON specification:
- null/NaN values are represented as null;
- positive and negative infinities, which are not valid JSON, are converted to strings ("Infinity" and "-Infinity").
In tabular formats (CSV, TSV) the following holds:
- Column names are unquoted strings (e.g. study.characteristics);
- All cell values of type string are quoted (e.g. "S288C");
- All cell values of other types are not quoted;
  - This includes null/NaN values, which are represented as unquoted NaN in order to avoid any possible confusion with a string value "NaN".
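
One consequence of these quoting rules is that a reader can tell a missing value (unquoted NaN) from the literal string "NaN". The Python sketch below shows one way to preserve that distinction with the standard csv module; the function name and the sample text are hypothetical. It assumes that every unquoted cell is numeric or NaN, and would raise ValueError for any other unquoted value.

```python
import csv
import io
import math


def read_table(text):
    """Parse CSV text: unquoted header, quoted strings, unquoted numbers/NaN."""
    stream = io.StringIO(text)
    # The header is unquoted text, so read it with the default dialect.
    header = next(csv.reader(stream))
    # QUOTE_NONNUMERIC converts every unquoted cell to float (NaN included)
    # and leaves quoted cells as strings.
    rows = list(csv.reader(stream, quoting=csv.QUOTE_NONNUMERIC))
    return header, rows


# Mock CSV output (placeholder values, not real OSDR output).
mock_csv = (
    'id.accession,id.sample name,study.characteristics.age\n'
    '"OSD-X","sample-1",12\n'
    '"OSD-X","NaN",NaN\n'
)

header, rows = read_table(mock_csv)
print(header)
print(rows[1][1] == "NaN", math.isnan(rows[1][2]))  # True True
```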
In tabular formats, the following holds for column names:
- Metadata column names are represented by period-separated field names as seen in the metadata hierarchy:
  e.g. study → characteristics → strain becomes: study.characteristics.strain;
- Data column names are represented by forward-slash-separated accession, assay name, and sample name values:
  e.g. a sample entry identified by accession number OSD-48, assay name OSD-48_molecular-cellular-imaging_microscopy_pannoramic scan (3d histech), and sample name Mmus_C57-6J_LVR_GC_I_Rep1_M31 becomes:
  OSD-48/OSD-48_molecular-cellular-imaging_microscopy_pannoramic scan (3d histech)/Mmus_C57-6J_LVR_GC_I_Rep1_M31.
- In the interactive (browser) format, the column names are broken up into multiple rows instead, purely to avoid taking up too much horizontal screen space with such names, which can often be rather long.
