The DataStore is created with a size (number of rows) and a config
const ds = new DataStore(1000, {....})
In theory The config is optional, columns and their data can be added later, but practically a config is required and can contain the following
The name of the DataStore - should be human readable
An array of column objects
The DataStore can contain a number of columns. Each column can have the following
name The column's label that is exposed to the user
field The column's id - should be unique
datatype
text
- string with a maximum of 256 different valuestext16
- string with a maximum of 65536 different valuesinteger
- an integer (represented by float32)double
- a floating point number (represented by float32)unique
- any textmultitext
- A field that can have more than one value eg 'red'
, 'red, blue'
int32
- for larger integers e.g. genomic co-ordinates (represented by an int32
)values - For text and multitext columns, this will be an array of the possible values, the raw data will contain the index(es) of the value(s) in the array. Text columns can't have more than 256 values and multitext no more than 65536
values - For text and multitext columns, this will be an array of the possible values, the raw data will contain the index(es) of the value(s) in the array. Text columns can't have more than 256 values and multitext no more than 65536
colors - An array of colors (in hex format). Only used for text/multitext and integer/double columns. In the former the colors array is mapped to the values array. In the latter, the colors are interpolated between max and min values (or quantiles if specified). Default colors will be used if not supplied
stringLength - for unique columns, this is the length of the longest value and for multitext the maximum number of values associated with a data item.
colorLogScale - true of false - for number columns, whether the color should be a log scale
editable - specifies a column can be edited.
is_url - the column contains url links (unique and text columns)
is_url - the column contains url links (unique and text columns)
minMax - for integer and double columns, the minimum/maximum values in an array of length 2
quantiles - an object with 0.05, 0.01 and 0.001 as keys, with each entry and array of the x and 1-x quantiles
{
"quantiles": {
"0.05": [x, y],
"0.01": [x, y],
"0.001": [x, y]
}
}
If the column's data is supplied as an array, then values and stringLength need not be supplied and can be calculated. However if the column's data is supplied as a SharedArray Buffer these values are required.
minMax and quantiles will be calculated if not supplied but this may be slow for large datasets and hence should be given in the config
A list of logically grouped columns, in the following format. Columns should be referenced by their field values. A column can belong to more than one group
"columnGroups": [
{
"columns": [
"area",
"eccentricity",
"perimeter"
],
"name": "cell Stats"
},
{
"columns": [
"CD34",
"CD38",
"GranzymeB"
],
"name": "markers"
}
]
A dictionary where the keys are the dataStores that this dataStore should link with and the entries are objects describing the links. See the section for linking to other DataStores
"links": {
"ds1": {.....},
"ds2": {.....}
}
Represent thumbnails for each data item. It should be an object containing 'image sets', with the key being the name of the set. Each set should have:
base_url
- the base url of the image - the images key and type will be appended to this urlkey_column
- the column in the dataStore that contains the keys for each imagetype
- the type of image {
"images": {
"set1": {
"base_url": "./my_thumbnails/set1/"
"key_column": "set1_image_key",
"type": "png"
}
}
}
Charts that use the images in set1 would then associate the row whose set1_image_key
was 26373
with the following url:
`/my_thumbnails/set1/26373.png`
These have exactly the same format as above, but represent much larger images, which are intended only to be displayed one or two at a time.
The data contains a number of separate regions, with each data point a location in this region. The name of the field which specifies the region and the fields which specify the x, y co-ordinates,
all_regions is a dictionary with the name of the region (the value in the region field) as the key, it contains the following:
roi
the limits of the regionsimages
a dictionary of images describing the file, its position and height, width and name. The image can be enlarged or offset from the pointsdefault_image
- the name of the image that will be loaded - if null or missing, no background image will be loadedscale - in mm
*ome_tiff
- the name of the ometiff image associated with the region. It is assumed to be in the location specified by base_url
base_url
- url where the images reside the image will be
"regions": {
"position_fields": ["x", "y"],
"region_field": "sample_id",
"default_color": "annotation",
"scale_unit": "mm",
"scale": 0.001,
"base_url": "./images/",
"all_regions": {
"IPF_SAMPLE_104F_ROI_1": {
"roi": {
"min_x": 0,
"max_x": 1000,
"min_y": 0,
"max_y": 1000
},
"default_image": "cellmask",
"images": {
"cellmask": {
"file": "jssRG6.png",
"position": [0, 0],
"height": 1000,
"width": 1000,
"name": "cellmask"
}
},
"ome_tiff": "IPF_SAMPLE_104F_ROI_1.ome.tiff"
}
}
}
This is for a dataset that contains information about the interactions of set of objects (cells) for a number of regions/, a pivot column
pivot_column
this is the column each pairwise interactions can be in a region or a group of regionsinteraction_columns
The two columns that are the objects (cell types) that interactis_single_region
the interactions are for single region as opposed to a condition/state composed of many regionsAlso the default values for a few charts also need to be described
"interactions": {
"pivot_column": "condition",
"interaction_columns": ["Cell Type 1", "Cell Type 2"],
"is_single_region": false
"spatial_connectivity_map": {
"link_length": "gr20",
"link_thickness": "gr20",
"link_color": "%contacts",
"node_size": "mean cell 1 number"
},
//optional
"interaction_matrix": {
"groups": ["Cell Type 1 group", "Cell Type 2 group"]
},
"cell_radial_chart": {
"link_thickness": "gr20",
}
},
Sometimes, the x and y co-ordinates of certain groups within the data need to be changed to align with each other. This is enabled by adding an offsets parameter to the config which is comprised of:
param
- the columns that can be offset (e.g. "x" and "y" )groups
- which column specifies the groups that can be offsetbackground_filter
- if the data consists of multiple spatial data, a background_filter referring to the column which represents each spatial entity e.g."ROI"values
- a nested dictionary for each filter and group describing its offset and rotation. This is optional, and can be omitted if there are currently no offsets or rotations. Each value should contain the following:
rotation
- rotation +/- in degreesoffset
- array with x and y offsetsrotation_center
- the relative rotation point - the actual point is +/- the offsets. if omitted, a default value based on the center if of all the points is usedAn example for x and y columns, where each panel can be offset in each ROI and the offset/rotation for panel1 in ROI_1 is specified is shown below:
"offsets": {
"param": ["x", "y"]
"groups": "panels",
"background_filter": "ROI",
"values": {
"ROI_1": {
"panel1": {
"offset": [10, 20],
"rotation": 12
}
}
}
}
If no background filter is specified then a single entry "all" should be used in values:
"offsets": {
"param": ["x", "y"]
"groups": "panels",
"values": {
"all": {
"panel1": {
"offset": [10, 20],
"rotation": 12
}
}
}
}
To set the values use the setColumnOffset
method
//rotate by 45 deg
let data = {group: "panel1", rotation: 45, filter: "ROI_1"};
ds.setColumnOffset(data);
//translate by 30, 30
data = {group: "panel1", offsets: [30, 30], filter: "ROI_1"};
//passing true as a second parameter will update all listeners
ds.setColumnOffset(data, true);
If no background filter has been specified, you should omit the filter value or set it to "all"
To reset a group's offsets to the original values use resetColumnOffsets
ds.resetColumnOffsets("ROI_1", "panel1", true)
The first parameter should be null or "all" if there is no background filter
Allows a genome browser, to be added as a chart. The default feature track will show each item in the DataStore.Has the following parameters:
default_parameters (optional) an object containing parameters for the Genome Browser
default_track The track associated with the data. Needs to be gzipped bed file, indexed using tabix and have 4 columns: chromosome, start, end and the id of the row in the DataStore
default_track_parameters (optional) - An object containing parameters for the feature track
location_fields An array containing the id of the corresponding columns in the DataStore for chr start and end
default_tracks (optional) A list of track configs for any tracks which will be displayed when the genome browser is added (A ruler track is always added and need not be specified here)
atac_bam_track (optional) This track will cluster reads based on the barcode tag in the bam file. The DataSource needs to be linked to another DataStore that has these barcodes in order for the atac_bam_track to display e.g.
"links": {
"cells": {
"index": "cell_id",
"access_data": true
}
}
Example:
Internally every column is a typed array, which makes manipulation much quicker and allows webworkers to work on them without holding up the main thread
Column data can be added by the following
Adding a data parameter to the column objects in the columns parameter of the config passed to the DataStores's constructor
Using the DataStore's addColumn
method passing the data parameter as the second parameter
ds.addColumn({datatype: "text", "field": "myvals", name: "My Vals"}, data)
If a column has already been added, use setColumn
method with the column's field and data
ds.setColumnData("myvals", data)
In each case, data can either be an array of values or a SharedArrayBuffer containing the raw format of the column. Using an array, although more convenient, will be slow, as it has to be converted to the native format
The column should not contain more than 256 unique values.
["blue", "green", "green", "yellow", "blue", "green"]
//would be converted to
values: ["green", "blue", yellow]
data: [1, 0, 0, 2, 1, 0] //(Uint8Array)
The same as as text except can have have up to 65536 unique values, but cannot be used in as many charts
A column that can hold multiple (or no values)
[ "A, B, C", "B, A", "A, B", "D, E", "E, C, D" ]
//would be converted to
values: ["A", "B", "C", "D", "E"]
stringLength: 3
data: [0, 1, 2, 1, 0, 65535, 0, 1, 65535, 3, 4, 65535, 0, 2, 3] //(Uint16Array)
A unique column represents text that can contain more than 256 values
["ZX1212", "X21", "F232", "D1"]
//would be converted to
data: [90, 88, 49, 50, 49, 50, 88, 50, 49, 0, 0, 0, 70, 50, 51, 50, 0, 0, 68, 49, 0, 0, 0, 0] //(Uint8Array)
stringLength: 6
These are treated the same and are represented by Float32Array
[1.2, 3, 4.5.7]
//would be converted to
data: [1.2, 3, 4.5.7] //(Float32Array)
This is represented by a Int32Array and is better far larger integers. e.g. genomic locations. However, it can't hold an NaN which represents missing data
[1.2, 3, 4.5.7]
//would be converted to
data: [1.2, 3, 4.5.7] //(Float32Array)
The links parameter of a DataStore's config allows different types of communication between other datastores.
For the situation, where one datasource e.g cells has a column specifying cell types (e.g. annotations) and a second datasource has interactions between these cell types e.g. has columns for cell type 1, cell type 2 and sphere of interaction (e.g. ROI,sample,disease_state) plus various columns with stats about the interactions ( e.g. gr(20), %contacts etc.)
The link needs to contain the following information
example
{
"name":"spatial stats",
"links": {
"cells": {
"interactions": {
"interaction_columns": ["Cell Type 1",
"Cell Type 2",
"annotations"],
"pivot_column": ["sample_id"],
"is_single_region": true
}
}
}
}
The above will have the following effects:-
This specifies that the DataStore can contain data linked to the rows another DataStore. For example, in single cell data, the cell DataStore would be linked to the gene DataStore. Data such as gene expression per cell could then be added to the cell DataStore as columns (on demand - not all at once). It should have the following parameters:-
"subgroups": {
"gs": {
"name": "gene_scores",
"type": "sparse",
"label": "Gene Score"
},
"igs": {
"name": "imp_gene_scores",
"label": "Imputed Gene Score"
}
}
The number of columns could potentially be very large, depending on the number of rows in the linked dataset, thus unlike other columns these are not explicitly specified in the DataStore's config. The column's field will have the following format:
<stub>|<name>|<index>
stub is the identifier of the subgroup
name is the name shown to the user
index the index of the associated row in the linked dataset and should also be used retrieve the data for this column. For example, a config for a histogram of imputed gene score for gene EMB
{
"type": "bar_chart",
"param": "igs|EMB(igs)|342"
}
The columns's data will be loaded in the normal way by the specified dataloader, but the column object passed to the dataloader will also have the following fields:
example
[
{
"name": "cells",
"columns": [],
"links": {
"genes": {
"rows_as_columns": {
"name_column": "gene_name",
"name": "Gene Scores",
"subgroups": {
"gs": {
"name": "gene_scores",
"type": "sparse",
"label": "Gene Score"
},
"igs": {
"name": "imp_gene_scores",
"label": "Imputed Gene Score"
}
}
}
}
}
},
{
"name": "genes",
"columns": [
{
"name": "Gene name",
"field": "name",
"datatype": "unique"
}
]
}
]
You can link columns from two DataStores in order to minimize the duplication of data. For example if a cell dataset contained cells from different samples and a sample dataset contained information about these samples, the cell DataStore should have access to this metadata, without the duplication of data. To achieve this, include a link to the other DatStore, specifying which columns are needed from that DataStore and the index, which is the column that the two DataStores share and must be of datatype text. You must also describe the linked columns in the 'columns' parameter with the fields matching the linked DataStore. The colors/values can be omitted as these will be taken from the linked DataStore. However column names(what the user sees) can differ
example
{
"name": "cells",
"size": 101737,
"columns": [
{
"name": "sample_id",
"datatype": "text"
},
//the following columns will get their data from the samples DataStore
{
"name": "sample condition",
"datatype": "text",
"field": "condition"
},
{
"name": "sample sex",
"datatype": "text",
"field": "sex"
}
],
"links": {
"samples": {
"columns": ["condition", "sex"],
"index": "sample_id"
}
}
},
{
"name": "samples",
"size": 11,
"columns": [
{
"name": "sample_id",
"datatype": "text",
"values": ["1", "2", "3", "4"]
},
{
"name": "condition",
"datatype": "text",
"values": ["sick", "healthy"],
"colors": ["red", "green"]
},
{
"name": "sex",
"datatype": "text",
"values": ["male", "female"],
"colors": ["blue", "pink"]
}
]
}
For a datastore to access data from another datastore, specify "access_data"
in the link.
An optional index
parameter can be used to specify a column in the linked datastore.
{
"name": "genes",
"links": {
"cells": {
"index": "cell_id",
"access_data": true
}
}
}
With the above example, when a chart in the gene's datasource is created, the setupLinks
method (if it exists for the given type of chart - as of this writing, GenomeBrowser
is the only chart implementing this) will be called. Arguments will be passed with the cell datastore, the name of the index and a getDataFunction
. This function will load (if necessary) any column data from the cell's datastore. The 'index' column is implicitly loaded and need not be specified in this function
setupLinks(dataStore, index, getDataFunction) {
//keep for future use
this.dataLink = { dataStore, index, getDataFunction };
//make sure cell_size data has been loaded
getDataFunction(["cell_size"],()=>{
//if you need an index
const cell_index = dataStore.getColumnIndex(index);
const cell_size = dataStore.getRawColumn("cell_size");
//do stuff
//add listeners to the datastore
})
}
To link column colors, specify the column from the linked DataStore you wish the colors to match (link_to
) and the column you want to match (col
). You can link one column to many
{
"name": "cell interactions",
"links": {
"cells": {
"sync_column_colors": [
{
"link_to": "annotations",
"col": "Cell Type 1"
},
{
"link_to": "annotations",
"col": "Cell Type 2"
}
]
}
}
}
In the above a value in the Cell Type 1
or Cell Type 2
columns in the cell interactions
datasource will be the same color as it is in the cells
"annotation"
column
This allows a column to be filtered in a target datasource based on the set of unique values from the current selection in the source datasource.
Users can use this to explore how lower-level cells
data contributes to the statistics
by making selections on statistics
charts.
Consider a scenario where the cells
data source is clustered into regions (identified by a cell_regionID
column), and there's a statistics
data source representing region-based statistics. The link's source ("source_column"
) is the statistics
data source's regionID
column, and the target ("dest_column"
) is the cells
data source's cell_regionID
column.
When a new filter (selection) is applied on a statistics
scatterplot, a valueset
parameter containing a Set
of unique regionID
values from the statistics
data source is passed to the cells
data source. This results in corresponding filtering on any charts displayed in cells
.
To add this functionality to an MDVProject
'p'
in Python, the following code:
valueset_args = {
'source_column': "regionID",
'dest_column': "cell_regionID",
}
p.insert_link('statistics', 'cells', 'valueset', valueset_args)
Will add this config in the statistics
datasource:
{
"name": "statistics",
"links": {
"cells": {
"valueset": {
"source_column": "regionID",
"dest_column": "cell_regionID"
}
}
}
}
Listeners can be added/removed from the DataStore with the addListener
and removeListener
methods. The callback passed to the addListener will be invoked with the type of event and any data relevant to the event. Note that the first argument is not an event type, but an ID for the listener. Invoking addListener
with the same ID will replace the existing listener and care must be taken such that the ID precisely differentiates the subject.
const listenerId = ds.addListener(`${chartId} data change listener`, (type, data) => {
if (type === "data_changed") {
console.log("data changed in columns: ", data.columns);
}
});
ds.removeListener(listenerId);
data_changed
filtered - The DataStore has been filtered, no data is passed to the listener unless all filters have been removed, in which case 'all_removed' is passed. << that's a lie.
column_removed - called when a column is removed passing the column's field (id)
data_highlighted
This is achieved by getting a dimension from the datastore, then calling the filter() on the dimension, specifying the filter method, columns and arguments
This has two filtering methods filterCategories works on a single column whereas filterMultipleCategories works on multiple columns
const dim = datastore.getDimension("category_dimension");
//filter items with red in color column
dim.filter("filterCategories", ["color"], "blue");
dim.removeFilter();
//filter items that are red or blue
dim.filter("filterCategories", ["color"], ["red", "blue"]);
dim.removeFilter();
//filter items that are red and have a shape value of square
dim.filter("filterMultipleCategories", ["color", "shape"], ["red", "square"])
const dim = datastore.getDimension("range_dimension");
//filter on between 5 and 10
dim.filter("filterRange", ["x"], {min: 5, min: 10});
dim.removeFilter();
//filter on x and y (a square)
dim.filter("filterSquare", ["x", "y"], {
range1: [5, 10],
range2: [5, 10]
});
dim.removeFilter();
//filter based on a polygon- just supply the points
dim.filter("filterPoly", ["x", "y"], [[0, 0], [2, 0], [1, 2]])
All dimensions can filter on an arbitrary set of items using the item's index
const indexes = new Set([4, 5, 10, 12]);
const dim = datastore.getDimension("range_dimension");
dim.filter("filterOnIndex", null, indexes)
When a filter is called on a dimension - the datastore informs all listeners and hence any updates are performed, which is likely to be expensive. Therefore, if you want to perform multiple filters, pass false as the fourth argument and then call triggerFilter on the datastore
rangeDim.filter("filterRange", ["x"], {min: 5, min: 10}, false);
catDim.filter("filterCategories", ["color"], "blue", false);
datastore.triggerFilter();