Data

class glue.core.data.Data(label='', coords=None, **kwargs)

Bases: object

The basic data container in Glue.

The data object stores data as a collection of Component objects. Each component stored in a dataset must have the same shape.

Catalog data sets are stored such that each column is a distinct 1-dimensional Component.

There are several ways to extract the actual numerical data stored in a Data object:

from glue.core import Data

data = Data(x=[1, 2, 3], label='data')
xid = data.id['x']

data[xid]
data.get_component(xid).data
data['x']  # if 'x' is a unique component name

Likewise, datasets support fancy indexing:

data[xid, 0:2]
data[xid, [True, False, True]]

See also: Working with Data objects

Parameters: label (str) – Label for the data set.

Any extra array-like keyword arguments are extracted into components.

Attributes Summary

components All ComponentIDs in the Data
coordinate_components The ComponentIDs associated with a CoordinateComponent
coordinate_links A list of the ComponentLinks that connect pixel and world.
derived_components The ComponentIDs for each DerivedComponent
label Convenience access to data set’s label
ndim Dimensionality of the dataset
pixel_component_ids The ComponentIDs for each pixel coordinate.
primary_components The ComponentIDs not associated with a DerivedComponent
shape Tuple of array dimensions, like numpy.ndarray.shape
size Total number of elements in the dataset.
subsets Tuple of subsets attached to this dataset
visible_components ComponentIDs for all non-hidden components.
world_component_ids The ComponentIDs for each world coordinate.

Methods Summary

add_component(component, label[, hidden]) Add a new component to this data set.
add_component_link(link[, label]) Shortcut method for generating a new DerivedComponent from a ComponentLink object, and adding it to a data set.
add_subset(subset) Assign a pre-existing subset to this data object.
broadcast(attribute) Send a DataUpdateMessage to the hub
component_ids() Equivalent to Data.components
dtype(cid) Lookup the dtype for the data associated with a ComponentID
find_component_id(label) Retrieve component_ids associated by label name.
get_component(component_id) Fetch the component corresponding to component_id.
get_pixel_component_id(axis) Return the pixel glue.core.component_id.ComponentID associated with a given axis
get_world_component_id(axis) Return the world glue.core.component_id.ComponentID associated with a given axis
join_on_key(other, cid, cid_other) Create an element mapping to another dataset, by joining on values of ComponentIDs in both datasets.
new_subset([subset, color, label]) Create a new subset, and attach to self.
register_to_hub(hub) Connect to a hub.
remove_component(component_id) Remove a component from a data set
to_dataframe([index]) Convert the Data object into a pandas.DataFrame object
update_components(mapping) Change the numerical data associated with some of the Components in this Data object.
update_id(old, new) Reassign a component to a different glue.core.component_id.ComponentID
update_values_from_data(data) Replace numerical values in data to match values from another dataset.

Attributes Documentation

components

All ComponentIDs in the Data

Return type: list

coordinate_components

The ComponentIDs associated with a CoordinateComponent

Return type: list

coordinate_links

A list of the ComponentLinks that connect pixel and world. If no coordinate transformation object is present, an empty list is returned.

derived_components

The ComponentIDs for each DerivedComponent

Return type: list

label

Convenience access to data set’s label

ndim

Dimensionality of the dataset

pixel_component_ids

The ComponentIDs for each pixel coordinate.

primary_components

The ComponentIDs not associated with a DerivedComponent

Return type: list

shape

Tuple of array dimensions, like numpy.ndarray.shape

size

Total number of elements in the dataset.

subsets

Tuple of subsets attached to this dataset

visible_components

ComponentIDs for all non-hidden components.

Return type: list

world_component_ids

The ComponentIDs for each world coordinate.

Methods Documentation

add_component(component, label, hidden=False)

Add a new component to this data set.

Parameters:
Raises:

TypeError if label is invalid; ValueError if the component has an incompatible shape.

Returns:

The ComponentID associated with the newly-added component

add_component_link(link[, label])

Shortcut method for generating a new DerivedComponent from a ComponentLink object, and adding it to a data set.

Parameters:
Returns:

The DerivedComponent that was added

add_subset(subset)

Assign a pre-existing subset to this data object.

Parameters: subset – A Subset or SubsetState object

If the input is a SubsetState, it will be wrapped in a new Subset automatically.

Note

The preferred way to create subsets is via DataCollection.new_subset_group(). Manually instantiated subsets will not be represented properly by the UI.

broadcast(attribute)

Send a DataUpdateMessage to the hub

Parameters: attribute (string) – Name of an attribute that has changed (or None)

component_ids()

Equivalent to Data.components

dtype(cid)

Lookup the dtype for the data associated with a ComponentID

find_component_id(label)

Retrieve component_ids associated by label name.

Parameters: label – ComponentID or string to search for
Returns: The associated ComponentID if label is found and unique, else None. The primary (non-derived) components are checked first; if the label is not present and unique there, the derived components are checked. If the label occurs once in the primary components and once in the derived components, the primary one takes precedence.

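
The lookup order described above can be illustrated with plain Python lists of labels. This is a minimal sketch of the documented precedence only, not glue's implementation: `find_unique_label` is a hypothetical helper, and falling through to the derived components when a label is duplicated among the primary ones is an assumption based on a literal reading of the text.

```python
def find_unique_label(label, primary_labels, derived_labels):
    """Return (kind, index) for a unique match, or None.

    Hypothetical helper, not part of glue. Primary labels are checked
    first; derived labels are only consulted if the label is not
    present and unique among the primary ones.
    """
    for kind, labels in (("primary", primary_labels),
                         ("derived", derived_labels)):
        matches = [i for i, name in enumerate(labels) if name == label]
        if len(matches) == 1:
            return kind, matches[0]
    return None


# One match in each group: the primary one takes precedence.
both = find_unique_label("x", ["x", "y"], ["x"])
# Only present among the derived labels.
derived_only = find_unique_label("z", ["x", "y"], ["z"])
# Not present anywhere.
missing = find_unique_label("q", ["x", "y"], ["z"])
```
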
get_component(component_id)

Fetch the component corresponding to component_id.

Parameters: component_id – the component_id to retrieve

get_pixel_component_id(axis)

Return the pixel glue.core.component_id.ComponentID associated with a given axis

get_world_component_id(axis)

Return the world glue.core.component_id.ComponentID associated with a given axis

join_on_key(other, cid, cid_other)

Create an element mapping to another dataset, by joining on values of ComponentIDs in both datasets.

This join allows any subsets defined on other to be propagated to self. The different ways to call this method are described in the Examples section below.

Parameters:

other : Data

Data object to join with

cid : str or ComponentID or iterable

Component(s) in this dataset to use as a key

cid_other : str or ComponentID or iterable

Component(s) in the other dataset to use as a key

Examples

There are several ways to use this function, depending on how many components are passed to cid and cid_other.

Joining on single components

First, one can specify a single component ID for both cid and cid_other: this is the standard mode, and joins one component from one dataset to the other:

>>> d1 = Data(x=[1, 2, 3, 4, 5], k1=[0, 0, 1, 1, 2], label='d1')
>>> d2 = Data(y=[2, 4, 5, 8, 4], k2=[1, 3, 1, 2, 3], label='d2')
>>> d2.join_on_key(d1, 'k2', 'k1')

Selecting all values in d1 where x is greater than 2 returns the last three items as expected:

>>> s = d1.new_subset()
>>> s.subset_state = d1.id['x'] > 2
>>> s.to_mask()
array([False, False,  True,  True,  True], dtype=bool)

The linking was done between k1 and k2, and the values of k1 for the last three items are 1, 1, and 2. This means that the first, third, and fourth items in d2 will then get selected, since k2 has a value of either 1 or 2 for these items.

>>> s = d2.new_subset()
>>> s.subset_state = d1.id['x'] > 2
>>> s.to_mask()
array([ True, False,  True,  True, False], dtype=bool)
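
The propagation above can be reproduced with plain Python lists. The following is a minimal sketch of the semantics only, not glue's implementation; `propagate_single_key` is a hypothetical helper name.

```python
def propagate_single_key(mask_other, key_other, key_self):
    """Carry a boolean mask from one dataset to another through one key.

    Hypothetical helper illustrating the semantics; not part of glue.
    """
    # Key values of every selected item in the other dataset.
    selected_keys = {k for k, m in zip(key_other, mask_other) if m}
    # An item here is selected if its key is among those values.
    return [k in selected_keys for k in key_self]


x = [1, 2, 3, 4, 5]
k1 = [0, 0, 1, 1, 2]   # key column of d1
k2 = [1, 3, 1, 2, 3]   # key column of d2

mask_d1 = [value > 2 for value in x]
mask_d2 = propagate_single_key(mask_d1, k1, k2)
print(mask_d2)  # [True, False, True, True, False]
```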

Joining on multiple components

Note

This mode is currently slow, and will be optimized significantly in the future.

Next, one can specify several components for each dataset: in this case, the number of components given should match for both datasets. This causes items in both datasets to be linked when (and only when) the set of keys match between the two datasets:

>>> d1 = Data(x=[1, 2, 3, 5, 5],
...           y=[0, 0, 1, 1, 2], label='d1')
>>> d2 = Data(a=[2, 5, 5, 8, 4],
...           b=[1, 3, 2, 2, 3], label='d2')
>>> d2.join_on_key(d1, ('a', 'b'), ('x', 'y'))

Selecting all items in d1 where x is 5 works as expected and selects the last two items:

>>> s = d1.new_subset()
>>> s.subset_state = d1.id['x'] == 5
>>> s.to_mask()
array([False, False, False,  True,  True], dtype=bool)

If we apply this selection to d2, only items where a is 5 and b is 2 will be selected:

>>> s = d2.new_subset()
>>> s.subset_state = d1.id['x'] == 5
>>> s.to_mask()
array([False, False,  True, False, False], dtype=bool)

and in particular, the second item (where a is 5 and b is 3) is not selected.
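
The multi-component behavior can be mimicked by comparing tuples of key values. This is a sketch of the matching rule under the assumption that keys are paired in the order given (a with x, b with y); `propagate_multi_key` is a hypothetical helper, not a glue function.

```python
def propagate_multi_key(mask_other, keys_other, keys_self):
    """Carry a mask across datasets, matching on whole key tuples.

    keys_other and keys_self are sequences of equally long columns.
    Hypothetical helper illustrating the semantics; not part of glue.
    """
    # zip(*columns) yields one tuple of key values per item.
    selected = {row for row, m in zip(zip(*keys_other), mask_other) if m}
    return [row in selected for row in zip(*keys_self)]


x = [1, 2, 3, 5, 5]
y = [0, 0, 1, 1, 2]
a = [2, 5, 5, 8, 4]
b = [1, 3, 2, 2, 3]

mask_d1 = [value == 5 for value in x]
mask_d2 = propagate_multi_key(mask_d1, (x, y), (a, b))
print(mask_d2)  # [False, False, True, False, False]
```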

One-to-many and many-to-one joining

Finally, you can specify one component in one dataset and multiple ones in the other. In the case where one component is specified for this dataset and multiple ones for the other dataset, then when an item is selected in the other dataset, it will cause any item in the present dataset which matches any of the keys in the other data to be selected:

>>> d1 = Data(x=[1, 2, 3], label='d1')
>>> d2 = Data(a=[1, 1, 2],
...           b=[2, 3, 3], label='d2')
>>> d1.join_on_key(d2, 'x', ('a', 'b'))

In this case, if we select all items in d2 where a is 2, this will select the third item:

>>> s = d2.new_subset()
>>> s.subset_state = d2.id['a'] == 2
>>> s.to_mask()
array([False, False,  True], dtype=bool)

Since we have joined the datasets using both a and b, we select all items in d1 where x is either the value of a or of b (2 or 3), which means we select the second and third items:

>>> s = d1.new_subset()
>>> s.subset_state = d2.id['a'] == 2
>>> s.to_mask()
array([False,  True,  True], dtype=bool)

We can also join the datasets the other way around:

>>> d1 = Data(x=[1, 2, 3], label='d1')
>>> d2 = Data(a=[1, 1, 2],
...           b=[2, 3, 3], label='d2')
>>> d2.join_on_key(d1, ('a', 'b'), 'x')

In this case, selecting items in d1 where x is 1 selects the first item, as expected:

>>> s = d1.new_subset()
>>> s.subset_state = d1.id['x'] == 1
>>> s.to_mask()
array([ True, False, False], dtype=bool)

This then causes any item in d2 where either a or b is 1 to be selected, i.e. the first two items:

>>> s = d2.new_subset()
>>> s.subset_state = d1.id['x'] == 1
>>> s.to_mask()
array([ True,  True, False], dtype=bool)
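
Both directions of this mode follow an "any key matches" rule, which can be sketched with plain Python lists. The helper below is hypothetical and only mirrors the behavior shown in the examples above; it is not glue's implementation.

```python
def propagate_any_key(mask_other, keys_other, keys_self):
    """Select items whose key value(s) match any key value of a selected item.

    keys_other and keys_self are sequences of columns (one or more each).
    Hypothetical helper illustrating the semantics; not part of glue.
    """
    selected = set()
    for row, m in zip(zip(*keys_other), mask_other):
        if m:
            selected.update(row)  # every key value of a selected item counts
    return [any(v in selected for v in row) for row in zip(*keys_self)]


x = [1, 2, 3]
a = [1, 1, 2]
b = [2, 3, 3]

# d1 joined on 'x' against ('a', 'b'): select a == 2 in d2, propagate to d1.
mask_d2 = [value == 2 for value in a]
to_d1 = propagate_any_key(mask_d2, (a, b), (x,))
print(to_d1)  # [False, True, True]

# d2 joined on ('a', 'b') against 'x': select x == 1 in d1, propagate to d2.
mask_d1 = [value == 1 for value in x]
to_d2 = propagate_any_key(mask_d1, (x,), (a, b))
print(to_d2)  # [True, True, False]
```
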
new_subset(subset=None, color=None, label=None, **kwargs)

Create a new subset, and attach to self.

Note

The preferred way to create subsets is via DataCollection.new_subset_group(). Manually instantiated subsets will not be represented properly by the UI.

Parameters: subset – Optional reference subset or subset state. If provided, the new subset will copy the logic of this subset.
Returns: The new subset object

register_to_hub(hub)

Connect to a hub.

This method usually doesn’t have to be called directly, as DataCollections manage the registration of data objects.

remove_component(component_id)

Remove a component from a data set

Parameters: component_id (ComponentID) – the component to remove

to_dataframe(index=None)

Convert the Data object into a pandas.DataFrame object

Parameters: index – Any ‘index-like’ object that can be passed to the pandas.Series constructor
Returns: pandas.DataFrame

update_components(mapping)

Change the numerical data associated with some of the Components in this Data object.

All changes to component numerical data should use this method, which broadcasts the state change to the appropriate places.

Parameters: mapping – A dict mapping Components or ComponentIDs to arrays.

This method has the following restrictions:
  • New components must have the same shape as old components.
  • Component subclasses cannot be updated.

update_id(old, new)

Reassign a component to a different glue.core.component_id.ComponentID

Parameters:

old : glue.core.component_id.ComponentID

The old component ID

new : glue.core.component_id.ComponentID

The new component ID

update_values_from_data(data)

Replace numerical values in data to match values from another dataset.

Notes

This method drops components that aren’t present in the new data, and adds components that are in the new data but were not in the original data. The matching is done by component label, and components are resized if needed. This means that for components with matching labels in the original and new data, the ComponentIDs are preserved, and existing plots and selections will be updated to reflect the new values. Note that the coordinates are also copied, but the style is not.
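
The label-based matching described in these notes amounts to a three-way split of component labels. The sketch below only illustrates that bookkeeping with plain Python sets; `plan_update` is a hypothetical helper and does not reflect how glue performs the update internally.

```python
def plan_update(current_labels, new_labels):
    """Classify component labels when values are replaced from another dataset.

    Hypothetical helper illustrating the matching rule; not part of glue.
    """
    current, new = set(current_labels), set(new_labels)
    return {
        "dropped": sorted(current - new),  # only in the original data
        "added": sorted(new - current),    # only in the new data
        "updated": sorted(current & new),  # matching labels, IDs preserved
    }


plan = plan_update(["x", "y", "z"], ["y", "z", "w"])
print(plan)  # {'dropped': ['x'], 'added': ['w'], 'updated': ['y', 'z']}
```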