Data Classes

All emdfile data classes inherit from the Node class, which adds core tree-building and metadata storage functionality. Each additionally has its own data and built-in metadata interface.

Array

class emdfile.Array(data: ndarray, name: str | None = 'array', units: str | None = '', dims: list | None = None, dim_names: list | None = None, dim_units: list | None = None, slicelabels=None)

Array instances store N-dimensional array-like data.

__init__(data: ndarray, name: str | None = 'array', units: str | None = '', dims: list | None = None, dim_names: list | None = None, dim_units: list | None = None, slicelabels=None)
Parameters:
  • data (np.ndarray)

  • name (str)

  • units (str) – units for the pixel values

  • dims (variable) – specifies calibration vectors for each axis of the data array. Valid values for each element of the list are None, a number, a 2-element list/array, or an M-element list/array, where M is the extent of the corresponding array dimension. If None is passed, the dim is populated with integer values starting at 0 and its units are set to pixels. If a number is passed, the dim is populated with a vector beginning at zero and increasing linearly by this step size. If a 2-element list/array is passed, the dim is populated with a linear vector with these two numbers as its first two elements. If a list/array of length M is passed, it is used as the dim vector. If dims receives a list of fewer than N arguments for an N-dimensional data array, the remaining dimensions are populated as if None were passed, using integer pixel values. If the dims parameter is not passed, all dim vectors are populated this way.

  • dim_units (list) – the units for the calibration dim vectors. If nothing is passed, dim vectors which have been populated automatically with integers corresponding to pixel numbers are assigned units of ‘pixels’, and any other dim vectors are assigned units of ‘unknown’. If a list of length N is passed, where N is smaller than the number of array dimensions, its values are applied to the first N dimensions, and the remaining units are populated with ‘pixels’ or ‘unknown’ as above.

  • dim_names (list) – labels for each axis of the data array. Values which are not passed will be autopopulated with the name “dim#” where # is the axis number.

  • slicelabels (None or True or list) – if not None, array will be promoted to a stack array - see object docstring for details. If a list is passed it should specify the sub-array names.

Return type:

Array
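The defaulting rules described above for dims, dim_units, and dim_names can be sketched in plain Python. This is only an illustration of the documented behavior, not emdfile's own code: the helper names expand_dim, default_units, and expand_all are hypothetical, and plain lists stand in for numpy arrays.

```python
from numbers import Number


def expand_dim(spec, extent):
    """Expand one `dims` entry into a calibration vector of length `extent`.

    Hypothetical helper mirroring the documented rules; not emdfile's code.
    """
    if spec is None:
        # No calibration given: integer pixel indices 0, 1, ..., extent-1.
        return list(range(extent))
    if isinstance(spec, Number):
        # A single number is a step size; the vector starts at zero.
        return [i * spec for i in range(extent)]
    spec = list(spec)
    if len(spec) == extent:
        # A full-length vector is used as given.
        return spec
    if len(spec) == 2:
        # Two numbers are the first two elements of a linear vector.
        start, step = spec[0], spec[1] - spec[0]
        return [start + i * step for i in range(extent)]
    raise ValueError(f"dim spec of length {len(spec)} does not fit extent {extent}")


def default_units(spec):
    """Units assigned when `dim_units` gives no value for this axis."""
    return "pixels" if spec is None else "unknown"


def expand_all(shape, dims=None, dim_units=None, dim_names=None):
    """Apply the documented defaults to every axis of an array with `shape`."""
    dims = list(dims or [])
    dim_units = list(dim_units or [])
    dim_names = list(dim_names or [])
    out = []
    for axis, extent in enumerate(shape):
        # Axes beyond the end of a short list are treated as if None were passed.
        spec = dims[axis] if axis < len(dims) else None
        units = dim_units[axis] if axis < len(dim_units) else default_units(spec)
        name = dim_names[axis] if axis < len(dim_names) else f"dim{axis}"
        out.append((name, units, expand_dim(spec, extent)))
    return out


# A 3x4x2 array: step size for axis 0, first two values for axis 1,
# nothing for axis 2; units are given for axis 0 only.
result = expand_all((3, 4, 2), dims=[0.5, [10, 12]], dim_units=["nm"])
for name, units, vector in result:
    print(name, units, vector)
```

In this sketch a vector whose length equals the axis extent is checked before the 2-element case; for an axis of extent 2 both rules give the same vector, so the order does not change the result.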

dim(n)

Return the n’th dim vector

get_dim(n)

Return the n’th dim vector

get_dim_name(n)

Get the n’th dim vector name

get_dim_units(n)

Return the n’th dim vector units

set_dim(n: int, dim: list | ndarray, units: str | None = None, name: str | None = None)

Sets the n’th dim vector, using dim as described in the Array documentation. If units and/or name are passed, sets these values for the n’th dim vector.

Parameters:
  • n (int) – specifies which dim vector

  • dim (list or array) – length must be either 2, or match the length of the n’th axis

  • units (str)

  • name (str)

set_dim_name(n: int, name: str)

Sets the n’th dim vector name to name.

Parameters:
  • n (int) – which dim vector

  • name (str) – new name

set_dim_units(n: int, units: str)

Sets the n’th dim vector units to units.

Parameters:
  • n (int) – which dim vector

  • units (str) – new units

to_h5(group)

Calls Node.to_h5 to create the group’s node and write its metadata. Then writes Array data, calibration vectors, units, and any stack/label info.

Parameters:

group (h5py Group)

Return type:

(h5py Group) the new array’s Group

PointList

class emdfile.PointList(data: ndarray, name: str | None = 'pointlist')

PointList instances represent sets of points in some M dimensional space. Each dimension is given by a named field and has its own dtype. See also the documentation for numpy structured arrays.

__init__(data: ndarray, name: str | None = 'pointlist')
Parameters:
  • data (structured numpy ndarray) – the data

  • name (str) – name for the PointList

Return type:

(PointList)
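As a rough illustration of the kind of structured numpy array the data parameter expects, the following sketch builds three points in a 3-field space. The field names qx, qy, and intensity are illustrative assumptions and are not taken from emdfile.

```python
import numpy as np

# Three named fields, each with its own dtype (field names are illustrative).
dtype = np.dtype([("qx", np.float64), ("qy", np.float64), ("intensity", np.int32)])

# Three points in this 3-field space.
points = np.zeros(3, dtype=dtype)
points["qx"] = [1.0, 2.5, 0.5]
points["qy"] = [0.0, 1.5, 3.0]
points["intensity"] = [10, 40, 25]

print(points.dtype.names)  # the field names
print(points["qx"])        # one field across all points
print(points[1])           # one point as a single record
```

Going by the signature above, an array like points would be passed as emdfile.PointList(data=points, name='pointlist').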

add(data)

Appends a numpy structured array. Its dtypes must agree with the existing data.

add_data_by_field(data, fields=None)

Adds a list of data arrays to the PointList, in the fields given by fields. If fields is not specified, the data arrays are assumed to be in the same order as self.fields.

Parameters:

data (list) – arrays of data to add to each field

add_fields(new_fields, name='')

Creates a copy of the PointList, but with additional fields given by new_fields.

Parameters:
  • new_fields (list of 2-tuples, ('name', dtype))

  • name (string)

copy(name=None)

Returns a copy of the PointList. If name is None, the copy’s name is set to {name}_copy.

remove(mask)

Removes points wherever mask==True

sort(field, order='ascending')

Sorts the point list according to field, which must be a field in self.dtype. order should be ‘descending’ or ‘ascending’.
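The appending, masking, and sorting operations described for add, remove, and sort have direct analogues on a bare numpy structured array. The sketch below shows those numpy equivalents only; it assumes nothing about how emdfile implements the methods, and the field names x and score are illustrative.

```python
import numpy as np

dtype = np.dtype([("x", np.float64), ("score", np.int32)])
data = np.array([(0.5, 3), (1.5, 9), (2.5, 1)], dtype=dtype)

# Appending: the new records share the dtype of the existing data.
extra = np.array([(3.5, 5)], dtype=dtype)
data = np.concatenate([data, extra])

# Removing: drop records wherever the boolean mask is True.
mask = data["score"] < 3
data = data[~mask]

# Sorting by one field; reversing the index order gives descending order.
idx = np.argsort(data["score"])
ascending = data[idx]
descending = data[idx[::-1]]

print(ascending)
print(descending)
```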

to_h5(group)

Calls Node.to_h5 to create the group’s node and write its metadata. Then writes PointList data including the structured data array and field names and dtypes.

Parameters:

group (h5py Group)

Return type:

(h5py Group) the new pointlist’s group

PointListArray

class emdfile.PointListArray(dtype, shape, name: str | None = 'pointlistarray')

A PointListArray instance comprises a 2D grid of PointLists, each sharing a single dtype and set of fields, and each of variable length. It therefore represents a “ragged array” in 2+1 dimensions, i.e. with two dimensions of a fixed shape and one of variable length, embedded in an M dimensional space for PointLists with M fields.

__init__(dtype, shape, name: str | None = 'pointlistarray')

Creates an empty PointListArray.

Parameters:
  • dtype (dtype) – the dtype of the data comprising each PointList

  • shape (2-tuple of ints) – the shape of the array of PointLists

  • name (str)

Return type:

(PointListArray)
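The “2+1 dimensional ragged array” layout can be pictured with a plain-Python stand-in: a fixed 2D grid whose cells are lists of any length, all sharing the same fields. This is a conceptual sketch only and does not use or mirror emdfile's storage; the Point type and its field names are hypothetical.

```python
from collections import namedtuple

# Every cell shares the same set of fields (names here are illustrative).
Point = namedtuple("Point", ["qx", "qy", "intensity"])

shape = (2, 3)
# A fixed 2D grid whose cells are lists of any length, all initially empty.
grid = [[[] for _ in range(shape[1])] for _ in range(shape[0])]

# Cells grow independently of one another.
grid[0][0].append(Point(1.0, 2.0, 10))
grid[0][0].append(Point(1.5, 0.5, 7))
grid[1][2].append(Point(0.0, 0.0, 3))

# The grid shape is fixed; only the per-cell lengths vary.
lengths = [[len(cell) for cell in row] for row in grid]
print(lengths)
```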

add_fields(new_fields, name='')

Creates a copy of the PointListArray, but with additional fields given by new_fields.

Parameters:
  • new_fields (list of 2-tuples, ('name', dtype))

  • name (string)

copy(name='')

Returns a copy of itself.

get_pointlist(i, j, name=None)

Returns the pointlist at i,j

to_h5(group)

Calls Node.to_h5 to create the group’s node and write its metadata. Then writes PointListArray data including the data itself, array shape and the dtype.

Parameters:

group (h5py Group)

Return type:

(h5py Group) the new pointlistarray’s group