Examples and Usage

Below are introductory examples on

See also the basic usage notebook in the repo sample-code folder.

Basics

After

>>> import emdfile as emd, numpy as np

save and read an array with

>>> ar = np.random.random((5,5))
>>> emd.save(path, ar)
>>> _ar = emd.read(path)

or a Python dictionary with

>>> dic = {'a':1, 'b':2}
>>> emd.save(path, dic)
>>> _dic = emd.read(path)

or save a combination of arrays ar_* and dic_* with

>>> emd.save(path, [ar_A,ar_B,ar_C,dic_X,dic_Y,dic_Z])

and read and unpack them with

>>> data = emd.read(path)
>>> _ar_A = data.tree('array_0')
>>> _ar_B = data.tree('array_1')
>>> _ar_C = data.tree('array_2')
>>> _dic_A = data.metadata['dictionary_0']
>>> _dic_B = data.metadata['dictionary_1']
>>> _dic_C = data.metadata['dictionary_2']

Trees

emdfile classes can be composed into filetree-like heirarchies. For class instances A, B, and C with names 'A', 'B' and 'C' build a tree with

>>> R = emd.Root()
>>> R.tree(A)
>>> R.tree(B)
>>> B.tree(C)

and display it with

>>> R.tree()

which prints

/
|---A
|---B
    |---C

Save the whole tree with

>>> emd.save(path, R)

then print the file contents with

>>> emd.printtree(path)

which prints

/
|---root
    |---A
    |---B
        |---C

Read the whole tree again with

>>> data = emd.read(path)

or read some subset with

>>> data = emd.read(path, emdpath='root/A')             # reads A
>>> data = emd.read(path, emdpath='root/B')             # reads B---C
>>> data = emd.read(path, emdpath='root/B', tree=False) # reads B only

Metadata

When you save a Python dictionary and read it again, you get an emd.Metadata instance

>>> emd.save(path, {'a':1,'b':2})
>>> x = emd.read(path)
>>> print(x)
Metadata( A Metadata instance called 'dictionary', containing the following fields:

      a:   1
      b:   2
)

You can access values like a normal Python dictionary

>>> x['a']
1

as well as add data

>>> x['c'] = 3

Nested dictionarys of any depth are premitted, as are various Python and numpy values. Doing

>>> m = emd.Metadata( name='my_metadata' )
>>> m['x'] = True
>>> m['y'] = np.random.random((3,4,5))
>>> m['z'] = {
>>>     'alpha' : None,
>>>     'beta' : {
>>>         'gamma' : [10,11,12]
>>>     }
>>> }
>>> emd.save(path, m)

saves a dictionary and

>>> _m = emd.read(path)

reads it again. Print its contents with

>> print(_m)
Metadata( A Metadata instance called 'my_metadata', containing the following fields:

      x:   True
      y:   3D-array
      z:   {'alpha': None, 'beta': {'gamma': [10, 11, 12]}}
)

Any number of Metadata instances can be stored in each emdfile node - see the Metadata and Node docstrings for more information.

Nodes

The Node class is the base class that all emdfile classes inherit from, allowing them to build and modify trees and store arbitrary metadata. Each node has a .name and .metadata attribute and a .tree method.

A node’s name is used to find it in data trees and to save it to files, and can be assigned during instantiation

>>> node = emd.Node( name='my_node' )

The .metadata property has unique assignment behavior to allow storing many Metadata instances in a given node. Doing

>>> node.metadata = emd.Metadata('md1',{'x':1,'y':2})
>>> node.metadata = emd.Metadata('md2',{'a':1,'b':{'c':2,'d':3}})

will store both Metadata instances md1 and md2 in node (and not overwrite one of them, as you would expect in normal Python assignment). You can return all the Metadata instances in a node with

>>> node.metadata
{'md1': Metadata( A Metadata instance called 'md1', containing the following fields:

           x:   1
           y:   2
 ),
 'md2': Metadata( A Metadata instance called 'md2', containing the following fields:

           a:   1
           b:   {'c': 2, 'd': 3}
 )}

and one of the Metadata instances can be retrieved by

>>> node.metadata['md1']
Metadata( A Metadata instance called 'md1', containing the following fields:

          x:   1
          y:   2
)

Basic EMD .tree usage for building and printing tree structures is shown above. Using .tree you can also retrieve any tree node, split one tree into two with the cut operation, or merge two trees into one with the graft operation. EMD trees must begin with a Root instance, a special Node subtype intended for this purpose. See the Node documentation.

Arrays

The Array class enables storage of array-like data. The minimal required argument to make a new instance is a numpy array

>>> array = emd.Array(np.random.random((3,3)))

The Array class also natively stores some self-descriptive metadata specifying the data and its coordinate system. Instantiate an Array instance with this calibrating metadata included with e.g.

>>> ar = emd.Array(
>>>     np.ones((20,40,1000)),
>>>     name = '3ddatacube',
>>>     units = 'intensity',
>>>     dims = [
>>>         [0,5],
>>>         [0,5],
>>>         [0,0.02],
>>>     ],
>>>     dim_units = [
>>>         'nm',
>>>         'nm',
>>>         'eV'
>>>     ],
>>>     dim_names = [
>>>         'x',
>>>         'y',
>>>         'E',
>>>     ],
>>> )

where dims generates vectors which calibrate each of the array’s axes. In the case above, the two numbers given (e.g. [0,5] for each of the first two dimensions) are linearly extrapolated, so the first dimension’s first 5 pixels correspond to the locations [0,5,10,15,20...]. Printing the array to standard output displays the calibration info

>>> print(array)
Array( A 3-dimensional array of shape (20, 40, 1000) called '3ddatacube',
       with dimensions:

           x = [0,5,10,...] nm
           y = [0,5,10,...] nm
           E = [0.0,0.02,0.04,...] eV
)

The dimension vectors, units, and names can all be retrieved or set after instantiation with various Array methods like

>>> ar.dims
>>> ar.get_dim
>>> ar.set_dim
>>> ar.set_dim_units
>>> ar.set_dim_name

See the Array docs for further discussion. Array instances have all the normal Node functionality like .metadata and .tree.

More Data Classes

In addition to Array, the normal data-containing classes include PointList for a set of points in some M dimensional space, and PointListArray for “ragged array”-like data, with 2+1 dimensional data currently supported. For instantiation and usage, see the PointList and PointListArray docstrings.

emdfile also includes a Custom class, designed for composition of the other class types into a single Node container. See the defining classes section below.

Append Mode

In addition to writing new files, emdfile allows appending new data to existing files. If we first write some tree

>>> root1 = emd.Root('root1')
>>> root1.tree( <add some data> )
>>> emd.save(path, root1)

and then later make a second tree of data

>>> root2 = emd.Root('root2')
>>> root2.tree( <add some other data> )

the second tree can be added to the same file using “append” mode

>>> emd.save(path, root2, mode='a')

The two trees will both be saved to the same file, each starting at their own root group just under the HDF5 root, provided that the Root instances have different names.

If we append to an existing file using a root with a name already in the file, emdfile will perform a diffmerge-like operation, i.e. it will compare the two trees, determine which nodes in the incoming tree are new and which already exist, and write the new nodes to the file. Already existing nodes will be skipped if mode='a', and overwritten if mode='ao'. Note that comparison happens at the level of node names: the contents of the nodes are not evaluated the the save function.

For example, if we make a tree and save it

>>> root = emd.Root( 'my_root' )
>>> ar1 = emd.Array(np.ones((5,5)),'array1')
>>> root.tree(ar1)
>>> emd.save(path, root)

then add more data later

>>> ar2 = emd.Array(np.zeros((3,3,3)),'array2')
>>> ar1.tree(ar2)

then we can grow the tree saved to the filesystem at path with

>>> emd.save(path, root, mode='a')

After the first write operation, the file tree will look like

my_root
  |---ar1

and after the second operation it will be

my_root
  |---ar1
        |---ar2

What if the data in ar1 is changed some time after its been written to file? E.g.

>>> ar1.data += np.random.rand((5,5))

In this case, this change will not be reflected in the file if we perform a normal append operation like

>>> emd.save(path, root, mode='a')

but will be reflected in the file if we perform an “append-over” operation, e.g.

>>> emd.save(path, root, mode='ao')

Note, however, that this append-over will overwrite every node appearing in both the runtime and filesystem trees (in this case, just 'ar1' and 'ar2'). Moreover, the system storage that’s been overwritten is not freed by this operation, so overwriting large data blocks is not recommended, unless followed up by re-packing the files, e.g. by subsequently copying then deleting the original file.

More targetted save operations - e.g. adding or overwriting a single node, or appending a specific tree branch downstream of a selected node - are also possible. See the save docs for more info.

Defining Classes

emdfile is designed for downstream integration, that is, you can build your own Python scripts, modules, and packages which import emdfile and use it to handle reading and writing operations. For more info, see the subclassing guidelines.