1.2.3.2. Rules for Storing Data Items in NeXus Files¶
This section describes the rules which apply for storing single data items.
Naming Conventions¶
Group and field names used within NeXus follow a naming convention described by the following rules:
- The names of NeXus group and field items must only contain a restricted set of characters. This set may be described by a regular expression, as described below.
- For the class names [1] of NeXus group items, the prefix NX is reserved. Thus all NeXus class names start with NX. The chapter titled NeXus: Reference Documentation lists the available NeXus class names as either base classes, application definitions, or contributed definitions.
[1] | The class name is the value assigned to the NX_class attribute of an HDF5 group in the NeXus data file. This class name is different than the name of the HDF5 group. This is important when not using the NAPI to either read or write the HDF5 data file. |
Regular expression pattern for NXDL group and field names
It is recommended that all group and field names contain only these characters:
- lower case letters
- digits
- “_” (underscore character)
and that they begin with a lower case letter. This is the regular expression used to check this recommendation.
[a-z_][a-z\d_]*
The length should be limited to no more than 63 characters (imposed by the HDF5 rules for names).
It is recognized that some facilities will construct group and field names with upper case letters. NeXus data files with upper case characters in the group or field names might not be accepted by all software that reads NeXus data files. Hence, group and field names that do not pass the regular expression above but pass this expression (named validItemName in the XML Schema file: nxdl.xsd):
[A-Za-z_][\w_]*
will be flagged as a warning during data file validation.
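The two-stage check described above can be sketched in Python. This is only an illustration, not the validator NeXus itself uses: the function name classify_name and its return labels are hypothetical, re.ASCII is an assumption that keeps \d and \w from matching non-ASCII characters, and treating an over-long name as invalid is also an assumption (the text only says the length "should" be limited).

```python
import re

# Patterns copied from the text above; re.ASCII is an assumption.
RECOMMENDED = re.compile(r"[a-z_][a-z\d_]*", re.ASCII)
VALID_ITEM_NAME = re.compile(r"[A-Za-z_][\w_]*", re.ASCII)
MAX_NAME_LENGTH = 63  # limit quoted in the text


def classify_name(name):
    """Return 'ok', 'warning', or 'invalid' for a group or field name."""
    if len(name) > MAX_NAME_LENGTH:
        return "invalid"  # assumption: over-long names are rejected outright
    if RECOMMENDED.fullmatch(name):
        return "ok"
    if VALID_ITEM_NAME.fullmatch(name):
        return "warning"  # accepted, but flagged during validation
    return "invalid"


for candidate in ("number_of_lenses", "Temperature", "2theta"):
    print(candidate, classify_name(candidate))
```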
Use of underscore in descriptive names
Sometimes it is necessary to combine words in order to build a descriptive name for a field or a group. In such cases lowercase words are connected by underscores.
number_of_lenses
For all fields, only names from the NeXus base class dictionaries should be used. If a field name or even a complete component is missing, please suggest the addition to the NIAC: The NeXus International Advisory Committee. The addition will usually be accepted provided it is not a duplication of an existing field and is adequately documented.
Note
The NeXus base classes provide a comprehensive dictionary of terms that can be used for each class. The expected spelling and definition of each term is specified in the base classes. It is not required to provide all the terms specified in a base class. Terms with other names are permitted but might not be recognized by standard software. Rather than persist in using names not specified in the standard, please suggest additions to the NIAC: The NeXus International Advisory Committee.
The data stored in NeXus fields must be readback values. This means values as read from the detector, other hardware etc. There are occasions where it is sensible to store the target value the variable was supposed to have. In this case the target value is stored with a name built by appending _set to the normal NeXus field name. An example:
temperature
temperature_set
The temperature field will hold the readback from the cryostat/furnace/whatever. The field temperature_set will hold the target value for the temperature as set by the experiment control software.
NeXus Array Storage Order¶
NeXus stores multi-dimensional arrays of physical values in C language storage order, where the last dimension is the fastest varying. This is the rule. Good reasons are required to deviate from this rule.
It is possible to store data in storage orders other than C language order. It is also possible to specify that the data must be converted before it is useful. Consider a situation where data must be streamed to disk as fast as possible and conversion to C language storage order causes unnecessary latency. This case presents a good reason to make an exception to the standard rule.
Non C Storage Order¶
To indicate that the storage order differs from C storage order, two additional dataset attributes, offset and stride, must be stored; together they define the storage layout of the data. Each holds one number per dimension of the multidimensional dataset. Offset gives the starting point for each dimension. Stride gives the step to make in the raw data when the index of that dimension is incremented by 1. This is best explained by some examples.
Offset and Stride for 1 D data:
* raw data = 0 1 2 3 4 5 6 7 8 9
size[1] = { 10 } // assume uniform overall array dimensions
* default stride:
stride[1] = { 1 }
offset[1] = { 0 }
for i:
result[i]:
0 1 2 3 4 5 6 7 8 9
* reverse stride:
stride[1] = { -1 }
offset[1] = { 9 }
for i:
result[i]:
9 8 7 6 5 4 3 2 1 0
Offset and Stride for 2D Data
* raw data = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
size[2] = { 4, 5 } // assume uniform overall array dimensions
* row major (C) stride:
stride[2] = { 5, 1 }
offset[2] = { 0, 0 }
for i:
for j:
result[i][j]:
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
* column major (Fortran) stride:
stride[2] = { 1, 4 }
offset[2] = { 0, 0 }
for i:
for j:
result[i][j]:
0 4 8 12 16
1 5 9 13 17
2 6 10 14 18
3 7 11 15 19
* "crazy reverse" row major (C) stride:
stride[2] = { -5, -1 }
offset[2] = { 4, 5 }
for i:
for j:
result[i][j]:
19 18 17 16 15
14 13 12 11 10
9 8 7 6 5
4 3 2 1 0
Offset and Stride for 3D Data
* raw data = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
size[3] = { 3, 4, 5 } // assume uniform overall array dimensions
* row major (C) stride:
stride[3] = { 20, 5, 1 }
offset[3] = { 0, 0, 0 }
for i:
for j:
for k:
result[i][j][k]:
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
25 26 27 28 29
30 31 32 33 34
35 36 37 38 39
40 41 42 43 44
45 46 47 48 49
50 51 52 53 54
55 56 57 58 59
* column major (Fortran) stride:
stride[3] = { 1, 3, 12 }
offset[3] = { 0, 0, 0 }
for i:
for j:
for k:
result[i][j][k]:
0 12 24 36 48
3 15 27 39 51
6 18 30 42 54
9 21 33 45 57
1 13 25 37 49
4 16 28 40 52
7 19 31 43 55
10 22 34 46 58
2 14 26 38 50
5 17 29 41 53
8 20 32 44 56
11 23 35 47 59
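The stride examples above can be reproduced with a few lines of Python. This is a minimal sketch rather than NeXus API code: the function read_strided is hypothetical, and it assumes the per-dimension offset has already been collapsed into a single flat starting position, start, in the raw data.

```python
from itertools import product


def read_strided(raw, size, stride, start=0):
    """Read flat `raw` through a stride layout.

    Values are returned in C order of the logical indices (last index
    fastest). `start` is an assumed flat position of element [0][0]...
    """
    values = []
    for index in product(*(range(n) for n in size)):
        position = start + sum(i * s for i, s in zip(index, stride))
        values.append(raw[position])
    return values


raw = list(range(20))
c_order = read_strided(raw, size=(4, 5), stride=(5, 1))   # row major
fortran = read_strided(raw, size=(4, 5), stride=(1, 4))   # column major
reverse = read_strided(list(range(10)), size=(10,), stride=(-1,), start=9)

print(fortran[:5])  # first row of the column-major result
print(reverse)
```

The first row of the column-major result matches the 2D table above, and the same function covers the 3D case with size=(3, 4, 5) and stride=(1, 3, 12).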
NeXus Data Types¶
description | matching regular expression |
---|---|
integer | NX_INT(8|16|32|64) |
floating-point | NX_FLOAT(32|64) |
array | (\\[0-9\\])? |
valid item name | ^[A-Za-z_][A-Za-z0-9_]*$ |
valid class name | ^NX[A-Za-z0-9_]*$ |
NeXus supports numeric data as either integer or floating-point numbers. The type name is followed by a number that indicates the number of bits in the word. The table above shows the regular expressions that match the data type specifiers.
- integers: NX_INT8, NX_INT16, NX_INT32, or NX_INT64
- floating-point numbers: NX_FLOAT32 or NX_FLOAT64
- date / time stamps: NX_DATE_TIME or ISO8601. Dates and times are specified using ISO-8601 standard definitions. Refer to NeXus dates and times.
- strings: NX_CHAR. The preferred string representation is UTF-8. Both fixed-length strings and variable-length strings are valid. String arrays cannot be used where only a string is expected (title, start_time, end_time, NX_class attribute,...). Fields or attributes requiring the use of string arrays will be clearly marked as such (like the NXdata attribute auxiliary_signals).
- binary data: Binary data is to be written as UINT8.
- images: Binary image data is to be written using UINT8, the same as binary data, but with an accompanying image mime-type. If the data is text, the line terminator is [CR][LF].
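As an illustration, the integer and floating-point patterns from the table can be applied directly to a type specifier. The helper parse_numeric_type below is hypothetical; it returns the description from the table together with the bit width, or None when the specifier is not a numeric type.

```python
import re

# Patterns taken from the "matching regular expression" column of the table.
NUMERIC_PATTERNS = {
    "integer": re.compile(r"NX_INT(8|16|32|64)"),
    "floating-point": re.compile(r"NX_FLOAT(32|64)"),
}


def parse_numeric_type(specifier):
    """Return (description, bits) for a numeric type specifier, else None."""
    for description, pattern in NUMERIC_PATTERNS.items():
        match = pattern.fullmatch(specifier)
        if match:
            return description, int(match.group(1))
    return None


print(parse_numeric_type("NX_INT32"))
print(parse_numeric_type("NX_CHAR"))
```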
NeXus dates and times¶
NeXus dates and times should be stored using the ISO 8601 [2] format, e.g. 1996-07-31T21:15:22+0600. The standard also allows for time intervals in fractional seconds with 1 or more digits of precision. This avoids confusion, e.g. between U.S. and European conventions, and is appropriate for machine sorting.
[2] | ISO 8601: http://www.w3.org/TR/NOTE-datetime |
strftime() format specifiers for ISO-8601 time
%Y-%m-%dT%H:%M:%S%z
Note
Note that the T appears literally in the string, to indicate the beginning of the time element, as specified in ISO 8601. It is common to use a space in place of the T, such as 1996-07-31 21:15:22+0600. While human-readable (and later allowed in a relaxed revision of the standard), compatibility with libraries supporting the ISO 8601 standard is not assured with this substitution. The strftime() format specifier for this is “%Y-%m-%d %H:%M:%S%z”.
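A short Python sketch shows the format specifier in use with the standard datetime module. The timestamp is the one quoted above; the variable names are illustrative only. Note that %z produces an offset only for timezone-aware datetime objects, so the example attaches an explicit UTC offset.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

# A timezone-aware timestamp; a naive datetime would give an empty %z.
stamp = datetime(1996, 7, 31, 21, 15, 22, tzinfo=timezone(timedelta(hours=6)))

text = stamp.strftime(ISO_FORMAT)
print(text)  # 1996-07-31T21:15:22+0600

# The same format string parses the text back into an aware datetime.
parsed = datetime.strptime(text, ISO_FORMAT)
```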
NeXus Data Units¶
Given the plethora of possible applications of NeXus, it is difficult to define units to use. Therefore, the general rule is that you are free to store data in any unit you find fit. However, any field must have a units attribute which describes the units. Wherever possible, SI units are preferred. NeXus units are written as a string attribute (NX_CHAR) and describe the engineering units. The string should be appropriate for the value. Values for the NeXus units must be specified in a format compatible with Unidata UDunits [3]. Application definitions may specify units to be used for fields using an enumeration.
[3] | The UDunits specification also includes instructions for derived units. At present, the contents of NeXus units attributes are not validated in data files. |
Storing Detectors¶
There are very different types of detectors out there. Storing their data can be a challenge. As a general guideline: if the detector has some well defined form, this should be reflected in the data file. A linear detector becomes a linear array, a rectangular detector becomes an array of size xsize times ysize. Some detectors are so irregular that this does not work. Then the detector data is stored as a linear array, with the index being the detector number up to ndet. Such detectors must be accompanied by further arrays of length ndet which give azimuthal_angle, polar_angle and distance for each detector.
If data from a time of flight (TOF) instrument must be described, then the TOF dimension becomes the last dimension, for example an area detector of xsize vs. ysize is stored with TOF as an array with dimensions xsize, ysize, ntof.
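Because NeXus uses C storage order, placing TOF last makes it the fastest varying dimension, so all TOF channels of one pixel sit next to each other in the file. The arithmetic can be sketched as follows; the function flat_index and the example sizes are hypothetical.

```python
def flat_index(x, y, t, xsize, ysize, ntof):
    """Flat position of element [x][y][t] in a C-ordered (xsize, ysize, ntof) array."""
    if not (0 <= x < xsize and 0 <= y < ysize and 0 <= t < ntof):
        raise IndexError("index outside detector array")
    return (x * ysize + y) * ntof + t


xsize, ysize, ntof = 4, 5, 10  # illustrative sizes only

# The TOF channels of pixel (2, 3) occupy one contiguous run.
first = flat_index(2, 3, 0, xsize, ysize, ntof)
last = flat_index(2, 3, ntof - 1, xsize, ysize, ntof)
print(first, last)
```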
Monitors are Special¶
Monitors, detectors that measure the properties of the experimental probe rather than the probe’s interaction with the sample, have a special place in NeXus files. Monitors are crucial to normalize data. To emphasize their role, monitors are not stored in the NXinstrument hierarchy but on NXentry level in their own groups as there might be multiple monitors. Of special importance is the monitor in a group called control. This is the main monitor against which the data has to be normalized. This group also contains the counting control information, i.e. counting mode, times, etc.
Monitor data may be multidimensional. Good examples are scan monitors where a monitor value per scan point is expected or time-of-flight monitors.
Find the plottable data¶
Simple plotting is one of the motivations for the NeXus standard. To implement simple plotting, a mechanism must exist to identify the default data for visualization (plotting) in any NeXus data file. Over its history the NIAC has agreed upon a method of applying metadata to identify the default plottable data. This metadata has always been specified as HDF attributes. With the evolution of the underlying file formats and the NeXus data standard, the method to identify the default plottable data has evolved, undergoing three distinct versions.
Consult the NeXus API section, which describes the routines available to program these operations. In the course of time, generic NeXus browsers will provide this functionality automatically.
For programmers who may encounter NeXus data files written using any of these methods, we present the algorithm for each method to find the default plottable data. It is recommended to start with the most recent method, Version 3, first.
Version 3¶
The third (current) method to identify the default plottable data is as follows:
1. Start at the top level of the NeXus data file (the root of the HDF5 hierarchy).
2. Pick the default NXentry group.
   If the root has an attribute default, then its value is the name of the NXentry group to be used. Otherwise, pick any NXentry group. This is trivial if there is only one NXentry group.
3. Pick the default NXdata group.
   Open the NXentry group selected above. If it has an attribute default, then its value is the name of the NXdata group to be used. Otherwise, pick any NXdata group. This is trivial if there is only one NXdata group.
4. Pick the default plottable field (the signal data).
   Open the NXdata group selected above. If it has an attribute signal, then its value is the name of the field (dataset) to be plotted. If no signal attribute is present on the NXdata group, then proceed to try an older NeXus method to find the default plottable data.
   - Pick the fields with the dimension scales (the axes).
     If the same NXdata group has an attribute axes, then its value is a string (signal data is 1-D) or string array (signal data is 2-D or higher rank) naming the fields in this group to be used as dimension scales of the default plottable data. The number of values given must be equal to the rank of the signal data. These are the abscissae of the plottable signal data.
     If no field is available to provide a dimension scale for a given dimension, then a “.” will be used in that position. In such cases, programmers are expected to use an integer sequence starting from 0 for each position along that dimension.
   - Associate the dimension scales with each dimension of the plottable data.
     For each field (its name is AXISNAME) in axes that provides a dimension scale, there will be an NXdata group attribute AXISNAME_indices whose value is an integer or integer array giving the dimensions of the signal data to which this dimension scale applies.
     If no AXISNAME_indices attribute is provided, a programmer is encouraged to make best efforts assuming the intent of this NXdata group to provide a default plot.
     It is possible there may be more than one AXISNAME_indices attribute with the same value or values. This indicates the possibility of using alternate abscissae along this (these) dimension(s). The field named in the axes attribute indicates the intention of the data file writer as to which field should be used by default.
5. Plot the signal data, given axes and AXISNAME_indices.
When all the default and signal attributes are present, this Python code will directly identify the default plottable data (assuming a plot() function has been defined elsewhere):
import h5py

# Follow the chain of attributes: root -> NXentry -> NXdata -> signal field.
root = h5py.File(hdf5_file_name, "r")
default_nxentry_group_name = root.attrs["default"]
nxentry = root[default_nxentry_group_name]
default_nxdata_group_name = nxentry.attrs["default"]
nxdata = nxentry[default_nxdata_group_name]
signal_dataset_name = nxdata.attrs["signal"]
data = nxdata[signal_dataset_name]
plot(data)
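When a default attribute is missing, the Version 3 steps fall back to picking any group of the right class. The sketch below covers that fallback as well as the axes attribute, under several assumptions. Node is a hypothetical stand-in with the same dictionary-plus-attrs shape as an h5py group, so the example runs without a file. The helper names are illustrative. Attribute strings are assumed to arrive as either str or bytes.

```python
class Node(dict):
    """Hypothetical stand-in for an h5py group or dataset: children plus attrs."""

    def __init__(self, attrs=None, **children):
        super().__init__(children)
        self.attrs = dict(attrs or {})


def text(value):
    """Attribute strings may arrive as bytes; normalise them to str."""
    return value.decode() if isinstance(value, bytes) else value


def pick_default(group, nx_class):
    """Follow the 'default' attribute, else take any child of the given class."""
    name = group.attrs.get("default")
    if name is not None:
        return group[text(name)]
    for child in group.values():
        if text(child.attrs.get("NX_class", "")) == nx_class:
            return child
    raise LookupError(f"no {nx_class} group found")


def find_plottable_v3(root):
    """Return (signal, scales), or None if the signal attribute is missing.

    scales holds None wherever the file gives "." for a dimension.
    """
    nxentry = pick_default(root, "NXentry")
    nxdata = pick_default(nxentry, "NXdata")
    signal_name = nxdata.attrs.get("signal")
    if signal_name is None:
        return None  # caller should try an older method
    axes = nxdata.attrs.get("axes", [])
    if isinstance(axes, (str, bytes)):
        axes = [axes]  # 1-D signal: a single string
    scales = [None if text(a) == "." else nxdata[text(a)] for a in axes]
    return nxdata[text(signal_name)], scales


# Layout of the 1-D example (counts vs. mr), without a default on the entry.
counts, mr = Node(), Node()
nxdata = Node({"NX_class": "NXdata", "signal": "counts", "axes": "mr"},
              counts=counts, mr=mr)
entry = Node({"NX_class": "NXentry"}, data=nxdata)
root = Node({"default": "entry"}, entry=entry)

signal, scales = find_plottable_v3(root)
```

A reader receiving None in scales would substitute an integer sequence starting from 0 for that dimension, as the steps above describe.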
Version 2¶
Tip
Try this method for older NeXus data files when Version 3 fails.
The second method to identify the default plottable data is as follows:
1. Start at the top level of the NeXus data file.
2. Loop through the groups with class NXentry until the next step succeeds.
3. Open the NXentry group and loop through the subgroups with class NXdata until the next step succeeds.
4. Open the NXdata group and loop through the fields for the one field with attribute signal="1". Note: There should be only one field that matches.
   This is the default plottable data.
   If there is no such signal="1" field, proceed to try an older NeXus method to find the default plottable data.
   - If this field has an attribute axes:
     - The axes attribute value contains a colon (or comma) delimited list (in the C-order of the data array) with the names of the dimension scales associated with the plottable data. Such as: axes="polar_angle:time_of_flight"
     - Parse axes and open the datasets to describe your dimension scales
   - If this field has no attribute axes:
     - Search for datasets with attributes axis=1, axis=2, etc.
     - These are the fields describing your axis. There may be several fields for any axis, i.e. there may be multiple fields with the attribute axis=1. Among them the field with the attribute primary=1 is the preferred one. All others are alternative dimension scales.
5. Having found the default plottable data and its dimension scales: make the plot.
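The search loop and the parsing of the axes attribute can be sketched as below. As before, Node is a hypothetical stand-in shaped like an h5py group so the example runs without a file, and the function names are illustrative. The sketch assumes attribute values are plain str or int (real files may need bytes decoded first). It handles only the branch where the axes attribute exists and returns None for the scales otherwise, leaving the axis=1, axis=2 search to the caller.

```python
import re


class Node(dict):
    """Hypothetical stand-in for an h5py group or dataset: children plus attrs."""

    def __init__(self, attrs=None, **children):
        super().__init__(children)
        self.attrs = dict(attrs or {})


def children_of_class(group, nx_class):
    """Yield the children whose NX_class attribute matches."""
    for child in group.values():
        if child.attrs.get("NX_class") == nx_class:
            yield child


def find_plottable_v2(root):
    """Return (signal_field, scales) or None if no signal=1 field exists.

    scales is None when the signal field carries no axes attribute.
    """
    for nxentry in children_of_class(root, "NXentry"):
        for nxdata in children_of_class(nxentry, "NXdata"):
            for field in nxdata.values():
                if str(field.attrs.get("signal")) != "1":
                    continue
                axes = field.attrs.get("axes")
                if axes is None:
                    return field, None  # fall back to the axis attribute search
                # Colon or comma delimited, in C order of the data array.
                names = [name.strip() for name in re.split(r"[:,]", axes)]
                return field, [nxdata[name] for name in names]
    return None


polar, tof = Node(), Node()
data = Node({"signal": 1, "axes": "polar_angle:time_of_flight"})
nxdata = Node({"NX_class": "NXdata"},
              polar_angle=polar, time_of_flight=tof, data=data)
root = Node(entry=Node({"NX_class": "NXentry"}, data=nxdata))

field, scales = find_plottable_v2(root)
```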
Version 1¶
Tip
Try this method for older NeXus data files.
The first method to identify the default plottable data is as follows:
1. Open the first top level NeXus group with class NXentry.
2. Open the first NeXus group with class NXdata.
3. Loop through NeXus fields in this group searching for the item with attribute signal="1" indicating this field has the plottable data.
4. Search for the one-dimensional NeXus fields with attribute primary=1. These are the dimension scales to label the axes of each dimension of the data.
5. Link each dimension scale to the respective data dimension by the axis attribute (axis=1, axis=2, ... up to the rank of the data).
6. If necessary, close this NXdata group, search the next NXdata group, repeating steps 3 to 5.
7. If necessary, close the NXentry group, search the next NXentry group, repeating steps 2 to 6.
Associating Multi Dimensional Data with Axis Data¶
NeXus allows for storage of multi dimensional arrays of data. It is this data that presents the most challenge for description. In most cases it is not sufficient to just have the indices into the array as a label for the dimensions of the data. Usually the information is needed as to which physical value corresponds to an index into a dimension of the multi dimensional data set. To this purpose a means is needed to locate appropriate data arrays which describe what each dimension of a multi dimensional data set actually corresponds to. There is a standard HDF facility to do this: it is called dimension scales. Unfortunately, when NeXus was first designed, there was only one global namespace for dimension scales. Thus NeXus had to devise its own scheme for locating axis data, which is described here. A side effect of the NeXus scheme is that it is possible to have multiple mappings of a given dimension to physical data. For example, a TOF data set can have the TOF dimension as raw TOF or as energy.
There are now three methods of associating each data dimension to its respective dimension scale. Only the first method is recommended now; the other two (older methods) are discouraged.
- Associating plottable data using attributes applied to the NXdata group
- Associating plottable data by name using the axes attribute
- Associating plottable data by dimension number using the axis attribute
The recommended method uses the axes attribute applied to the NXdata group to specify the names of each dimension scale. A prerequisite is that the fields describing the axes of the plottable data are stored together with the plottable data in the same NeXus group. If this leads to data duplication, use links.
Associating plottable data using attributes applied to the NXdata group¶
Tip
Recommended: This is the “NIAC2014” method recommended for all new NeXus data files.
The default data to be plotted (and any associated axes) is specified using attributes attached to the NXdata group.

signal :
  Defines the name of the default dataset in the NXdata group. A field of this name must exist (either as dataset or link to dataset).
  It is recommended to use this attribute rather than adding a signal attribute to the dataset. [4] The procedure to identify the default data to be plotted is quite simple. Given any NeXus data file, follow the chain of default and signal attributes described under Version 3 above.

axes :
  String array [5] that defines the independent data fields used in the default plot for all of the dimensions of the signal field. One entry is provided for every dimension in the signal field.
  The field(s) named as values (known as “axes”) of this attribute must exist. An axis slice is specified using a field named AXISNAME_indices, as described below.
  When no default axis is available for a particular dimension of the plottable data, use a “.” in that position. See examples provided on the NeXus webpage ([6]).
  If there are no axes at all (such as with a stack of images), the axes attribute can be omitted.

AXISNAME_indices :
  Each AXISNAME_indices attribute is an integer array [5] that defines the indices of the signal field (that field will be a multidimensional array) which need to be used in the AXISNAME field.
  The first index of an array is 0 (zero).
  Here, AXISNAME is to be replaced by the name of each field described in the axes attribute. An example with 2-D data:

  data_2d:NXdata
    @signal="data"
    @axes="time","pressure"
    @time_indices=0
    @pressure_indices=1
    data: float[1000,20]
    time: float[1000]
    pressure: float[20]

  This attribute is to be provided in all situations. However, if the indices attributes are missing (such as for data files written before this specification), file readers are encouraged to make their best efforts to plot the data. Thus the implementation of the AXISNAME_indices attribute is strict for writers and lenient for readers.

[4] | Summary of the discussion at NIAC2014 to revise how to find default data: http://www.nexusformat.org/2014_How_to_find_default_data.html |
[5] | Note on array attributes: Attributes potentially containing multiple values (axes and _indices) are to be written as string or integer arrays, to avoid string parsing in reading applications. |
[6] | NIAC2014 proposition: http://www.nexusformat.org/2014_axes_and_uncertainties.html |
Examples¶
Several examples are provided to illustrate this method. More examples are available on the NeXus webpage ([6]).
simple 1-D data example showing how to identify the default data (counts vs. mr)
In the first example, storage of a 1-D data set (counts vs. mr) is described.
datafile.hdf5:NeXus data file
  @default="entry"
  entry:NXentry
    @default="data"
    data:NXdata
      @signal="counts"
      @axes="mr"
      @mr_indices=0
      counts: float[100] --> the default dependent data
      mr: float[100] --> the default independent data
2-D data example showing how to identify the default data and associated dimension scales
A 2-D data set, data as a function of time and pressure, is described. By default, as indicated by the axes attribute, pressure is to be used. The temperature array is described as a substitute for pressure (so it replaces dimension 1 of data as indicated by the temperature_indices attribute).
datafile.hdf5:NeXus data file
  @default="entry"
  entry:NXentry
    @default="data_2d"
    data_2d:NXdata
      @signal="data"
      @axes="time","pressure"
      @pressure_indices=1
      @temperature_indices=1
      @time_indices=0
      data: float[1000,20]
      pressure: float[20]
      temperature: float[20]
      time: float[1000]
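A reader can turn the axes and AXISNAME_indices attributes of the 2-D example into a table of candidate dimension scales per dimension. The sketch below is one possible approach, with assumptions: the attributes are given as a plain Python dict with str, int, list or tuple values (values read through a library may need converting first), the function name scales_by_dimension is hypothetical, and a default axis that lacks its own AXISNAME_indices attribute is placed at its position in axes as a best effort.

```python
def scales_by_dimension(attrs):
    """Map each signal dimension to candidate field names, default first."""
    axes = attrs.get("axes", [])
    if isinstance(axes, str):
        axes = [axes]  # 1-D signal: a single string
    candidates = {dim: [] for dim in range(len(axes))}

    suffix = "_indices"
    for key, value in attrs.items():
        if not key.endswith(suffix):
            continue
        name = key[: -len(suffix)]
        dims = value if isinstance(value, (list, tuple)) else [value]
        for dim in dims:
            candidates.setdefault(dim, []).append(name)

    # The field named in axes is the writer's default: move it to the front.
    for dim, default in enumerate(axes):
        if default == ".":
            continue  # no default scale for this dimension
        names = candidates[dim]
        if default in names:
            names.remove(default)
        names.insert(0, default)
    return candidates


attrs = {
    "signal": "data",
    "axes": ["time", "pressure"],
    "temperature_indices": 1,
    "pressure_indices": 1,
    "time_indices": 0,
}
print(scales_by_dimension(attrs))
```

For these attributes, dimension 0 maps to time alone, and dimension 1 lists pressure first with temperature as the alternative.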
Associating plottable data by name using the axes attribute¶
Warning
Discouraged: See this method: Associating plottable data using attributes applied to the NXdata group.
This method defines an attribute of the data field called axes. The axes attribute contains the names of each dimension scale as a colon (or comma) separated list in the order they appear in C. For example:
denoting axes by name
data:NXdata
  time_of_flight = 1500.0 1502.0 1504.0 ...
  polar_angle = 15.0 15.6 16.2 ...
  some_other_angle = 0.0 0.0 2.0 ...
  data = 5 7 14 ...
    @axes = polar_angle:time_of_flight
    @signal = 1
Associating plottable data by dimension number using the axis attribute¶
Warning
Discouraged: See this method: Associating plottable data by name using the axes attribute
The original method defines an attribute of each dimension scale field called axis. It is an integer whose value is the number of the dimension, in order of fastest varying dimension. That is, if the array being stored is data with elements data[j][i] in C and data(i,j) in Fortran, where i is the time-of-flight index and j is the polar angle index, the NXdata group would contain:
denoting axes by integer number
data:NXdata
  time_of_flight = 1500.0 1502.0 1504.0 ...
    @axis = 1
    @primary = 1
  polar_angle = 15.0 15.6 16.2 ...
    @axis = 2
    @primary = 1
  some_other_angle = 0.0 0.0 2.0 ...
    @axis = 1
  data = 5 7 14 ...
    @signal = 1
The axis attribute must be defined for each dimension scale. The primary attribute is unique to this method.
There are limited circumstances in which more than one dimension scale for the same data dimension can be included in the same NXdata group. The most common is when the dimension scales are the three components of an (hkl) scan. In order to handle this case, we have defined another attribute of type integer called primary whose value determines the order in which the scale is expected to be chosen for plotting, i.e.
- 1st choice: primary=1
- 2nd choice: primary=2
- etc.
If there is more than one scale with the same value of the axis attribute, one of them must have set primary=1. Defining the primary attribute for the other scales is optional.
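The selection rule can be sketched in a few lines of Python. The function choose_scales is hypothetical, and it assumes the attributes of each field have already been read into plain dicts. Scales without a primary attribute are placed after those that have one, which is an assumption consistent with that attribute being optional for the alternatives.

```python
NO_PRIMARY = float("inf")  # sort key for scales lacking a primary attribute


def choose_scales(fields):
    """Group dimension scales by axis number, ordered by their primary value.

    `fields` maps each field name to a dict of its attributes.
    """
    by_axis = {}
    for name, attrs in fields.items():
        if "axis" in attrs:
            by_axis.setdefault(int(attrs["axis"]), []).append(name)
    for names in by_axis.values():
        # Stable sort: scales with equal keys keep their original order.
        names.sort(key=lambda n: fields[n].get("primary", NO_PRIMARY))
    return by_axis


# Attributes from the example above, with the alternative scale listed first.
fields = {
    "some_other_angle": {"axis": 1},
    "time_of_flight": {"axis": 1, "primary": 1},
    "polar_angle": {"axis": 2, "primary": 1},
    "data": {"signal": 1},
}
print(choose_scales(fields))
```

The first name in each list is the scale to plot by default; the remaining names are the alternative dimension scales.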
Note
The primary attribute can only be used with the first method of defining dimension scales discussed above. In addition to the signal data, this group could contain a data set of the same rank and dimensions called errors containing the standard deviations of the data.