### 11 The Extensible n-Dimensional-Data Format

The most common structure for data that are not instrument specific is what has become known as the bulk-data frame. To avoid confusion with the Interim Environment’s BDF, the new Starlink standard for storing bulk-data frames is called the Extensible $n$-Dimensional-Data Format (NDF for short). It has no specific HDS NAME, because a container file may have several $<$NDF$>$ structures at a given level. It has an optional TYPE of $<$NDF$>$, which will not be tested by general-purpose applications but is recommended to help human readers recognise it in structure listings. NDFs may be structured recursively; see the polarimetry example below.

It was not, in fact, possible to keep strictly to the rules in Section 13 when designing the NDF structure; compromises were necessary in order to allow old Asterix and Wright-Giddings-formatted data, of which there is a great deal, to be processed by the new general-purpose applications. The NDF structure comprises a title, a data array and its associated objects (a [DATA_ARRAY] structure in the Wright-Giddings terminology), axis information, history and one or more registered named objects containing application-specific components. Note that everything at the top level is intended to be under Starlink control, and although general-purpose applications will (for an initial period) tolerate non-standard components at this level, such rogue objects will not be processed beyond being copied to the same place within an output structure. This Starlink-components-only restriction, which does not preclude extensibility (done through the MORE objects), simplifies the job of applications, relieving them of the responsibility of keeping track of arbitrary numbers of extra objects. It is recommended that if an application detects the presence of a rogue object it should display a warning message, to alert the user to take some action (for example to run the appropriate format conversion utility).

Table 24: Components of the Extensible $n$-Dimensional-Data structure

| Component Name | TYPE | Brief Description |
|---|---|---|
| [VARIANT] | $<$_CHAR$>$ | variant of the $<$NDF$>$ type |
| [TITLE] | $<$_CHAR$>$ | title of $<$NDF$>$ |
| [DATA_ARRAY] | $<$various$>$ | NAXIS-dimensional data array |
| [LABEL] | $<$_CHAR$>$ | label describing the data array |
| [UNITS] | $<$_CHAR$>$ | units of the data array |
| [VARIANCE] | $<$s_array$>$ | variance of the data array |
| [BAD_PIXEL] | $<$_LOGICAL$>$ | bad pixel flag |
| [QUALITY] | $<$various$>$ | quality of the data array |
| [AXIS(NAXIS)] | $<$AXIS$>$ | axis values, labels, units and errors |
| [HISTORY] | $<$HISTORY$>$ | history structure |
| [MORE] | $<$EXT$>$ | extension structure |

[VARIANT]
Specifies which sort of $<$NDF$>$ structure. The variant must be one of the registered strings, of which only ‘SIMPLE’ is currently available.
[TITLE]
A title for the data which may be used to annotate plots and listings, and which will help identify the NDF. (A single line of text will obviously be too brief to describe the contents of a dataset in detail, but will be useful for display purposes.)
[DATA_ARRAY]
This is the primary $n$-dimensional array of data values. It is the ONLY obligatory structure element. The [DATA_ARRAY] can be present in one of these forms:
(1)
A $<$narray$>$. This primitive form, i.e. just a $<$numeric$>$ array of numbers, is available to give compatibility with the old Wright-Giddings proposals. Although its use is discouraged for new applications, it is recommended that general-purpose applications propagate a $<$narray$>$ input in that same form, rather than converting it to an $<$ARRAY$>$ structure.
(2)
A $<$c_array$>$.
[LABEL]
This is a textual description of the kind of quantity stored in the [DATA_ARRAY] array.
[UNITS]
This is a textual description of the units in which the data values are given. If more than one NDF is being processed, the [UNITS] strings may be tested for equality. If they differ, the application must inform the user, who may then be given the opportunity to allow processing to continue. In that case, however, [UNITS] will not be propagated to the output NDF, if there is one.
[VARIANCE]
This is used to store the variance of the errors associated with [DATA_ARRAY]. It is used for computing symmetric error bars. The array dimensions must correspond to those of the [DATA_ARRAY] component. If all values in the data array have the same error, this can be represented by the scalar option. Other, more complex, forms of error representation (e.g. asymmetric errors) can be stored in specialised extensions, yet to be defined.
[BAD_PIXEL]
If this is false, applications may assume that [DATA_ARRAY] and [VARIANCE] contain no magic-value pixels. If it is either true or absent, applications must either test for magic-value pixels or, if they are incapable of bad-pixel processing, give up.
[QUALITY]
The data-quality values for the corresponding elements of [DATA_ARRAY]. Its TYPE is either $<$narray$>$ or $<$QUALITY$>$. Note that the array can be stored in a sparse variant $<$ARRAY$>$ structure. However, the dimensions of the sparse array must correspond to those of the [DATA_ARRAY] component, and the actual or equivalent primitive type for the data-quality values must be $<$_UBYTE$>$. The $<$narray$>$ option allows compatibility with existing data in Wright-Giddings format. There was no [BADBITS] flag in the Wright-Giddings format, so when such existing data are processed by general-purpose applications, non-zero quality values will not be interpreted as bad pixels, as they would have been formerly.
NAXIS
The dimensionality of the [DATA_ARRAY], and therefore the number of elements (structures) in the AXIS array (of structures).
[AXIS]
This is an array of $<$AXIS$>$ structures, where [AXIS($n$)] corresponds to the $n$th dimension of the [DATA_ARRAY]. If [AXIS] is not present, the pixel index is used, starting from the associated value(s) of [DATA_ARRAY.ORIGIN], or from 1 if the origin data object does not exist. If a simple pixel index is required, then [AXIS] should be omitted from the $<$NDF$>$.
[MORE]
This is a wrapper containing extensions; it is not itself an extension. The extensions and their components are outside the scope of the NDF definition. They will be defined separately, and in many cases they will belong to specific applications packages. Each extension must have a unique NAME by which it is recognised. Its TYPE may be any one of the Starlink-defined standard TYPEs, or it may be a new one defined according to the rules in Section 13. Each extension (with its NAME, TYPE and variants) must be registered with the Starlink Head of Applications. Further NDFs may be located within these structures, and these may in turn contain extensions. To reduce the task of registration and to minimise the risk of clashes, it is strongly recommended that each application package use one structure, rather than multiple minor items. It is also recommended that extensions be structured hierarchically, rather than as ‘flat’ lists of components, so that related data objects are grouped together, e.g. by processing or instrument.
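The following minimal Python sketch models an $<$NDF$>$ in memory to make the component rules above concrete. The class `NDF`, its field names, the flattened-array representation and the `MAGIC` bad value are all hypothetical illustrations. None of them belongs to any Starlink library. The sketch checks only a few of the rules: the size of [DATA_ARRAY], the registered 'SIMPLE' variant, and [VARIANCE] matching [DATA_ARRAY]. It also encodes the [BAD_PIXEL] convention that only an explicit false value excuses an application from testing for magic values.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Optional, Union

MAGIC = -1.0e38  # hypothetical magic ("bad") value; real values depend on the data type


@dataclass
class NDF:
    """Illustrative in-memory stand-in for an <NDF>; not a Starlink API."""
    data_array: list                           # [DATA_ARRAY], flattened (obligatory)
    dims: tuple                                # its dimensions; len(dims) == NAXIS
    variant: str = "SIMPLE"                    # [VARIANT]: only 'SIMPLE' is registered
    title: Optional[str] = None                # [TITLE]
    label: Optional[str] = None                # [LABEL]
    units: Optional[str] = None                # [UNITS]
    variance: Union[list, float, None] = None  # [VARIANCE]: array or scalar
    bad_pixel: Optional[bool] = None           # [BAD_PIXEL]; None means absent
    more: dict = field(default_factory=dict)   # [MORE]: extensions keyed by NAME

    def problems(self):
        """Return a list of rule violations (empty if none were found)."""
        found = []
        n = prod(self.dims)
        if len(self.data_array) != n:
            found.append("DATA_ARRAY size does not match dims")
        if self.variant != "SIMPLE":
            found.append(f"unregistered VARIANT {self.variant!r}")
        # Flattened storage means only the element count can be compared here;
        # a scalar VARIANCE (same error for every pixel) needs no check.
        if isinstance(self.variance, list) and len(self.variance) != n:
            found.append("VARIANCE does not match DATA_ARRAY")
        return found

    def must_test_bad_pixels(self):
        # Only an explicit False lets an application skip magic-value tests;
        # True or an absent flag both require testing.
        return self.bad_pixel is not False

    def good_values(self):
        """Data values with magic-value pixels removed (when testing is required)."""
        if not self.must_test_bad_pixels():
            return list(self.data_array)
        return [v for v in self.data_array if v != MAGIC]
```

For example, `NDF([1.0, MAGIC, 3.0, 4.0], (2, 2)).good_values()` drops the magic-value pixel because [BAD_PIXEL] is absent.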

Notes:

(1)
Locating the data array.

General-purpose applications expecting an $<$NDF$>$ structure should be prepared to process the data array of Wright-Giddings formats as well. Also, it should not matter in either case whether the name of the structure containing the data array or the name of the data array itself is supplied by the user. However, only when the name of the $<$NDF$>$ structure is given can other data objects in the NDF be processed, because of the no-tree-walking rule. An outline algorithm to achieve the required functionality is:

```
Given name of object
Find its type
if (type not primitive) then
   if (type not <c_array>) then
      if (type not <NDF>) then
         issue warning but proceed
      endif
      look for [DATA_ARRAY]
      if ([DATA_ARRAY] not found) then
         No data processed
         Exit
      else
         Search for other required items
      endif
   endif
endif
Process [DATA_ARRAY].
```
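The outline algorithm above might be rendered in Python roughly as follows. This is a sketch that rests on several assumptions. `HDSObject` is a hypothetical stand-in for an HDS object found by name. Primitive types are recognised by their leading underscore, as in `_REAL`. The set of TYPEs treated as $<$c_array$>$ (`C_ARRAY_TYPES`) is a placeholder to be replaced by the registered list.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class HDSObject:
    """Hypothetical stand-in for an HDS object located by name."""
    name: str
    type: str
    value: object = None                            # primitive data, if any
    components: dict = field(default_factory=dict)  # component NAME -> HDSObject


C_ARRAY_TYPES = {"ARRAY"}  # assumption: TYPEs to be treated as <c_array>


def is_primitive(obj):
    return obj.type.startswith("_")  # HDS primitive TYPEs begin with "_"


def locate_data_array(obj):
    """Return (data_array, ndf); ndf is None when no enclosing NDF is known.

    Returns (None, None) if no [DATA_ARRAY] can be found.
    """
    if is_primitive(obj) or obj.type in C_ARRAY_TYPES:
        # The user named the data array itself; because of the
        # no-tree-walking rule, no other NDF components can be reached.
        return obj, None
    if obj.type != "NDF":
        warnings.warn(f"{obj.name}: TYPE {obj.type!r} is not NDF; proceeding")
    data = obj.components.get("DATA_ARRAY")
    if data is None:
        return None, None      # no data processed
    return data, obj           # caller may now search obj for other items
```

Returning the enclosing structure alongside the array lets the caller decide whether other components, such as [VARIANCE] or [AXIS], can be processed.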

(2)
Accessing part of a [DATA_ARRAY]

Some general-purpose applications will need to be able to access subsets of a data array. The problem is twofold: first, the method of implementation needs to be specified, and second, the representation of each axis must be identified. An example is a general image-display routine which expects a two-dimensional image but is instead given a three-dimensional data cube. Such an application must have a means to select the whole or part of a slice from the cube. One method is simply to run two applications in succession. First, run MANIC (a KAPPA application) on the input data array to create a new dataset containing the required data. Second, run the required processing application on those extracted data. However, this means extra work for the user and extra scratch space, and for frequently used applications it will be more natural to provide the necessary ‘slicing’ capability directly. In these cases, applications will be able to exploit MANIC’s component subroutines. These first obtain the parameter values that specify the required data subset, and then extract the subset efficiently and store it in internal workspace ready for processing. Through the applications interface file, it will be possible to set up default parameter values tailored to the application concerned. When the axes are being selected (for example, specifying the direction of a 2-D cut through a 3-D data cube), the application should display the axis labels, if present, to help the user identify them.
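The following Python sketch gives a rough illustration of the ‘slicing’ capability described above. It extracts a 2-D plane from a 3-D cube held as nested lists, optionally crops it to part of that plane, and lists the axis labels so the user can choose the cut direction. The function names are hypothetical and do not correspond to MANIC's actual subroutines. Axes are numbered from 0 in Python style.

```python
def extract_plane(cube, axis, index):
    """Return the 2-D plane of a 3-D nested-list cube with `axis` fixed at `index`.

    cube[i][j][k] is indexed along axes 0, 1 and 2 respectively. The result is
    a fresh copy, playing the role of internal workspace.
    """
    n0, n1 = len(cube), len(cube[0])
    if axis == 0:
        return [list(row) for row in cube[index]]
    if axis == 1:
        return [list(cube[i][index]) for i in range(n0)]
    if axis == 2:
        return [[cube[i][j][index] for j in range(n1)] for i in range(n0)]
    raise ValueError("axis must be 0, 1 or 2")


def crop(plane, rows, cols):
    """Keep only part of a slice; rows and cols are Python slice objects."""
    return [row[cols] for row in plane[rows]]


def describe_axes(labels):
    """Text to show the user when choosing the cut direction."""
    return [f"axis {i}: {label or '(no label)'}" for i, label in enumerate(labels)]
```

For example, `extract_plane(cube, 2, k)` gives the plane at position `k` along the third axis. Passing that plane to `crop` then selects part of it.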

(3)
Higher-level structures

Various specialised data objects and structures may be packaged around the NDF structure, using the NDF as a building block. One common requirement is for a series of related spectra or pictures; this could be implemented simply as a sequence of NDFs as follows:

```
name          special_type
   [name1]       <NDF>
      ⋮             ⋮
   [name2]       <NDF>
      ⋮             ⋮
   [name3]       <NDF>
      ⋮             ⋮
```

Another approach would be to use an HDS array, each element of which is an NDF.

(4)
Merging two or more $<$NDF$>$ structures

The merging of history records has been discussed in Section 10.6, and the same approach is followed for other data objects within an $<$NDF$>$. Thus, cases are divided into two kinds. In (i), there is a principal data array, and only the components of its $<$NDF$>$ structure are processed/copied to an output array. In (ii), the data arrays have equal importance, and the application, by convention, assumes that the first $<$NDF$>$ supplied contains the principal data array. For cases where this is not satisfactory, there will be an HDS editor and $<$NDF$>$ “dressing/undressing” utilities. It is suggested that a common ADAM parameter name be assigned to this ‘principal’ $<$NDF$>$, e.g. MAIN_ARRAY.
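The merging conventions, together with the [UNITS] rule given earlier, could be sketched as below. Several choices here are assumptions made for illustration: the function names, the use of a callback to ask the user, and treating an absent [UNITS] as unequal to a present one. `MAIN_ARRAY` is only the parameter name suggested in the text.

```python
from typing import Callable, List, Optional


def principal_ndf(names: List[str], main_array: Optional[str] = None) -> str:
    """Pick the principal NDF: the one named by MAIN_ARRAY, else the first supplied."""
    if main_array is None:
        return names[0]
    if main_array not in names:
        raise ValueError(f"MAIN_ARRAY {main_array!r} is not among the inputs")
    return main_array


def merged_units(units: List[Optional[str]],
                 confirm: Callable[[str], bool]) -> Optional[str]:
    """Return the [UNITS] to propagate to the output, or None.

    units   -- [UNITS] of each input NDF (None where absent)
    confirm -- asks the user whether to continue; returns True to proceed
    """
    distinct = set(units)
    if len(distinct) <= 1:
        return units[0] if units else None
    if not confirm(f"UNITS differ: {sorted(map(str, distinct))}"):
        raise RuntimeError("processing abandoned because UNITS differ")
    return None  # unequal UNITS are not propagated to the output
```

Passing the confirmation step in as a callback keeps the rule independent of how a particular environment prompts the user.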

#### 11.1 Polarimetry Example

Stokes parameters are the most common means of storing and analysing polarimetric data. Here is an illustrative example of how they might be stored using the $<$NDF$>$ structure. It treats the $I$ data as the principal data array, which is therefore stored at the top level of the structure. The $Q$, $U$ and $V$-parameter data are $<$NDF$>$ structures called, respectively, [STOKES_Q], [STOKES_U] and [STOKES_V], and are located within a polarimetry extension.

The obvious alternative approach would be simply to add an extra dimension to the [DATA_ARRAY], so that the different Stokes parameters could all be stored in a single data array. Thus, for example, the four Stokes pictures from a 512$×$512 imaging polarimeter would be stored as different planes of a 4$×$512$×$512 data cube. Though superficially more elegant than using separate arrays for each Stokes parameter, such an approach would introduce the danger of invalid processing, because the Stokes parameters are intrinsically different from each other. They cannot be combined (for example, adding a $Q$ pixel value to the corresponding $V$ value would be meaningless). By contrast, analogous arithmetic between values along the spatial, time and wavelength/energy dimensions (for example rebinning) would, of course, be valid.

Table 25: Example Polarimetry extension.

| Component Name | TYPE | Brief Description |
|---|---|---|
| [STOKES_Q] | $<$NDF$>$ | Stokes $Q$ data objects |
| [STOKES_U] | $<$NDF$>$ | Stokes $U$ data objects |
| [STOKES_V] | $<$NDF$>$ | Stokes $V$ data objects |

Table 26: A polarimetry example using the $<$NDF$>$

| Component Name | TYPE | Brief Description |
|---|---|---|
| [TITLE] | $<$_CHAR$>$ | title of $<$NDF$>$ |
| [DATA_ARRAY] | $<$various$>$ | Stokes $I$ data array |
| [LABEL] | $<$_CHAR$>$ | label describing the data array |
| [UNITS] | $<$_CHAR$>$ | units of the data array |
| [VARIANCE] | $<$s_array$>$ | variance of the data array |
| [QUALITY] | $<$various$>$ | quality of the data array |
| [AXIS(NAXIS)] | $<$AXIS$>$ | axis values, labels, units and errors applicable to all Stokes parameters |
| [HISTORY] | $<$HISTORY$>$ | history structure |
| [MORE] | $<$EXT$>$ | extension structure |
| [.POLARIMETRY] | $<$EXT$>$ | polarimetry extension |

Usually, the different Stokes parameters will have the same axis information, and with the structure above, specialist polarimetry applications will be able to exploit this fact. However, general-purpose applications will not be able to do so, because of the rule on “tree-walking”. Suppose a general-purpose application is to use anything other than default axis data, say when displaying [STOKES_Q.DATA_ARRAY]. Then either the axis information must be duplicated in the [STOKES_Q] structure, or the application should have a parameter which specifies where the axis information is to be found. If the default is taken, the application should look for the $<$AXIS$>$ structure in the normal place, i.e. within [STOKES_Q].
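One possible way for an application to find the axis information for a Stokes NDF is sketched below in Python. The nested-dict model of the container, the dotted paths, and the `axis_location` argument (which stands in for the suggested application parameter) are all hypothetical. When no $<$AXIS$>$ structure is found, the fallback uses pixel indices starting at [DATA_ARRAY.ORIGIN], or at 1. This follows the [AXIS] description earlier in this section.

```python
from typing import Optional


def find_axis(root: dict, ndf_path: str, axis_location: Optional[str] = None):
    """Return the AXIS component to use for the NDF at ndf_path, or None.

    root          -- nested dicts standing in for an HDS container file
    ndf_path      -- dotted path of the NDF, e.g. "MORE.POLARIMETRY.STOKES_Q"
    axis_location -- hypothetical parameter: dotted path of the NDF whose AXIS
                     should be used instead ("" means the top-level NDF)
    """
    path = ndf_path if axis_location is None else axis_location
    node = root
    for part in filter(None, path.split(".")):
        node = node[part]        # follow an explicit path only; no tree-walking
    return node.get("AXIS")      # default: the AXIS within the NDF itself


def default_axis_values(dim: int, origin: Optional[int] = None) -> list:
    """Pixel indices used when no AXIS exists: start at ORIGIN, else at 1."""
    start = 1 if origin is None else origin
    return list(range(start, start + dim))
```

With the default, [STOKES_Q] without its own $<$AXIS$>$ falls back to pixel indices. Setting `axis_location=""` reuses the axes of the top-level Stokes $I$ NDF.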

#### 11.2 Simplified $<$NDF$>$ Structure

The support software associated with the standard data structures described above will take a long time to develop. In the meantime, some astronomers and programmers will want to convert their applications to ADAM and to start using HDS data structures. Therefore, a simple and limited form of the $<$NDF$>$ structure is available for their use. It will be comprehensible to the standard interfaces once they are ready, so that existing applications would then require only minor modification, and the data files none.

Table 27: Components of the Simple Extensible $n$-Dimensional-Data structure

| Component Name | TYPE | Brief Description |
|---|---|---|
| [TITLE] | $<$_CHAR$>$ | title of [DATA_ARRAY] |
| [DATA_ARRAY] | $<$narray$>$ | NAXIS-dimensional data array |
| [AXIS(NAXIS)] | $<$AXIS$>$ | axis values, labels, units and errors |

[TITLE]
As described in the full $<$NDF$>$.
[DATA_ARRAY]
This is the primary $n$-dimensional array of data values. It is the ONLY obligatory structure element. Note it can only be an array of numbers in the simplified $<$NDF$>$.
NAXIS
No change from the standard $<$NDF$>$.
[AXIS]
Note these are simplified $<$AXIS$>$ structures. If they are not present, pixel coordinates are used as the axis values, and each implied axis array matches the corresponding dimension of the [DATA_ARRAY].

Table 28: Components of the Simplified $<$AXIS$>$ structure element

| Component Name | TYPE | Brief Description |
|---|---|---|
| [DATA_ARRAY] | $<$narray$>$ | axis value at each pixel |
| [LABEL] | $<$_CHAR$>$ | axis label |
| [UNITS] | $<$_CHAR$>$ | axis units |

[DATA_ARRAY]
Note this can only be an array of numbers. Mandatory.
[LABEL]
No change from the normal $<$AXIS$>$ structure.
[UNITS]
No change.

Applications must test pixels for magic values, because there is no [BAD_PIXEL] flag in the simplified $<$NDF$>$.
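For the simplified structure, a checking routine might look like the minimal Python sketch below. The dict representation, the function names and the `MAGIC` value are illustrative assumptions. Reporting components outside Table 27 as unexpected mirrors the full NDF's recommendation to warn about rogue objects. Since there is no [BAD_PIXEL] flag, `good_pixels` always tests for the magic value.

```python
MAGIC = -1.0e38  # hypothetical magic value; real Starlink bad values depend on the data type

SIMPLE_COMPONENTS = {"TITLE", "DATA_ARRAY", "AXIS"}


def check_simple_ndf(ndf: dict) -> list:
    """Return messages describing problems with a dict standing in for a simplified <NDF>."""
    messages = []
    if "DATA_ARRAY" not in ndf:
        messages.append("DATA_ARRAY is mandatory")
    extra = set(ndf) - SIMPLE_COMPONENTS
    if extra:
        messages.append(f"unexpected components: {sorted(extra)}")
    # AXIS(n) numbering follows the document's 1-based convention.
    for n, axis in enumerate(ndf.get("AXIS", []), start=1):
        if "DATA_ARRAY" not in axis:
            messages.append(f"AXIS({n}) lacks its mandatory DATA_ARRAY")
    return messages


def good_pixels(values: list) -> list:
    # No BAD_PIXEL flag exists in the simplified NDF, so always test for magic values.
    return [v for v in values if v != MAGIC]
```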