Chapter 10
HDS/NDF — The Data System

 10.1 HDS — Hierarchical data system
  10.1.1 Data objects
 10.2 NDF — Extensible n-dimensional data format
  10.2.1 The structure of an NDF object

10.1 HDS — Hierarchical data system

HDS — the Hierarchical Data System — is one of the most powerful features of ADAM. It is implemented as a set of subroutines which are of much more interest to the programmer than the user of the programs. Nevertheless, as a user it is necessary for you to know something of the system in order to make the best use of your data. HDS is about storing astronomical data in a compact, flexible and efficient way. It recognises that observations are often complex — possibly consisting of a data array (in 1, 2, 3, or even more dimensions), together with variable amounts of ancillary data — calibrations, errors, telescope and instrument information, observing conditions, and so on. The way HDS handles this complexity bears some similarities to the way VMS handles directories and files.



Figure 10.1: The relationship between VMS and HDS.


10.1.1 Data objects

HDS files are known as container files and by default have the extension ‘.SDF’. They contain data objects which will be referred to simply as objects when the context makes clear what sort of object it is. An object is an entity which contains data or other objects. This is the basis of the hierarchical nature of HDS and is analogous to the VMS concepts of file and directory — a directory can contain files and directories which can themselves contain files and directories and so on (Figure 10.1). An object possesses the following attributes:

    Name
    Type
    Shape
    State
    Group
    Value

HDS allows great freedom in specifying names and types, but standards have been laid down (see Section 10.2) to encourage portability of data and applications.

Name:  An object is identified by its name. This must be unique within its own container object. This is in contrast to VMS where different files in the same directory may be distinguished by their version numbers. A name is written as a character string containing any printing characters. Spaces, tabs and so on are ignored and alphabetic characters are capitalised. There are no special rules governing the first character (i.e. it can be numeric). When referring to components of objects, the following syntax is used:

    A.B.C...

where C is a component of B, and B is a component of A, which is the top-level object in the container file. Some specific examples of names are given at the end of this section.

Type:  The type of an object falls into one of two classes:

    Structure objects contain other objects called components.
    Primitive objects contain only numeric, character, or logical values.

Objects in the different classes will be referred to simply as structures and primitives, while the more general term object will refer to either a structure or a primitive. Structures are analogous to VMS directories: they can contain a part of the hierarchy below them. Primitives are analogous to VMS files: they are at the bottom of any branch of the structure.
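
The naming rule and the A.B.C syntax described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the HDS library: nested dictionaries stand in for structures, plain values stand in for primitives, and the function names (normalise_name, find_component, is_structure) are invented for this example.

```python
# Illustrative model only: nested dicts stand in for structures,
# plain Python values stand in for primitives. This is not the HDS library.

def normalise_name(name):
    """Apply the naming rule from the text: ignore white space, capitalise letters."""
    return "".join(name.split()).upper()


def is_structure(obj):
    """Structures contain components; primitives hold values only."""
    return isinstance(obj, dict)


def find_component(top_name, top_object, path):
    """Resolve a dotted path such as 'A.B.C' against a nested-dict hierarchy."""
    parts = [normalise_name(p) for p in path.split(".")]
    if parts[0] != normalise_name(top_name):
        raise KeyError(f"top-level object is {top_name!r}, not {parts[0]!r}")
    obj = top_object
    for part in parts[1:]:
        if not is_structure(obj):
            raise KeyError(f"{part!r} requested, but its parent is a primitive")
        obj = obj[part]
    return obj


# Hypothetical hierarchy: A is the top-level object, B a structure, C a primitive.
A = {"B": {"C": 42}}

print(find_component("A", A, "a.b.c"))               # the value held by primitive C
print(is_structure(find_component("A", A, "A.B")))   # B is a structure
```

Because names are normalised before lookup, 'a.b.c' and 'A.B.C' refer to the same object in this sketch.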

The primitive types defined in HDS are shown in Table 10.1.





    HDS Type     VAX Fortran Type    Length in Bits
    -----------------------------------------------
    _INTEGER     INTEGER             32
    _REAL        REAL                32
    _DOUBLE      DOUBLE PRECISION    64
    _LOGICAL     LOGICAL             32
    _CHAR[*N]    CHARACTER*N         8*N
    -----------------------------------------------
    _UBYTE       BYTE                8
    _BYTE        BYTE                8
    _UWORD       INTEGER*2           16
    _WORD        INTEGER*2           16




Table 10.1: The HDS primitive data types.
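
The bit lengths in Table 10.1 determine the range of values each integer type can hold. The following Python sketch works the ranges out from the table. It assumes that the signed types use two's-complement storage and that only _UBYTE and _UWORD are unsigned (as their names suggest); the dictionary and function names are invented for this example.

```python
# Bit lengths copied from Table 10.1. The signed/unsigned flags are an
# assumption based on the type names (_UBYTE and _UWORD are unsigned).
INTEGER_TYPES = {
    "_INTEGER": (32, True),
    "_UBYTE": (8, False),
    "_BYTE": (8, True),
    "_UWORD": (16, False),
    "_WORD": (16, True),
}


def value_range(hds_type):
    """Return (lowest, highest) value for an integer HDS type."""
    bits, signed = INTEGER_TYPES[hds_type]
    if signed:
        # Two's-complement range, assumed for the signed types.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for name in INTEGER_TYPES:
    low, high = value_range(name)
    print(f"{name:<10} {low} to {high}")
```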

The first five types in Table 10.1 are referred to as standard data types. The _UBYTE type provides a value range of 0 to 255; the _UWORD type provides a value range of 0 to 65535. The others are as for Fortran 77.

Examples of structure types are IMAGE, SPECTRUM, INSTR_RESP etc. Their names don’t begin with an underscore, so the system and the programmer can easily distinguish between primitives and structures. A type is written as a character string with the same rules as for name, except that an asterisk can only appear if the first character is an underscore (i.e. it is a primitive), and also a type can be blank.

Shape:  Every object has a shape or dimensionality. This is described by an integer (the number of dimensions) and an integer array (the size of each dimension). A scalar, for example a single number, has by convention a dimensionality of zero, i.e. number of dimensions is 0. A vector has a dimensionality of 1, i.e. number of dimensions is 1 and the first element of the dimension array contains the size of the vector. An array refers to an object with 2 or more dimensions; currently a maximum of 7 dimensions is allowed. Objects may be referred to as scalar primitives or vector structures and so on.

State:  The state of an object specifies whether or not its value is defined. In routines it is represented as a LOGICAL variable where .TRUE. means defined and .FALSE. means undefined.

Group:  In order to access an object, it is first necessary to obtain a locator, a sort of pointer which can then be used to address the object. When the program no longer needs to access the object, the locator should be annulled. A locator is analogous to a Fortran logical unit number (but is actually a character variable, not an integer). Any number of locators can be active simultaneously. The group attribute is used to form an association between locators so that they can be annulled together. A group is written as a character string whose rules of formation are the same as for name.

Value:  When an object is first created it contains no value, somewhat like an empty file. It must be given a value in a separate operation. A value can be a scalar, vector, or an array. The scalar or the elements of the vector or array must all be of the same type and can be primitives or structures. The rules for handling character values are the same as for Fortran 77, i.e. character values are padded with blanks or truncated from the right depending on the relative length of the program value and the object.

Illustration:  To fix ideas, look at the example of an NDF data structure in Figure 8.2. The following notation is used to describe each object:

                      NAME(dimensions)  TYPE  VALUE

where ‘(dimensions)’ only appears when describing vectors or arrays. Each level down the hierarchy is indented.

Suppose an object with this structure were stored in a (container) file called EXAMPLE.SDF, then we can refer to components of this object by names such as:

      EXAMPLE.DATA_ARRAY            an array of type _REAL  
      EXAMPLE.QUALITY.BADBITS       a scalar of type _UBYTE  
      EXAMPLE.MORE.FIGARO.TIME      a scalar of type _REAL

and so on.
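
The shape conventions and the NAME(dimensions) TYPE VALUE notation described in this section can be illustrated with a short Python sketch. It is not part of HDS: the function names are invented for this example, and the angle brackets around the type follow the listing in Figure 8.2 rather than any stated rule.

```python
MAX_DIMS = 7  # maximum number of dimensions quoted in the text


def shape_class(dims):
    """Classify a shape given as a sequence of dimension sizes."""
    ndim = len(dims)
    if ndim > MAX_DIMS:
        raise ValueError(f"{ndim} dimensions exceeds the limit of {MAX_DIMS}")
    if ndim == 0:
        return "scalar"
    if ndim == 1:
        return "vector"
    return "array"


def is_primitive_type(type_name):
    """Primitive type names begin with an underscore; structure types do not."""
    return type_name.startswith("_")


def describe(name, type_name, dims=(), value=""):
    """Format one object as NAME(dimensions)  <TYPE>  VALUE."""
    label = name
    if dims:
        # '(dimensions)' only appears for vectors and arrays.
        label += "(" + ",".join(str(d) for d in dims) + ")"
    return f"{label}  <{type_name}>  {value}".rstrip()


print(describe("DATA_ARRAY", "_REAL", (856,)), "is a", shape_class((856,)))
print(describe("BADBITS", "_UBYTE", value=1), "is a", shape_class(()))
print(is_primitive_type("_REAL"), is_primitive_type("SPECTRUM"))
```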

10.2 NDF — Extensible n-dimensional data format

A major preoccupation of Starlink since its inception has been to design a data storage format which is both standard and yet which can accommodate most of the things which one might wish to store. (This is a weak point with most software environments in astronomical use at present.) One of the practical problems with unfettered HDS is that it is too flexible. The solution adopted, NDF (Extensible N-dimensional-Data Format), provides a more limited set of designs, but still implemented using HDS. It is described in awesome detail in SGP/38.

In essence, NDF defines a set of standard data objects. Not all of them must be present in an NDF object, but no others will be processed. Non-standard items are handled in a standard way by using self-contained extensions. There are defined locations for items such as the main data array, axes, title, units etc. The only mandatory item is the main data array; all other items are optional!

All this means that the user can be certain that no properly written application will mess up his data, and there is a very good chance that all the useful information will be properly used. (For the programmer, the huge advantage of this system is that he doesn’t need to know the details of the format at all! A comprehensive set of routines is available to access the standard components of an NDF. These are described in Section 21.2.1 and (more fully) in SUN/33.)
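
The rule that only the main data array is mandatory, with non-standard items kept in extensions, can be sketched as a simple check. The Python below is an illustration only and is not how the NDF routines work internally. It assumes the component names shown in the example of Section 10.2.1, including MORE as the structure that holds extensions, and it uses a nested dictionary in place of a real NDF object; summarise_ndf is an invented name.

```python
# Component names taken from the example NDF in Section 10.2.1. The checks
# are illustrative and are not the NDF library's own validation.
STANDARD_COMPONENTS = {
    "DATA_ARRAY", "TITLE", "LABEL", "UNITS",
    "QUALITY", "VARIANCE", "AXIS", "HISTORY",
}
EXTENSION_CONTAINER = "MORE"  # assumed from the example listing


def summarise_ndf(ndf):
    """Sort the top-level components of a dict-based NDF stand-in into groups."""
    if "DATA_ARRAY" not in ndf:
        raise ValueError("not a valid NDF: DATA_ARRAY is mandatory")
    return {
        "standard": sorted(k for k in ndf if k in STANDARD_COMPONENTS),
        "extensions": sorted(ndf.get(EXTENSION_CONTAINER, {})),
        "unrecognised": sorted(
            k for k in ndf
            if k not in STANDARD_COMPONENTS and k != EXTENSION_CONTAINER
        ),
    }


# Hypothetical stand-in for an NDF with a title and a Figaro extension.
example = {
    "DATA_ARRAY": [0.2284551, -2.040089],
    "TITLE": "HR6259 - AAT fibre data",
    "MORE": {"FIGARO": {"TIME": 1275, "SECZ": 2.13}},
}
print(summarise_ndf(example))
```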

10.2.1 The structure of an NDF object

ADAM_EXAMPLES:EXAMPLE.SDF is a file containing an NDF object which contains all the standard NDF components and also has a Figaro extension. Such a file is often referred to as an ‘NDF file’, or even as just an ‘NDF’. The structure of the file, as revealed by:

   ICL > TRACE ADAM_EXAMPLES:EXAMPLE

is shown in Figure 8.2.

 
EXAMPLE  <NDF>  
 
   DATA_ARRAY(856)  <_REAL>       *,0.2284551,-2.040089,  
                                  ... 820.8976,570.0729,*,449.574  
   TITLE          <_CHAR*30>      ’HR6259 - AAT fibre data’  
   LABEL          <_CHAR*20>      ’Flux’  
   UNITS          <_CHAR*20>      ’Counts/s’  
   QUALITY        <QUALITY>       {structure}  
      BADBITS        <_UBYTE>        1  
      QUALITY(856)   <_UBYTE>        1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,  
                                     ... 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0  
 
   VARIANCE(856)  <_REAL>         2.1,0.1713413,1.5301,34.38378,42.35531,  
                                  ... 615.6732,427.5547,353.9127,337.1805  
   AXIS(1)        <AXIS>          {structure}  
 
   Contents of AXIS(1)  
      DATA_ARRAY(856)  <_REAL>       3847.142,3847.672,3848.201,3848.731,  
                                     ... 4298.309,4298.838,4299.368,4299.897  
      LABEL          <_CHAR*20>      ’Wavelength’  
      UNITS          <_CHAR*20>      ’Angstroms’  
 
   HISTORY        <HISTORY>       {structure}  
      CREATED        <_CHAR*30>      ’1990-DEC-12 08:21:02.324’  
      CURRENT_RECORD  <_INTEGER>     3  
      RECORDS(10)    <HIST_REC>      {array of structures}  
 
      Contents of RECORDS(1)  
         TEXT           <_CHAR*40>      ’Extracted spectrum from fibre data.’  
         DATE           <_CHAR*25>      ’1990-DEC-19 08:43:03.08’  
         COMMAND        <_CHAR*30>      ’FIGARO V2.4 FINDSP command’  
 
 
   MORE           <EXT>           {structure}  
      FIGARO         <EXT>           {structure}  
         TIME           <_REAL>         1275  
         SECZ           <_REAL>         2.13

Figure 8.2: An example of the internal structure of an NDF file.

Of course, this is only an example format. There are various ways of representing some of the components. These variants are described in SGP/38.

The components of an NDF are described below. The names (shown on their own lines) are significant as they are used by the NDF access routines to identify the components.

DATA_ARRAY
— the main data array is the only component which must be present in an NDF. In the case of EXAMPLE.SDF, this component is a 1-d real array with 856 elements.
TITLE
— the character string ’HR6259 - AAT fibre data’ describes the contents of the NDF. The TITLE might be used as the title of a graph etc.
LABEL
— the character string ’Flux’ describes the quantity represented in the NDF’s main data array. The LABEL is intended for use on the axes of graphs etc.
UNITS
— this character string describes the physical units of the quantity stored in the main data array, in this case, ’Counts/s’.
QUALITY
— this component is used to indicate the quality of each element in the main data array. The quality structure contains a quality array and a BADBITS value, both of which must be of type _UBYTE. The quality array has the same shape and size as the main data array, and is used in conjunction with the BADBITS value to decide the quality of a pixel in the main data array. QUALITY normally works by taking the bit-wise AND of BADBITS with each element of the QUALITY array; a non-zero result marks the corresponding pixel as bad. In the example the BADBITS component has value 1, so only the lowest bit is tested. Thus, an odd value in the QUALITY array indicates a bad pixel, while an even value identifies a good pixel.
VARIANCE
— the variance array is the same shape and size as the main data array and contains the errors associated with the individual data values. These are stored as variance estimates for each pixel.
AXIS
— this structure may contain axis information for any dimension of the NDF’s main array. In this case, the main data array is only 1-d, therefore only the AXIS(1) structure is present. This structure contains the actual axis data array, and also label and units information.
HISTORY
— this component provides a record of the processing history of the NDF. Only the first of three records is shown in the example. This indicates that the spectrum was extracted from fibre data using the Figaro FINDSP command on 19th December 1990. (Support for the history component is not yet provided by the NDF access routines.)
EXTENSIONs
— the purpose of extensions is to store non-standard items. EXAMPLE.SDF began life as an old-style (DST) Figaro file [1] which contained values for the airmass and exposure time associated with the observations. These are stored in the Figaro extension, and the intention is that the Figaro applications which use these values will know where to find them.

[1] The file was converted to an NDF using the CONVERT command DST2NDF.
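
The QUALITY test described in the component list above (the bit-wise AND of BADBITS with each quality value) can be sketched in Python as follows. This is an illustration under simple assumptions, not the NDF library's implementation: plain lists stand in for the arrays, None stands in for a bad pixel, the data values are made up, and the function names are invented for this example.

```python
def bad_pixel_mask(quality, badbits):
    """True for each pixel whose quality value shares a set bit with badbits."""
    return [(q & badbits) != 0 for q in quality]


def apply_quality(data, quality, badbits):
    """Replace bad pixels with None, leaving good pixels unchanged."""
    mask = bad_pixel_mask(quality, badbits)
    return [None if bad else x for x, bad in zip(data, mask)]


# First six quality values from the example listing, with BADBITS = 1.
quality = [1, 0, 0, 0, 0, 1]
badbits = 1

# Hypothetical data values, used only to show the masking.
data = [9.9, 0.2, -2.0, 3.1, 4.7, 5.5]

print(bad_pixel_mask(quality, badbits))
print(apply_quality(data, quality, badbits))
```

With BADBITS set to 1 only the lowest bit matters, so a quality value of 2 would be treated as good and a value of 3 as bad.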