2 SETTING UP FOR FORMAT CONVERSION

 2.1 Name Your Formats
 2.2 Rules and Regulations
 2.3 Defining Conversion Commands
 2.4 Accessing Sub-structures Within Foreign Data Files
 2.5 Writing Format Conversion Utilities
 2.6 Defining Output Formats
 2.7 Specifying an Output Format
 2.8 Propagating Data Formats
 2.9 Resolving Naming Ambiguities
 2.10 Example: Setting Up a New Format

2.1 Name Your Formats

The first step in setting up the NDF library to access foreign data formats is to define a name for each foreign format to be recognised, and to associate a file extension with each of these names. The file extension will be used to determine which format a file is written in.

This is done by defining the environment variable called NDF_FORMATS_IN to contain a format list, such as the following:

  setenv NDF_FORMATS_IN 'FITS(.fit),FIGARO(.dst),IRAF(.imh)'

This is a comma-separated list of format specifications, where each specification consists of a format name (e.g. FITS) with an associated file extension (e.g. ‘.fit’) in parentheses.

This list serves two purposes. First, it defines the set of formats and associated file extensions to be recognised when accessing input datasets (see footnote 2). This means, for instance, that if a dataset name such as:

run66.fit

were given to the NDF library, it would recognise it as a FITS format file and try to carry out the appropriate conversion.

The list also defines a search order for foreign data formats. This means that if the dataset name supplied had been simply:

run66

then the NDF library would first look for a native format NDF with this name (i.e. in the file run66.sdf). If this was not found, it would then look for a file called run66.fit, then run66.dst and then run66.imh, stopping when the first one was found and associating the appropriate data format with it. If none of the files existed, a “file not found” error would result.
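
The search order described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this section, not the NDF library's own code: the function names are hypothetical, and NDF sections and foreign extension specifiers are ignored for simplicity.

```python
import os
import re


def parse_format_list(spec):
    """Split a list such as 'FITS(.fit),FIGARO(.dst)' into (name, ext) pairs."""
    formats = []
    for item in spec.split(","):
        match = re.fullmatch(r"\s*([^()\s]+)\s*\(\s*(\.[^()\s]*)\s*\)\s*", item)
        if match is None:
            raise ValueError(f"bad format specification: {item!r}")
        # Format names are not case sensitive, so store them in upper case.
        formats.append((match.group(1).upper(), match.group(2)))
    return formats


def candidate_files(name, formats):
    """Return (file, format) pairs in the order in which they would be tried."""
    for fmt, ext in formats:
        if name.endswith(ext):
            return [(name, fmt)]  # an explicit, recognised foreign extension
    # No recognised extension: native format first, then each foreign format.
    return [(name + ".sdf", "NDF")] + [(name + ext, fmt) for fmt, ext in formats]


def locate(name, formats, exists=os.path.exists):
    """Return the first candidate that exists, or raise FileNotFoundError."""
    for path, fmt in candidate_files(name, formats):
        if exists(path):
            return path, fmt
    raise FileNotFoundError(name)
```

With the format list shown earlier, candidate_files("run66", ...) yields run66.sdf, run66.fit, run66.dst and run66.imh in that order, while "run66.fit" yields only the FITS file.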

Note that the ability to select sections from pre-existing NDF datasets (see SUN/33) is also available when accessing foreign data files, so that entering:

run66.fit(100.0 50.0)

or

run66(100.0:200.0,10:512)

would result in the same actions as above to locate a suitable file and to convert its format, with the required section then being extracted from the converted NDF and passed to the application.

2.2 Rules and Regulations

You may define up to 50 foreign formats to be recognised in this way, and may give them any names and file extensions you like (apart from the format name NDF and the file extension ‘.sdf’, which are reserved for the native NDF format). Format names are not case sensitive, but file extensions are case sensitive wherever the host file system distinguishes case (as it does on UNIX). File extensions should always begin with a ‘.’ and appear in parentheses following the associated format name. There is no individual limit on the length of a format name or file extension, but the entire format list is limited to 1024 characters.

Note that the same foreign format name and/or file extension may appear more than once in the NDF_FORMATS_IN list. The first occurrence takes precedence when searching for files. Thus, you could associate different file extensions with the same format name to define synonyms for file extensions.
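
As a rough aid when building a format list, the rules above could be checked before the environment variable is set. The following Python sketch is an assumption-laden illustration (the function name is hypothetical, and the comparison with ‘.sdf’ is made case sensitively, as on UNIX). It handles only NAME(.ext) entries, not the special ‘.’ and ‘*’ entries used in output lists.

```python
import re

MAX_FORMATS = 50        # limits quoted in the text above
MAX_LIST_LENGTH = 1024


def check_format_list(spec):
    """Return a list of problems found in a format list (empty if none)."""
    problems = []
    if len(spec) > MAX_LIST_LENGTH:
        problems.append(f"list is {len(spec)} characters long (limit {MAX_LIST_LENGTH})")
    items = [item.strip() for item in spec.split(",")]
    if len(items) > MAX_FORMATS:
        problems.append(f"{len(items)} formats defined (limit {MAX_FORMATS})")
    for item in items:
        match = re.fullmatch(r"([^()\s]+)\(([^()\s]*)\)", item)
        if match is None:
            problems.append(f"{item!r}: expected NAME(.ext)")
            continue
        name, ext = match.groups()
        if name.upper() == "NDF":
            problems.append(f"{item!r}: the format name NDF is reserved")
        if not ext.startswith("."):
            problems.append(f"{item!r}: file extension must begin with '.'")
        elif ext == ".sdf":
            problems.append(f"{item!r}: the file extension .sdf is reserved")
    return problems
```

Repeated format names or extensions are deliberately not reported, since the text above allows them.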

2.3 Defining Conversion Commands

For each foreign format which appears in the NDF_FORMATS_IN list, you should also provide commands to perform the necessary format conversions to and/or from the native NDF format. These commands are also defined by means of environment variables.

Taking the FITS format (above) as an example, this means defining up to two commands – one for converting from FITS format to NDF format and the other for converting back again, such as the following:

  setenv NDF_FROM_FITS 'fitsin in=^dir^name^type out=^ndf'
  setenv NDF_TO_FITS   'fitsout in=^ndf out=^dir^name^type'

Here, the names of two environment variables have been formed by prefixing ‘NDF_FROM_’ and ‘NDF_TO_’ to the foreign format name (in upper case) and each of these variables has been set to contain a command which performs the appropriate format conversion (in this case by invoking two conversion utilities called “fitsin” and “fitsout”, which we assume to exist).

Ideally, you would define both of these commands. However, if you only want to support conversion in one direction, then it is quite acceptable to omit either of them. The commands are only accessed when the occasion to use them arises, so no error will result if they are omitted but never used.

When needed, the conversion commands you define will be interpreted (in a separate process) by a command interpreter appropriate to the host operating system (see footnote 3). The commands are actually invoked by passing them to the C run time library “system” function, and they may therefore use any components of the environment which are inherited through that interface. Typically this means that such things as the default directory and environment variables are available to these commands.

Before the commands are invoked, the NDF library will perform token substitution on them, in order to insert the names of the actual datasets to be processed. The tokens used to represent these datasets are, in fact, message tokens – identical to those used by the MERS and EMS libraries (SUN/104 and SSN/4) and commonly used when reporting errors and other messages from within applications. They are used in conversion commands in exactly the same way (they appear in the example commands above prefixed with the ‘^’ substitution character), and the NDF library defines a set of them for this purpose, as follows:



  Token   Value
  -----   -----
  dir     Directory in which the foreign file resides
  name    Foreign file name (without directory or extension)
  type    Foreign file extension (with leading ‘.’)
  vers    Foreign file version number (blank if not supported)
  fxs     Foreign extension specifier (see §2.4)
  fxscl   Clean version of fxs (all non-alphanumeric characters replaced by underscores)
  fmt     Foreign format name (upper case)
  ndf     Full name of the native NDF format copy of the dataset


Note that the EMS library, which performs substitution of these tokens, imposes a limit of 200 characters on the resulting command. If long file names are in use this may present a problem unless the conversion command itself is short. Fortunately, this can always be arranged by wrapping it up in a simple script if necessary.
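
To make the substitution concrete, here is a small Python sketch that imitates it. It is not the EMS implementation. The function names are hypothetical, the dir token is assumed to carry its trailing ‘/’ (as the concatenation ^dir^name^type suggests), unknown tokens are simply left alone, and an over-long result raises an error; how EMS itself treats either case is not stated here.

```python
import posixpath
import re

MAX_COMMAND_LENGTH = 200  # limit quoted in the text above


def conversion_tokens(foreign_file, fmt, ndf, fxs="", vers=""):
    """Build the token table for one foreign file (UNIX-style paths assumed)."""
    directory, filename = posixpath.split(foreign_file)
    if directory and not directory.endswith("/"):
        directory += "/"  # assumption: ^dir includes the trailing separator
    name, ext = posixpath.splitext(filename)
    return {
        "dir": directory,
        "name": name,
        "type": ext,
        "vers": vers,
        "fxs": fxs,
        "fxscl": re.sub(r"[^A-Za-z0-9]", "_", fxs),
        "fmt": fmt.upper(),
        "ndf": ndf,
    }


def expand_command(template, tokens):
    """Replace each ^token in the template with its value."""
    command = re.sub(
        r"\^([a-z]+)",
        lambda m: tokens.get(m.group(1), m.group(0)),  # unknown tokens kept as-is
        template,
    )
    if len(command) > MAX_COMMAND_LENGTH:
        raise ValueError(f"command is {len(command)} characters long")
    return command
```

For example, with the foreign file /data/run66.fit and a (hypothetical) NDF name of /tmp/run66, the template ‘fitsin in=^dir^name^type out=^ndf’ expands to ‘fitsin in=/data/run66.fit out=/tmp/run66’.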

2.4 Accessing Sub-structures Within Foreign Data Files

The native HDS format allows multiple NDFs to be stored within a single disk file, and some foreign data formats provide somewhat similar facilities. As a concrete example, the FITS format allows images to be stored within image extensions, so a single FITS file may contain several images, each of which can be thought of as a foreign format NDF. When an NDF application is run, a specific NDF within such a FITS file can be selected by appending a foreign extension specifier (FXS) to the end of the file name. A foreign extension specifier consists of a string delimited by matching square brackets. The string identifies a sub-structure within the specified file, using syntax specific to the data format. So, for instance, the second image extension within a FITS file called m51.fit could be specified using the string “m51.fit[2]”. Here, the sub-string “[2]” forms the foreign extension specifier, and uses the syntax expected by the CONVERT application FITS2NDF.

The foreign extension specifier is made available to external commands using a message token called fxs. Since this will certainly include square brackets (and possibly other non-alphanumeric characters), it cannot safely be included directly within the name of a file, as you may want to do, for instance, when setting up the NDF_KEEP_ or NDF_TEMP_ environment variables. For this reason, a “cleaned” version of the foreign extension specifier is also available, in a message token called fxscl. This is equal to fxs except that all non-alphanumeric characters are replaced by underscores.

Note that the NDF library currently allows foreign extension specifiers to be given only when accessing existing NDFs for read-only access. An error will be reported if an FXS is included in the name of an NDF to be created, or of an existing NDF for which update or write access is required.
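
The relationship between a dataset name, its foreign extension specifier and the cleaned form can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that the specifier, if present, is the last thing in the name.

```python
import re


def split_fxs(dataset_name):
    """Split 'm51.fit[2]' into ('m51.fit', '[2]'); the FXS is '' if absent."""
    match = re.fullmatch(r"(.*?)(\[.*\])", dataset_name)
    if match:
        return match.group(1), match.group(2)
    return dataset_name, ""


def clean_fxs(fxs):
    """Replace every non-alphanumeric character with an underscore."""
    return re.sub(r"[^A-Za-z0-9]", "_", fxs)
```

Here split_fxs("m51.fit[2]") separates the file name m51.fit from the specifier “[2]”, and clean_fxs("[2]") gives a string that is safe to embed in a file name.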

2.5 Writing Format Conversion Utilities

In §2.3, the utilities “fitsin” and “fitsout” were presumed to exist to perform the necessary conversions. For commonly encountered formats, this is likely to be the case, and the CONVERT package (SUN/55) and other likely sources of conversion utilities should be investigated before embarking on writing your own. Don’t forget that you can often adapt existing utilities (including those provided by the operating system) by combining them into a suitable script.

If you do need to write your own format conversion utilities from scratch, then the rules that apply are very few. It should obviously be possible to execute your utility by invoking a suitable command which includes the names of the input and output datasets. Your utility will also need to be able to interpret the NDF name it receives. This means that if you are writing a program, it should probably use the NDF library to access the NDF data (rather than, say, HDS, which cannot necessarily interpret the compound data structure names that will occur; see footnote 4). For a template example of a conversion utility that reads data from unformatted Fortran files, see SUN/33.

As far as possible, the NDF library will attempt to ensure that the output dataset to be written by a conversion command does not already exist, by deleting it first if necessary (your conversion utility should then create it). However, it may not always be wise to depend on this. In particular, recovery from error conditions (such as failed conversions) is likely to be more robust if conversion commands are able to cope when their output datasets already exist.

Unless you are debugging, you should also arrange for conversion utilities not to write to the standard output channel, as such output will otherwise appear whenever a conversion occurs. This is not normally wanted.

Beyond this, you have complete freedom to define and implement the conversion you want to perform. This may have whatever side effects you choose, so long as it results in the production of the requested output dataset, leaves its input dataset intact and returns an appropriate status value to the NDF library (see §3.6 for a discussion of error handling in conversion commands).
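
The few rules above might be wrapped around a conversion routine roughly as follows. This Python sketch is an assumption throughout: the in=/out= keyword style simply mirrors the example commands in §2.3, the output is taken to be an ordinary file (as for a foreign format file), the convert argument stands in for whatever routine does the real work, and a non-zero return value is assumed to signal failure (see §3.6 for what the NDF library actually expects).

```python
import contextlib
import io
import os
import sys


def parse_keywords(argv):
    """Turn ['in=a', 'out=b'] into {'in': 'a', 'out': 'b'}."""
    return dict(arg.split("=", 1) for arg in argv)


def run_conversion(argv, convert):
    """Run convert(source, target) under the rules for conversion utilities."""
    try:
        args = parse_keywords(argv)
        source, target = args["in"], args["out"]
        # Cope with an output dataset that already exists.
        if os.path.exists(target):
            os.remove(target)
        # Keep the standard output channel quiet during normal use.
        with contextlib.redirect_stdout(io.StringIO()):
            convert(source, target)
    except Exception as err:
        print(f"conversion failed: {err}", file=sys.stderr)
        return 1
    # Succeed only if the requested output dataset was actually produced.
    return 0 if os.path.exists(target) else 1
```

A script built on this would end with something like sys.exit(run_conversion(sys.argv[1:], my_converter)), where my_converter is the hypothetical routine that reads the input and writes the output.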

2.6 Defining Output Formats

As you might expect, you define the formats for output datasets (see footnote 5) in rather the same way as for input datasets (§2.1), by means of a search list. However, the way this list is used is slightly different in this case.

The output format list is found by translating the environment variable NDF_FORMATS_OUT, which might typically have a definition such as:

  setenv NDF_FORMATS_OUT '.,FITS(.fit),FIGARO(.dst),IRAF(.imh)'

Ignoring, for the moment, the ‘.’ at the start, this list defines the names of foreign data formats which are to be recognised when creating new datasets, and associates a file extension with each one. The syntax and restrictions are identical to the NDF_FORMATS_IN list (see §2.2).

There is no requirement for the output formats to be the same as those used for input although, for obvious reasons, they will often be so. You could, however, give your formats different names or file extensions in the output list if you wanted.

The NDF library uses the same commands to perform format conversion for output datasets as for input ones (see §2.3), so the names of output formats should be chosen to select the environment variable containing the appropriate command. Note, however, that the “NDF_FROM_…” command will not be used in the case of output datasets.

2.7 Specifying an Output Format

With the output format list above, the following could be given to the NDF library when it is expecting the name of a new dataset:

newfile.fit

and it would recognise this as a request to write the new file in FITS format (performing the appropriate conversion when necessary).

If the name supplied were simply:

newfile

(i.e. if no file extension is specified), then the first format appearing in the output format list would be used. This is where the ‘.’ in the earlier example (§2.6) comes in, as it stands for the native NDF format. Hence, a native format NDF would be written in this case. This is normally the required behaviour, so having ‘.’ at the start of the format list is recommended.

However, if you wanted to work predominantly with a foreign format (say you were using NDF applications with another package which could not access NDF data directly), then you could put that format at the start of the output format list. For example:

  setenv NDF_FORMATS_OUT 'IRAF(.imh),FITS(.fit)'

would cause all output files to be written in IRAF format and to have a file extension of ‘.imh’ by default. You could still specify FITS format explicitly by giving a file extension of ‘.fit’.

2.8 Propagating Data Formats

The output format may also be determined according to the format of a related input dataset. This is achieved by putting the “wild-card” character ‘*’ at the start of the output format list, as follows:

  setenv NDF_FORMATS_OUT '*,.,FIGARO(.dst),FITS(.fit),IRAF(.imh)'

This affects new datasets which are created as a result of the propagation of information from an existing dataset (e.g. by applications calling the routines NDF_PROP or NDF_SCOPY, as described in SUN/33) and for which no explicit output file extension is given. In this instance, not only will data values be propagated, but so also will the dataset format. Thus, if a FIGARO format dataset (with file extension ‘.dst’) had been accessed for input and propagated to create an output dataset, then a similar FIGARO format dataset would be created (also with a ‘.dst’ file extension) unless an explicit output file extension were given (see footnote 6).

Note that output datasets which are created by applications without propagation from an existing dataset do not inherit any format information. In this case, any ‘*’ in the format list is ignored and the normal rules apply (in the example above, a native format NDF dataset would be created instead).
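
The rules of §2.7 and §2.8 for choosing an output format can be summarised in a Python sketch. The function names and the representation of the input format as a (name, extension) pair are hypothetical, and the behaviour when no entry applies is not described in this section, so the sketch simply raises an error there.

```python
def parse_output_list(spec):
    """Parse an NDF_FORMATS_OUT-style list, keeping '.' and '*' as markers."""
    entries = []
    for item in spec.split(","):
        item = item.strip()
        if item in (".", "*"):
            entries.append(item)
        else:
            name, _, rest = item.partition("(")
            entries.append((name.upper(), rest.rstrip(")")))
    return entries


def choose_output(name, entries, input_format=None):
    """Return (file_name, format_name) for a new dataset.

    input_format is a (format, extension) pair when the dataset is created by
    propagation from an existing one, e.g. ("FIGARO", ".dst"); otherwise None.
    """
    # An explicit, recognised file extension always wins.
    for entry in entries:
        if isinstance(entry, tuple) and name.endswith(entry[1]):
            return name, entry[0]
    # Otherwise the first applicable entry in the list decides.
    for entry in entries:
        if entry == "*":
            if input_format is None:
                continue  # no propagation: the wild-card is ignored
            fmt, ext = input_format
            return name + ext, fmt
        if entry == ".":
            return name + ".sdf", "NDF"  # '.' stands for the native format
        fmt, ext = entry
        return name + ext, fmt
    raise ValueError("no applicable entry in the output format list")
```

With the list ‘*,.,FIGARO(.dst),FITS(.fit),IRAF(.imh)’, the name newfile gives a native NDF when there is no propagation, a ‘.dst’ file when propagated from a FIGARO dataset, and newfile.fit always gives FITS.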

2.9 Resolving Naming Ambiguities

Unfortunately, because the ‘.’ (dot) character is used both to separate a file extension from its file name and also to separate fields in an NDF (or HDS) object name, ambiguities can sometimes arise. For example, if the dataset name:

datafile.fit

is supplied, it might mean a foreign (FITS) data file with a ‘.fit’ extension, or it might identify an NDF structure called FIT residing within an HDS file called datafile.sdf.

In such cases, the NDF library always uses the former interpretation. That is, it attempts to access (or create) a foreign format file whenever a file extension appears to be present and corresponds with a known foreign data format. For example, if ‘.xyz’ is a recognised foreign file extension, then:

myfile.xyz

and

my.file.xyz

are both references to foreign format files rather than HDS objects (although they may not necessarily be valid file names on all operating systems). Conversely:

yourfile.abc

would not identify a foreign file if ‘.abc’ is not a recognised foreign file extension.

On UNIX, where the file system is case sensitive, it is possible to circumvent this behaviour by exploiting the case insensitivity of HDS component names. For instance, if ‘.img’ (lower case) is a recognised foreign file extension, then the dataset name:

anyfile.IMG

with ‘.IMG’ in upper case, refers to a native format NDF (an object called IMG contained within the HDS file anyfile.sdf). To leave this possibility open, it is recommended that foreign file extensions should always contain at least one lower case character.
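
The disambiguation rule can be expressed as a short Python sketch. The function names are hypothetical, the extension comparison is case sensitive (as on UNIX), and the second function only shows the alternative HDS reading of a dotted name by splitting at the first dot.

```python
def is_foreign_reference(name, foreign_extensions):
    """True if the name ends in a recognised foreign extension (exact case)."""
    return any(
        name.endswith(ext) and len(name) > len(ext) for ext in foreign_extensions
    )


def hds_reading(name):
    """Return (container_file, component) for the HDS reading of a name."""
    container, _, component = name.partition(".")
    # HDS component names are not case sensitive; show them in upper case.
    return container + ".sdf", component.upper()
```

So with ‘.img’ as a recognised extension, anyfile.img is a foreign file, whereas anyfile.IMG falls through to the HDS reading: the object IMG in anyfile.sdf.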

2.10 Example: Setting Up a New Format

The following example shows the C shell commands that might be used on a UNIX system to give NDF-based applications access to a new data format. Typically, commands such as these would appear in a startup file, perhaps packaged as part of a “driver” that could be installed to give access to the data format in question:

  #  Ensure that the new format and its file extension are recognised on
  #  input.
        if ($?NDF_FORMATS_IN) then
           setenv NDF_FORMATS_IN $NDF_FORMATS_IN',NEW(.new)'
        else
           setenv NDF_FORMATS_IN 'NEW(.new)'
        endif
  
  #  Similarly, ensure they are recognised on output.
        if ($?NDF_FORMATS_OUT) then
           setenv NDF_FORMATS_OUT $NDF_FORMATS_OUT',NEW(.new)'
        else
           setenv NDF_FORMATS_OUT '.,NEW(.new)'
        endif
  
  #  Define commands to convert from the new format to NDF format and
  #  vice versa.
        setenv NDF_FROM_NEW 'new2ndf in='\'^dir^name^type\'' out='\'^ndf\'
        setenv NDF_TO_NEW   'ndf2new in='\'^ndf\'' out='\'^dir^name^type\'

This example illustrates a couple of points which were not addressed earlier:

(1) We first check to see if the NDF_FORMATS_IN and NDF_FORMATS_OUT environment variables are already defined. If they are, we can append our new format description to them so as not to disturb any definitions already in use. Otherwise we must set them up from scratch.

(2) The environment variable definitions have been written so that single quote characters appear around the names of datasets. For example, the translation of the environment variable NDF_FROM_NEW would be:

  new2ndf in='^dir^name^type' out='^ndf'

Although the syntax needed is a bit messy, this does mean that any special characters that appear in dataset names will be handled correctly (i.e. literally), and not expanded by the shell that interprets the command.
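
The effect of the quoting can be seen by splitting the substituted command into words. The Python sketch below uses the standard shlex module, which models shell word splitting and quote removal only (not variable or wild-card expansion). The dataset names are made up for illustration, and the helper function is hypothetical.

```python
import shlex


def shell_words(template, foreign_file, ndf):
    """Substitute dataset names, then split the command into shell words."""
    command = template.replace("^dir^name^type", foreign_file).replace("^ndf", ndf)
    return shlex.split(command)


quoted = "new2ndf in='^dir^name^type' out='^ndf'"
unquoted = "new2ndf in=^dir^name^type out=^ndf"
awkward = "/data/my file$1.new"  # contains a space and a '$'

print(shell_words(quoted, awkward, "/tmp/myfile"))
print(shell_words(unquoted, awkward, "/tmp/myfile"))
```

With the quoted form the awkward file name arrives as a single argument; without the quotes it is broken at the space. Note that this style of quoting cannot protect a dataset name that itself contains a single quote character.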

Footnote 2: Strictly speaking, NDF_FORMATS_IN defines the formats recognised when accessing pre-existing datasets. Although it is possible to update and write to such datasets, it is nevertheless convenient to refer to them as “input” datasets.

Footnote 3: On UNIX, this will be the “sh” (Bourne) shell.

Footnote 4: But also see §3.3 for ways of avoiding this restriction.

Footnote 5: As before, we really mean new datasets here (because you could write output to a pre-existing dataset, which is covered by the NDF_FORMATS_IN list), but thinking of them as “output” datasets is more convenient.

Footnote 6: Note that in this case the input format description is being used to create the output dataset, so it would not strictly be necessary for the FIGARO format to appear in the NDF_FORMATS_OUT list.