Starlink standard data formats support two methods of handling bad data: magic value (which flags specified pixels as undefined) and data quality (a more general mechanism, which may be used to indicate any attribute of selected pixels, including “badness”). Magic value is simple and efficient. Data quality is flexible and preserves the original data.
To flag a data value as “bad”, an associated data-quality array can be used. This is an array of 8-bit unsigned integers, one per element of the data array with which it is associated (a single value, applying to all elements of the data array, is also possible, but will rarely be useful), whose bits describe various attributes of the data value concerned. The recommended way to use data quality is to regard the 8 bits as eight independent logical masks, one mask per attribute.
As its name implies, data-quality is a qualitative description of the data value. It is frequently used to flag bad pixels, but is also useful for “good” attributes, e.g. which regions of a picture constitute the sky sample. It is not in any sense an error estimate (though groups of bits might be used to convey some numerical meaning); it finds application in circumstances where an error estimate is not meaningful. Here are some examples of how data quality might be used:
Sometimes a simple true/false mask is not enough. In such cases it is possible to use combinations of bits to indicate both the presence of the condition and to what subclass of that condition the pixel belongs. For example, a group of three data quality bits could be used not only to flag saturation but also to grade the degree of saturation, on a scale of 1–7.
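As an illustration of such a multi-bit field, here is a minimal Python sketch. The choice of bits 2 to 4 for the saturation grade, and the names `SAT_SHIFT`, `SAT_MASK`, `set_saturation` and `get_saturation`, are assumptions made for this example only; they are not part of any Starlink convention.

```python
# Hypothetical layout: bits 2-4 of the 8-bit quality value hold a
# saturation grade (0 = not saturated, 1-7 = degree of saturation).
SAT_SHIFT = 2
SAT_MASK = 0b111 << SAT_SHIFT  # 0b00011100


def set_saturation(quality, grade):
    """Return quality with the 3-bit saturation field set to grade."""
    if not 0 <= grade <= 7:
        raise ValueError("grade must be in the range 0..7")
    # Clear the field, keep the other five bits, then insert the grade.
    return (quality & ~SAT_MASK & 0xFF) | (grade << SAT_SHIFT)


def get_saturation(quality):
    """Return the saturation grade stored in quality."""
    return (quality & SAT_MASK) >> SAT_SHIFT


q = set_saturation(0b10000001, 5)
print(format(q, "08b"), get_saturation(q))  # 10010101 5
```

A grade of zero doubles as the "not saturated" flag, so the field conveys both the presence of the condition and its subclass while the remaining five bits stay available as independent masks.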
Clearly, not all values stored in the data system will have associated data-quality; that would be unnecessary and quite wasteful of resources. Normally, data-quality values are associated with basic observational or measured data.
The alternative method for handling bad pixels is the so-called magic value method, where a pixel is assigned a special flag when it has an undefined value—it corresponds to a dead element in a CCD chip, for example, or is the result of division by zero. This terminology should not be confused with the HDS “undefined state”, where a data object exists, but has no value(s) assigned to it. In this document “undefined” means “having a magic value”, unless explicitly stated. An undefined pixel will always be bad, unless repaired in some fashion, and so the data-quality technique is not applicable.
The method is efficient on space: it can always be applied without increasing the data-storage requirement because the flag or magic value replaces the unwanted data value. (For applications where it is important to retain pixel values, or where there is a degree of badness, data quality should be used.) The method enables an application to discover whether a given pixel is bad as soon as it is accessed.
Alternative techniques, based on a list of bad pixels, would be less efficient, because the list would have to be searched repeatedly to see whether given pixels are bad. Such methods would be especially inefficient if large areas of pixels were undefined.
Once a bad pixel has been detected, the application can take appropriate action – flagging the corresponding output pixel as bad, or attempting a repair, perhaps via a choice of interpolation methods.
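The two actions can be sketched as follows, in Python for brevity. The constant `VAL_BADR` is a hypothetical stand-in for the REAL magic value (its numerical value here is an assumption for illustration), and the neighbour-averaging repair is just one simple interpolation choice among many.

```python
# Hypothetical stand-in for the REAL magic value; real code would take
# the value from the appropriate include file, not hard-wire it.
VAL_BADR = -3.4028234663852886e38


def add_images(a, b):
    """Pixel-wise sum; an output pixel is bad if either input pixel is bad."""
    return [
        VAL_BADR if x == VAL_BADR or y == VAL_BADR else x + y
        for x, y in zip(a, b)
    ]


def repair_1d(data):
    """Replace each bad pixel by the mean of its good immediate neighbours.

    Pixels with no good neighbour are left bad.
    """
    out = list(data)
    for i, value in enumerate(data):
        if value == VAL_BADR:
            good = [
                data[j]
                for j in (i - 1, i + 1)
                if 0 <= j < len(data) and data[j] != VAL_BADR
            ]
            if good:
                out[i] = sum(good) / len(good)
    return out


print(repair_1d([1.0, VAL_BADR, 3.0]))  # [1.0, 2.0, 3.0]
```

Note that neither function performs arithmetic on a magic value itself: bad inputs are detected by an equality test before any computation takes place.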
The HDS undefined state must not be used to indicate bad pixels. If an application finds a data-object in this state, it must report an error, so that the malfunctioning application which created the object can be identified and corrected. The error is fatal.
General-purpose applications, like those in the KAPPA package, should support both magic-value and quality arrays. It will usually be best to look only for the magic-value case in the scientific algorithm part of the code, having dealt with any data-quality information in a preliminary pass which converts flagged pixels into magic-value ones.
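A minimal sketch of such a preliminary pass is given below, in Python rather than Fortran. The function names, the `VAL_BADR` stand-in for the REAL magic value, and the representation of the arrays as Python lists are all assumptions made for illustration; `badbits` is a mask selecting which quality bits mark a pixel as bad.

```python
# Hypothetical stand-in for the REAL magic value (illustrative only).
VAL_BADR = -3.4028234663852886e38


def quality_to_magic(data, quality, badbits):
    """Preliminary pass: return a copy of data in which every pixel
    flagged by a selected quality bit is replaced by the magic value."""
    return [
        VAL_BADR if (q & badbits) != 0 else d
        for d, q in zip(data, quality)
    ]


def total_flux(data):
    """'Scientific' part: it only needs to know about magic values."""
    return sum(d for d in data if d != VAL_BADR)


data = [1.0, 2.0, 3.0, 4.0]
quality = [0b00000000, 0b00000010, 0b10000000, 0b00000000]
work = quality_to_magic(data, quality, badbits=0b00000010)
print(total_flux(work))  # 8.0
```

Because the pass works on a copy, the original data values and the quality array are left intact, and the algorithmic code contains a single kind of bad-pixel test.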
The groups of true/false logical flags involved in the data quality mechanism are stored as integer values. We picture these integers as having conventional binary encoding, and adopt the convention that 1 ≡ true. Specific VAX representations and conventions are not followed and are not in any way involved with the discussion.
Quality is an 8-bit value associated with each datum, and is stored as an unsigned byte. A value of zero (i.e. all quality flags set to 0 ≡ false) implies an “ordinary” value which can be accepted at face value by application programs.
In general-purpose applications, the data-quality values are regarded as a set of 8 independent masks, each of which is 1 bit deep. Whether a given pixel is to be included in the processing or not (i.e. whether it is “bad”) is determined by comparing its quality value with a bit pattern stored in a _UBYTE data object [BADBITS] within the QUALITY structure. The following logical expression is evaluated:
\[ \mathrm{BAD} = \mathrm{QUALITY} \wedge \mathrm{BADBITS} \]
where $\wedge$ is the logical AND operation, applied bit by bit.
Note that if a bit of [BADBITS] is zero (i.e. false), the corresponding data-quality mask is ignored. Setting every bit of [BADBITS] to zero therefore turns off all 8 data-quality masks and allows inspection or processing of the pixels whatever their status. For a single bit, the above expression has the following truth table:
|QUALITY bit|BADBITS bit|BAD|
|0|0|FALSE|
|0|1|FALSE|
|1|0|FALSE|
|1|1|TRUE|
The overall logical value of BAD is the OR of the results for all eight bits; just one of them has to be TRUE to make the resulting pixel bad.
An example may clarify this. Assume [BADBITS] is 01001010 (where the bits of the binary number are written with the most significant at the left, and are numbered from the right beginning with zero). For this [BADBITS] value, a pixel with a [QUALITY] value of 10100100 is interpreted as non-bad, because bits 2, 5 and 7, which are set in the data-quality value, are not set in [BADBITS]. However, a [QUALITY] value of 10100110 generates a bad value because bit 1, which is set in the data-quality value, is also set in [BADBITS]. If data object [BADBITS] is not present, its value is assumed to be 00000000, and general-purpose applications will accept as “good” any pixel, irrespective of the corresponding data-quality value.
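The worked example can be checked with a few lines of Python. The function names are hypothetical; the second function spells out the bit-by-bit AND followed by an OR over all eight bits, to show that it agrees with the single masked comparison.

```python
def is_bad(quality, badbits=0b00000000):
    """True if any quality bit that is selected by badbits is set.

    The default of zero mirrors the rule for an absent [BADBITS] object.
    """
    return (quality & badbits) != 0


def is_bad_bitwise(quality, badbits=0b00000000):
    """Same test, written as the OR of eight single-bit AND results."""
    return any(
        ((quality >> k) & 1) and ((badbits >> k) & 1) for k in range(8)
    )


BADBITS = 0b01001010
print(is_bad(0b10100100, BADBITS))  # False: bits 2, 5, 7 not selected
print(is_bad(0b10100110, BADBITS))  # True: bit 1 is set and selected
print(is_bad(0b10100110))           # False: [BADBITS] absent, all masks off
```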
The rules and conventions for the processing of data-quality values and their associated data, taking into account the possible presence of undefined values, are as follows.
However, in some cases it is hard to identify a principal data array, or the principal data array does not have quality and one or more of the others does. Therefore, what is best depends on the nature of the application. For example, in the computation of the statistics of corresponding pixels from each of a series of pictures, to produce mean or standard-deviation arrays, it is vital to exclude all bad values from the calculations. A related problem is what the quality of output data arrays should be, and here again programmers must make case-by-case judgements.
If a [QUALITY] array is present it is assumed that it is to be used to define bad pixels unless:
If [QUALITY] is not present the magic-value method is assumed.
There is no one ideal way of handling data quality in general-purpose routines. Methods will evolve as experience with real applications and data is gained. The main considerations are:
Applications can be as sophisticated and specialised as they like in their use of data quality, and are at liberty to assign specific meanings to values of data quality, e.g. a fiducial mark, vignetting, saturation. The details of how data-quality information is encoded within the 8 bits are specific to each kind of data source and specialist package. A description of how quality will be interpreted must be given in the documentation for each package that uses the technique. However, it is possible to identify some general features of data-quality processing.
Each data-quality value can be regarded as a set of bit groups, each containing one or more bits. The recommended approach is to use single bits, each with an independent meaning, to form eight 1-bit deep logical masks. However, it is also permissible to take several bits (which ought to be contiguous) and interpret them as a positive integer. Single bit fields are used to contain a flag (1 = .TRUE., 0 = .FALSE.) for some feature (e.g. “pixel in fiducial”). Multiple-bit fields are used to contain code numbers or degree of quality.
It is envisaged that most manipulation of data-quality values will be done quite transparently by those applications which know how to use them to advantage, without the user being aware of the mechanism. However, it is expected that there will be some cases where users will want to manipulate data quality explicitly, and there will be various data-quality editing applications, often using graphics or image displays. For example, there will be instances where the user wishes to view a picture on a display and select which pixels are to be temporarily flagged as “wrong”, rather than trust some automatic algorithm.
Since the data quality codes are stored separately from the actual data, data-quality editing will normally be a reversible process, leaving the data values themselves untouched.
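The reversibility can be seen in a small Python sketch. The bit number used for the "user-flagged" mask and the function names are assumptions for illustration; an editing application would choose its own assignments.

```python
USER_FLAG_BIT = 0  # hypothetical bit reserved for interactive flagging


def flag_pixel(quality, index, bit=USER_FLAG_BIT):
    """Set one quality bit for one pixel; the data array is not involved."""
    quality[index] |= 1 << bit


def unflag_pixel(quality, index, bit=USER_FLAG_BIT):
    """Clear the same bit, restoring the previous state of that mask."""
    quality[index] &= ~(1 << bit) & 0xFF


data = [10.5, 11.0, 250.0]
quality = [0b00000000, 0b00000100, 0b00000000]

flag_pixel(quality, 2)    # user marks the third pixel as "wrong"
print(quality, data)      # [0, 4, 1] [10.5, 11.0, 250.0]
unflag_pixel(quality, 2)  # user changes their mind
print(quality, data)      # [0, 4, 0] [10.5, 11.0, 250.0]
```

Only the selected bit changes; other quality bits of the same pixel, and the data values, are untouched throughout.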
(n.b. The implementation of data quality is largely unchanged from the Wright-Giddings proposal.)
For each of HDS’s primitive TYPEs, the magic-value method uses the values given in Table 10. Alert readers will note that these are the same as the bad values used by HDS.
|Data TYPE||Value||Hexadecimal pattern|
Use of “undefined data” flags must be restricted to three operations: (i) setting a datum to “undefined”, (ii) testing whether a datum is in the undefined state, and (iii) replacing an undefined datum with a valid value (using an assignment statement). Arithmetic operations on undefined data values are banned. Magic values are applicable to both scalar and vector data objects. There are some exceptions and these are individually noted.
For efficiency, pixel values are tested inline for equality with the magic value of the appropriate type.
However, the numerical values given above must not be written explicitly in the code; instead, the symbolic names provided for them, which incorporate <T>, the one- or two-letter type code (e.g. see SUN/7), should be used. These variables are specified via an INCLUDE file with logical name BAD_PAR. Here is a trivial example, which computes the mean of a one-dimensional REAL array. (n.b. Actual applications would include comments and defences against rounding errors, excluded for brevity.)
Note that only valid pixels are counted and summed.
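A mean computation of this kind can be sketched as follows, in Python rather than Fortran. The constant `VAL_BADR` is a hypothetical stand-in for the symbolic REAL magic value that a real application would obtain from the BAD_PAR include file, and its numerical value here is an assumption for illustration. Returning the magic value when no valid pixel exists is a design choice of this sketch.

```python
# Hypothetical stand-in for the symbolic REAL magic value; it is defined
# once and never written as a literal in the algorithm itself.
VAL_BADR = -3.4028234663852886e38


def mean_ignoring_bad(data):
    """Mean of the valid pixels of a one-dimensional array.

    Returns the magic value if the array contains no valid pixel.
    """
    total = 0.0
    count = 0
    for value in data:
        # Inline equality test against the magic value.
        if value != VAL_BADR:
            total += value
            count += 1
    return total / count if count > 0 else VAL_BADR


print(mean_ignoring_bad([1.0, VAL_BADR, 3.0]))  # 2.0
```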
For reasons of efficiency of processor time and work space, and to permit easier portability and
adaptation of general-purpose subroutine libraries, a flag called [BAD_PIXEL] may be provided
within a structure to denote whether undefined pixels are present. Only if it is present and set to
.FALSE. will it be permissible to bypass magic-value testing. Thus, many packages will support
two sets of algorithmic subroutines; one which tests magic values, and one which does