Starlink standard data formats support two methods of handling bad data: magic value (which flags specified pixels as undefined) and data quality (a more general mechanism, which may be used to indicate any attribute of selected pixels, including “badness”). Magic value is simple and efficient. Data quality is flexible and preserves the original data.

#### 7.1 Quality

To flag a data value as “bad”, an associated data-quality value can be used. Data quality is stored as an array of 8-bit unsigned integers, one per element of the data array with which it is associated, whose bits describe various attributes of the data value concerned. (A single quality value, applying to all elements of the data array, is also possible, but will rarely be useful.) The recommended way to use data quality is to regard the 8 bits as eight independent logical masks, one mask per attribute.

As its name implies, data-quality is a qualitative description of the data value. It is frequently used to flag bad pixels, but is also useful for “good” attributes, e.g. which regions of a picture constitute the sky sample. It is not in any sense an error estimate (though groups of bits might be used to convey some numerical meaning); it finds application in circumstances where an error estimate is not meaningful. Here are some examples of how data quality might be used:

• In an image where the intensities of some pixels have been digitally truncated, there is only a lower limit to the actual incident intensity; the upper limit is unbounded. Data-quality could be used to flag this condition, and application programs could then decide whether to use the pixel value or to treat it as missing.
• Data quality is useful where a pixel has an accurate intensity, but has to be interpreted in a different way from other pixels. The case of pixels affected by fiducial marks (e.g. reseaux) is a common example of this. For most parts of the processing, such pixels must be excluded. However, in an application which locates the fiducial marks themselves, they would clearly be crucial.
• Where parts of a picture are vignetted, data-quality allows these regions to be ignored when appropriate (at the discretion of the user, for example) without losing what information they contain.

Sometimes a simple true/false mask is not enough. In such cases it is possible to use combinations of bits to indicate both the presence of the condition and to what subclass of that condition the pixel belongs. For example, a group of three data quality bits could be used not only to flag saturation but also to grade the degree of saturation, on a scale of 1–7.
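As an illustration of such a bit group, here is a minimal sketch in Python (chosen for brevity rather than the Fortran used elsewhere in this document). The choice of bits 4 to 6 for the saturation grade, and the names `SAT_SHIFT`, `SAT_MASK`, `set_saturation` and `get_saturation`, are assumptions made for the example and are not part of any Starlink convention.

```python
# Hypothetical layout: bits 4-6 hold a saturation grade
# (0 = not saturated, 1-7 = increasing degree of saturation).
SAT_SHIFT = 4
SAT_MASK = 0b111 << SAT_SHIFT


def set_saturation(quality, grade):
    """Return the 8-bit quality value with the saturation grade replaced."""
    if not 0 <= grade <= 7:
        raise ValueError("grade must fit in three bits")
    # Clear the three grade bits, keep the others, then insert the new grade.
    return (quality & ~SAT_MASK & 0xFF) | (grade << SAT_SHIFT)


def get_saturation(quality):
    """Extract the saturation grade from an 8-bit quality value."""
    return (quality & SAT_MASK) >> SAT_SHIFT


# Bit 0 carries some other, independent flag and must survive the update.
q = set_saturation(0b00000001, 5)
```

A grade of zero doubles as the "not saturated" flag, so a single test of the three bits answers both questions.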

Clearly, not all values stored in the data system will have associated data-quality; that would be unnecessary and quite wasteful of resources. Normally, data-quality values are associated with basic observational or measured data.

#### 7.2 Magic or Undefined Value

The alternative method for handling bad pixels is the so-called magic value method, where a pixel is assigned a special flag when it has an undefined value—it corresponds to a dead element in a CCD chip, for example, or is the result of division by zero. This terminology should not be confused with the HDS “undefined state”, where a data object exists, but has no value(s) assigned to it. In this document “undefined” means “having a magic value”, unless explicitly stated. An undefined pixel will always be bad, unless repaired in some fashion, and so the data-quality technique is not applicable.

The method is efficient on space: it can always be applied without increasing the data-storage requirement because the flag or magic value replaces the unwanted data value. (For applications where it is important to retain pixel values, or where there is a degree of badness, data quality should be used.) The method enables an application to discover whether a given pixel is bad as soon as it is accessed.

Alternative techniques, based on a list of bad pixels, would be less efficient, because the list would have to be searched repeatedly to see whether given pixels are bad. Such methods would be especially inefficient if large areas of pixels were undefined.

Once a bad pixel has been detected, the application can take appropriate action – flagging the corresponding output pixel as bad, or attempting a repair, perhaps via a choice of interpolation methods.

The HDS undefined state must not be used to indicate bad pixels. If an application finds a data-object in this state, it must report an error, so that the malfunctioning application which created the object can be identified and corrected. The error is fatal.

#### 7.3 Implementation

General-purpose applications, like those in the KAPPA package, should support both magic-value and quality arrays. It will usually be best to look only for the magic-value case in the scientific algorithm part of the code, having dealt with any data-quality information in a preliminary pass which converts flagged pixels into magic-value ones.
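The preliminary pass described above might look like the following sketch, written in Python for brevity. The constant `VAL_BADR` stands in for the `_REAL` magic value, and the function names are hypothetical; a real application would obtain the magic value from the appropriate include file rather than writing it out.

```python
# Stand-in for the _REAL magic value (an assumption for this sketch).
VAL_BADR = -1.7014117e38


def quality_to_magic(data, quality, badbits):
    """Return a copy of data in which every pixel flagged as bad by
    (quality AND badbits) is replaced by the magic value."""
    return [VAL_BADR if (q & badbits) != 0 else d
            for d, q in zip(data, quality)]


def mean_ignoring_bad(data):
    """Scientific part of the code: it only knows about magic values."""
    good = [d for d in data if d != VAL_BADR]
    return sum(good) / len(good) if good else VAL_BADR


data = [1.0, 2.0, 3.0, 4.0]
quality = [0b00000000, 0b00000010, 0b00000100, 0b00000000]
masked = quality_to_magic(data, quality, badbits=0b00000010)
mean = mean_ignoring_bad(masked)
```

Only the second pixel is flagged by the chosen mask, so the mean is taken over the other three. The input arrays are left untouched because the pass works on a copy.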

The groups of true/false logical flags involved in the data quality mechanism are stored as integer values. We picture these integers as having conventional binary encoding, and adopt the convention that 1$\equiv$true. Specific VAX representations and conventions are not followed and are not in any way involved with the discussion.

##### 7.3.1 Data Quality

Quality is an 8-bit value associated with each datum, and is stored as an unsigned byte. A value of zero (i.e. all quality flags set 0$\equiv$false) implies an “ordinary” value which can be accepted at face value by application programs.

#### General-purpose applications

In general-purpose applications, the data-quality values are regarded as a set of 8 independent masks, each of which is 1 bit deep. Whether a given pixel is to be included in the processing or not (i.e. whether it is “bad”) is determined by comparing its quality value with a bit pattern stored in a $<$_UBYTE$>$ data object [BADBITS] within the $<$QUALITY$>$ structure. The following logical expression is evaluated:

$\mathrm{BAD} = \mathrm{QUALITY} \wedge \mathrm{BADBITS}$

where $\wedge$ is the logical AND operation, applied to each pair of corresponding bits.

Note that if a [BADBITS] mask is zero (i.e. all false), the corresponding data-quality mask is ignored. This can be used to turn off all 8 data-quality masks and allow inspection or processing of the pixels whatever their status. For a single bit, the above expression has the following truth table:

| QUALITY | BADBITS | BAD |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

The overall logical value of BAD is the OR of the results for all eight bits; just one of them has to be TRUE to make the resulting pixel bad.

An example may clarify this. Assume [BADBITS] is 01001010 (where the bits of the binary number are written with the most significant at the left, and are numbered from the right beginning with zero). For this [BADBITS] value, a pixel with a [QUALITY] value of 10100100 is interpreted as non-bad, because bits 2, 5 and 7, which are set in the data-quality value, are not set in [BADBITS]. However, a [QUALITY] value of 10100110 generates a bad value because bit 1, which is set in the data-quality value, is also set in [BADBITS]. If data object [BADBITS] is not present its value is assumed to be 00000000, and general-purpose applications will accept as “good” any pixel, irrespective of the corresponding data-quality value.
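The worked example can be checked with a few lines of Python (used here only as a convenient notation for bit patterns). The function name `is_bad` is hypothetical; the default argument mirrors the rule that an absent [BADBITS] is treated as 00000000.

```python
def is_bad(quality, badbits=0b00000000):
    """A pixel is bad if any bit set in its quality is also set in badbits."""
    return (quality & badbits) != 0


BADBITS = 0b01001010  # bits 1, 3 and 6 are set

first = is_bad(0b10100100, BADBITS)   # bits 2, 5, 7: none shared with BADBITS
second = is_bad(0b10100110, BADBITS)  # bit 1 is shared with BADBITS
third = is_bad(0b11111111)            # BADBITS absent: every pixel is accepted
```

The comparison with zero performs the OR over all eight bits: the result of the AND is non-zero as soon as any one bit pair is TRUE.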

The rules and conventions for the processing of data-quality values and their associated data, taking into account the possible presence of undefined values, are as follows.

Rules
• Undefined pixels stay bad after processing.
• Undefined pixels generated during the processing (other than through data quality), e.g. logarithm of a negative data value, are propagated to the output data value.
• If processing would or might have changed the value of a pixel, had the pixel not been marked as bad through data quality, then the application must propagate an undefined pixel. The input quality is propagated. In applications where the data value will not have been changed as a result of the processing, the application is permitted either (1) to propagate the original data value and quality or (2) to propagate an undefined pixel.
Conventions
• When there is more than one input data array (cf. Section 9), the input or original data quality is deemed to be that associated with the principal data array.

However, in some cases it is hard to identify a principal data array, or the principal data array does not have quality and one or more of the others does. Therefore, what is best depends on the nature of the application. For example, in the computation of the statistics of corresponding pixels from each of a series of pictures, to produce mean or standard-deviation arrays, it is vital to exclude all bad values from the calculations. A related problem is what the quality of output data arrays should be, and here again programmers must make case-by-case judgements.

• If an undefined pixel is generated during the processing, the data quality of the output data value is nonetheless the same as the input data quality, just as if a good pixel had been generated. (Although a pixel is undefined, you may still need to know, for example, that the pixel was part of a fiducial mark.)
• The original data and quality values remain unchanged.
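The rules and conventions above can be illustrated with a sketch of an application that takes the base-10 logarithm of each pixel. It is written in Python for brevity; `VAL_BADR` is a stand-in for the `_REAL` magic value and `log10_with_quality` is a hypothetical routine, not an existing library function.

```python
import math

# Stand-in for the _REAL magic value (an assumption for this sketch).
VAL_BADR = -1.7014117e38


def log10_with_quality(data, quality, badbits):
    """Return (output data, output quality) without modifying the inputs."""
    out = []
    for d, q in zip(data, quality):
        if d == VAL_BADR:
            out.append(VAL_BADR)          # undefined pixels stay bad
        elif (q & badbits) != 0:
            out.append(VAL_BADR)          # bad through data quality
        elif d <= 0.0:
            out.append(VAL_BADR)          # undefined generated by processing
        else:
            out.append(math.log10(d))
    # The input quality is propagated unchanged, even for undefined pixels.
    return out, list(quality)


data = [100.0, VAL_BADR, -5.0, 10.0]
quality = [0b00000000, 0b00000000, 0b00000100, 0b00000001]
out, out_quality = log10_with_quality(data, quality, badbits=0b00000001)
```

Here the third pixel keeps its quality value (bit 2 set) in the output even though the logarithm of a negative number has made it undefined.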

If a [QUALITY] array is present it is assumed that it is to be used to define bad pixels unless:

• there is a parameter which overrides the default;
• the bit pattern of [BADBITS] is 00000000; or
• [BADBITS] is omitted from the $<$QUALITY$>$ structure.

If [QUALITY] is not present the magic-value method is assumed.
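The decision just described can be written as a small predicate. This is a sketch in Python under stated assumptions: the function name and the `ignore_quality` argument (modelling "a parameter which overrides the default") are hypothetical, and `badbits=None` represents a [BADBITS] object that is absent from the structure.

```python
def quality_defines_bad(has_quality, badbits=None, ignore_quality=False):
    """Return True if the [QUALITY] array should be used to define bad pixels."""
    if not has_quality:
        return False      # no [QUALITY]: the magic-value method is assumed
    if ignore_quality:
        return False      # an application parameter overrides the default
    if badbits is None or badbits == 0:
        return False      # [BADBITS] omitted, or all masks switched off
    return True


cases = [
    quality_defines_bad(True, 0b01001010),
    quality_defines_bad(True, 0b01001010, ignore_quality=True),
    quality_defines_bad(True, 0b00000000),
    quality_defines_bad(True),
    quality_defines_bad(False),
]
```

Only the first case uses the quality array; in the others the application falls back on testing for magic values alone.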

There is no one ideal way of handling data quality in general-purpose routines. Methods will evolve as experience with real applications and data is gained. The main considerations are:

• Disc space and virtual memory—use of full-size work arrays may be unacceptable for large data frames.
• Speed—checks for magic value and data quality in pixel-by-pixel processing loops should be kept to a minimum. For example, the program could first determine whether bad pixels and data quality are present or not, and then call different processing routines for the two cases.

#### Specialist applications

Applications can be as sophisticated and specialised as they like in their use of data quality, and are at liberty to assign specific meanings to values of data quality, e.g. a fiducial mark, vignetting, saturation. The details of how data-quality information is encoded within the 8 bits are specific to each kind of data source and specialist package. A description of how quality will be interpreted must be given in the documentation for each package that uses the technique. However, it is possible to identify some general features of data-quality processing.

Each data-quality value can be regarded as a set of bit groups, each containing one or more bits. The recommended approach is to use single bits, each with an independent meaning, to form eight 1-bit deep logical masks. However, it is also permissible to take several bits (which ought to be contiguous) and interpret them as a positive integer. Single bit fields are used to contain a flag (1 = .TRUE., 0 = .FALSE.) for some feature (e.g. “pixel in fiducial”). Multiple-bit fields are used to contain code numbers or degree of quality.

It is envisaged that most manipulation of data-quality values will be done quite transparently by those applications which know how to use them to advantage, without the user being aware of the mechanism. However, it is expected that there will be some cases where users will want to manipulate data quality explicitly, and there will be various data-quality editing applications, often using graphics or image displays. For example, there will be instances where the user wishes to view a picture on a display and select which pixels are to be temporarily flagged as “wrong”, rather than trust some automatic algorithm.

Since the data quality codes are stored separately from the actual data, data-quality editing will normally be a reversible process, leaving the data values themselves untouched.
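A minimal sketch of such an editing operation follows, in Python for brevity. The bit chosen for "flagged by the user" and the function names are assumptions made for the example. The edit is reversible in the sense shown provided the chosen bit was not already in use for the pixels concerned.

```python
USER_FLAG = 0b00001000  # hypothetical bit meaning "flagged by the user"


def flag_pixels(quality, indices, bit=USER_FLAG):
    """Return a copy of the quality array with the bit set at the given pixels."""
    edited = list(quality)
    for i in indices:
        edited[i] |= bit
    return edited


def unflag_pixels(quality, indices, bit=USER_FLAG):
    """Return a copy of the quality array with the bit cleared at the given pixels."""
    edited = list(quality)
    for i in indices:
        edited[i] &= ~bit & 0xFF
    return edited


data = [10.0, 20.0, 30.0]
quality = [0b00000000, 0b00000001, 0b00000000]
flagged = flag_pixels(quality, [1, 2])
restored = unflag_pixels(flagged, [1, 2])
```

The data array never appears in either function, which is the point: only the quality codes are edited.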

(n.b. The implementation of data quality is largely unchanged from the Wright-Giddings proposal.)

##### 7.3.2 Magic Values

For each of HDS’s primitive TYPEs, the magic-value method uses the values given in Table 10. Alert readers will note that these are the same as the bad values used by HDS.

Table 10: Magic values for bad pixels
| Data TYPE | Value | Hexadecimal pattern |
|---|---|---|
| $<$_BYTE$>$ | $-$128 | 80 |
| $<$_UBYTE$>$ | 255 | FF |
| $<$_WORD$>$ | $-$32768 | 8000 |
| $<$_UWORD$>$ | 65535 | FFFF |
| $<$_INTEGER$>$ | $-$2147483648 | 80000000 |
| $<$_REAL$>$ | $-$1.7014117E+38 | FFFFFFFF |
| $<$_DOUBLE$>$ | $-$1.701411834604923D+38 | FFFFFFFFFFFFFFFF |
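For the integer types, the correspondence between value and hexadecimal pattern in Table 10 is simply two's-complement (or unsigned) encoding, which can be checked with Python's standard `struct` module. The dictionary and function names below are hypothetical. The floating-point entries are not checked, because their patterns depend on the machine's floating-point representation.

```python
import struct

# (struct format code, magic value, hexadecimal pattern) for the integer types.
INTEGER_MAGIC = {
    "_BYTE": ("b", -128, "80"),
    "_UBYTE": ("B", 255, "FF"),
    "_WORD": ("h", -32768, "8000"),
    "_UWORD": ("H", 65535, "FFFF"),
    "_INTEGER": ("i", -2147483648, "80000000"),
}


def hex_pattern(fmt, value):
    """Encode value big-endian with the given struct code and return hex digits."""
    return struct.pack(">" + fmt, value).hex().upper()


patterns = {name: hex_pattern(fmt, value)
            for name, (fmt, value, _) in INTEGER_MAGIC.items()}
```

Each signed magic value is the most negative number of its type, and each unsigned one is the largest, so neither is likely to collide with a genuine measurement.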

Use of “undefined data” flags must be restricted to three operations: (i) setting a datum to “undefined”, (ii) testing whether a datum is in the undefined state, and (iii) replacing an undefined datum with a valid value (using an assignment statement). Arithmetic operations on undefined data values are banned. Magic values are applicable to both scalar and vector data objects. There are some exceptions and these are individually noted.

For efficiency, pixel values are tested inline for equality with the magic value of the appropriate type. However, the numerical values given above must not be written explicitly in the code; instead, variables called VAL__BAD<T>, where <T> is the one- or two-letter type code (e.g. see SUN/7), should be used. These variables are specified via an INCLUDE file with logical name BAD_PAR. Here is a trivial example, which computes the mean of a one-dimensional REAL array. (n.b. Actual applications would include fuller comments and defences against rounding errors, excluded for brevity here.)

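      INCLUDE 'BAD_PAR'

      INTEGER I, N, NPIX
      PARAMETER ( NPIX = 100 )

      REAL DATA( NPIX ), SUM, MEAN

      !  Sum and count only the pixels that do not hold the magic value.
      SUM = 0.0
      N = 0
      DO I = 1, NPIX
         IF ( DATA( I ) .NE. VAL__BADR ) THEN
            SUM = SUM + DATA( I )
            N = N + 1
         END IF
      END DO

      !  With no valid pixels, the mean is itself flagged as undefined.
      IF ( N .EQ. 0 ) THEN
         MEAN = VAL__BADR
      ELSE
         MEAN = SUM / REAL( N )
      END IF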

Note that only valid pixels are counted and summed.

For reasons of efficiency of processor time and work space, and to permit easier portability and adaptation of general-purpose subroutine libraries, a flag called [BAD_PIXEL] may be provided within a structure to denote whether undefined pixels are present. Only if it is present and set to .FALSE. will it be permissible to bypass magic-value testing. Thus, many packages will support two sets of algorithmic subroutines; one which tests magic values, and one which does not.
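The two-routine arrangement can be sketched as follows, in Python for brevity. The routine names are hypothetical, `VAL_BADR` is a stand-in for the `_REAL` magic value, and `bad_pixel=None` represents a structure in which [BAD_PIXEL] is absent.

```python
# Stand-in for the _REAL magic value (an assumption for this sketch).
VAL_BADR = -1.7014117e38


def total_checked(data):
    """Sum the pixels, testing each one against the magic value."""
    return sum(d for d in data if d != VAL_BADR)


def total_unchecked(data):
    """Sum the pixels with no magic-value test in the loop."""
    return sum(data)


def total(data, bad_pixel=None):
    """Choose the routine once, outside the pixel loop."""
    # Testing may be bypassed only if the flag is present and set to .FALSE.
    if bad_pixel is False:
        return total_unchecked(data)
    return total_checked(data)


clean = total([1.0, 2.0, 3.0], bad_pixel=False)
absent = total([1.0, VAL_BADR, 3.0])
flagged = total([1.0, VAL_BADR, 3.0], bad_pixel=True)
```

When the flag is absent the sketch takes the cautious path and tests every pixel, as the text requires.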