### 16 Handling Large Images

This recipe gives some hints about reducing large images with CCDPACK (see SUN/139[10]). Starlink data reduction applications, such as CCDPACK, do not on the whole have formal limits on image size. However, reducing very large sets of data can make heavy demands on system resources, which can lead to long run times, degradation of the performance (especially interactive response time) of the machine being used, failure of the applications, or in extreme cases system crashes. Even if you are of a patient disposition, these effects could make you unpopular with other users, so it is worth giving some additional thought to this sort of work.

#### 16.1 How large is large?

What does and does not count as a problem large enough to require special care depends on what is being done and on the computer being used. As a very rough indication, images smaller than 1000×1000 do not in most cases count as large, and ones larger than 5000×5000 in most cases (at the time of writing) do; for cases in between it depends very much on the details.
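To put these sizes in context, the data array of a _REAL frame occupies four bytes per pixel, so the two benchmark sizes differ by a factor of 25 in raw storage:

```shell
# Rough storage for the data array alone of a _REAL (4 bytes/pixel) frame.
# A 1000x1000 frame:
echo $((1000 * 1000 * 4))    # 4000000 bytes, about 4 Mbyte
# A 5000x5000 frame:
echo $((5000 * 5000 * 4))    # 100000000 bytes, about 100 Mbyte
```

Variances, quality arrays and intermediate files multiply these figures further.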

The ‘size’ of a data reduction problem is some ill-defined function of, inter alia:

• number of pixels per frame,
• number of objects,
• number of frames: the number of bias and flat field frames to be processed will be important as well as the number of target object frames,
• overlap of frames: some parts of the reduction process which compare objects or backgrounds between frames will perform differently according to how much overlap in coverage there is between frames.

The principal resources which can fall into short supply during a data reduction process are as follows.

Memory:
a computer has a fixed amount of real memory (RAM; Random Access Memory), and also a part of the disk called swap space which serves as an overflow if running processes need more memory than the available RAM. If there is insufficient real memory + swap space to run the program, it will fail. If there is insufficient real memory for the parts of the program and data which are used simultaneously to be loaded at once, a lot of time will be spent shifting data between RAM and disk, and the program (as well as other processes on the same machine) will run painfully slowly. Depending on the operating system and the way the machine is set up, either of these eventualities can lead to termination of other processes on the machine, or system crashes.
Disk space:
if there is insufficient disk space the program will fail. If other processes are writing to the same disk partition they can fail too.
Input/Output:
Input/Output (I/O) time, that is the time spent waiting for data to be read from and written to disk, will inevitably increase with large data sets. I/O speed is likely to be fairly similar between different low- or mid-range workstations and servers, except in the case where a resource is being used heavily by other processes at the same time; on a busy server this may be the norm.
CPU time:
algorithms which are efficient with CPU (Central Processor Unit) time for small problems may become inefficient for large ones. Speed of execution varies quite a lot between different machines. Some guide is given by the nominal processor speed (in MHz or megaflops), but when processing large data sets on a modern workstation or server, the CPU time spent will normally be limited by memory bandwidth. Bandwidth is not usually quoted as prominently as processor speed, but is typically better on heavy duty servers than on smaller workstations.

Normally the statistic which will actually concern you is elapsed, or ‘wall clock’ time, that is the number of minutes or hours between starting a job off, and the results being available. For a large data reduction job most of this time will typically be spent in I/O, which may or may not include moving data between real memory and swap space. In a multi-user environment however it is important to consider how your use of the machine is affecting the elapsed times of other people’s jobs, or other jobs of your own. As a general rule then, if your data reduction runs fast enough that it does not inconvenience you or other people then you do not have a ‘large’ problem. Otherwise, the rest of this recipe may provide some useful tips.

#### 16.2 Limitation on NDF or HDS file sizes

The Starlink NDF format is a special case of the HDS (Hierarchical Data System; see SUN/92[32]) format. There is currently a fundamental limitation of HDS which will be corrected in a future release. Until then, there is a problem with HDS files longer than 512 Mbyte. Such files can result either from a user NDF file which is very long (for example, a 9000×9000 frame of type _REAL with variances) or, more likely, from a file used as temporary workspace by CCDPACK or other applications.
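To see why an ordinary frame can breach this limit, take the 9000×9000 _REAL example above: the DATA and VARIANCE arrays each cost four bytes per pixel, so the arrays alone exceed 512 Mbyte:

```shell
# 9000x9000 _REAL frame: 4 bytes/pixel for the data array,
# plus the same again for the VARIANCE component.
echo $((9000 * 9000 * 4 * 2))    # 648000000 bytes, well over 512 Mbyte
```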

This problem may not be reported as such by the software, but often manifests itself as an ‘Object not found’ error, which will cause the application to terminate. In this case there is not much which can be done apart from discussing the matter with the programmer responsible for supporting the package.

#### 16.3 General tips

A full discussion of maximising performance for large jobs is beyond the scope of this document, but the following are good common-sense rules of thumb.

Run on a large machine:
usually the more memory available the faster the job will run, since this reduces the amount of disk I/O needed.
Use local disks:
disks attached to the machine running the job will be much faster than disks attached to another machine which are accessed remotely via the local network. It can also make a big difference to use a disk which no other process is making heavy use of at the time. Your system manager may be able to advise on choice of disk.
Be economical with disk space:
while it may make sense to retain all intermediate files (for example, de-biassed, flat fielded, re-sampled frames) for small images, these can take up excessive disk space for large images. Scripts can be written to remove files as they go along, or appropriate options of the applications can be used (for example, keepin=false in CCDPACK’s debias and flatcor). When thinking about disk space requirements, remember that large temporary files can be created by some of the applications. These files have names like t123.sdf and are created in the directory pointed to by the environment variable HDS_SCRATCH, or in the current directory if HDS_SCRATCH is not defined.
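A sketch of pointing the temporary files at a scratch area (the path is illustrative, and the Bourne-shell syntax shown would be setenv in the C shell used elsewhere in this recipe):

```shell
# Direct HDS workspace files (names like t123.sdf) to a scratch directory.
# In the C shell:  setenv HDS_SCRATCH /tmp/ccd_scratch
HDS_SCRATCH=/tmp/ccd_scratch
export HDS_SCRATCH
mkdir -p "$HDS_SCRATCH"
# after a run, check the directory for any workspace files left behind
ls "$HDS_SCRATCH"
```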
Discuss with your system manager and/or other users:
if your job could have a serious effect on the system’s performance it might be polite to ask if there are recommended ways of going about it, or to warn other users.
Run at off-peak times:
if you can run your job at a time when few or no other processes are running on the machine in question it will run faster and inconvenience other users less. The Unix at command can be used to start a job at a given time, or there may be other queuing software installed at your site.
Be nice to other processes:
on Unix the commands nice or renice should be used when running CPU-intensive jobs on multi-user machines. In the C shell typing:
% nice +18 reduce_script

would run the script reduce_script at a ‘niceness’ of 18. This setting means that the job will be less aggressive in requesting CPU time, thus making it run slower, but causing less disruption to other processes (presumably ones with more moderate requirements). The higher the niceness, the less demanding the job is, with 18 often a sensible maximum. Ask your system manager for more details; there may be locally recommended values for certain kinds of job. Note however that the only resource usage this affects is CPU time, so that even a maximally niced job can cause major disruption.
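From the Bourne shell the equivalent is nice -n, and renice can lower the priority of a job that is already running. A sketch (the echo stands in for a real reduction script):

```shell
# run a command at niceness 18 (Bourne shell; the csh form is "nice +18")
nice -n 18 echo "reduce_script would run here"
# lower the priority of an already-running job, e.g. process 12345:
#   renice -n 18 -p 12345
```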

Keep an eye on the job:
if your job might push the system to its limits, especially if you have not run one of similar size before, it is a good idea to monitor its progress, for instance to check that the system’s swap space or file system is not filling up (using, for example, top and df respectively).
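For example, from the directory the job is writing to (top's batch-mode flags vary between systems, so that line is left as a comment):

```shell
# free space, in kilobytes, on the partition holding the current directory
df -k .
# one-off snapshot of the busiest processes (Linux syntax; BSD top differs):
#   top -b -n 1 | head
```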

#### 16.4 Bottleneck applications in CCDPACK

Some parts of the data reduction process are much more expensive than others, and these are not always the same for large images as for small ones.

The maximum frame size which can be treated is determined mainly by the memory required. Exactly how this limitation manifests itself is quite dependent on the system, but if the size of the process is much bigger than available real memory it is likely to run very slowly. There may also be local guidelines about the largest processes which may be run on given machines. Table 1 gives a guide to how memory use of the most demanding CCDPACK applications scales with frame size.

|                          | Variance, mask | Variance, no mask | No variance, mask | No variance, no mask |
|--------------------------|----------------|-------------------|-------------------|----------------------|
| debias (with bias frame) | 8.25           | 6.0               | 7.25              | 5.0                  |
| flatcor                  | 5.5            | 5.5               | 2.75              | 2.75                 |
| makeflat                 | 4.5            | 4.5               | 2.75              | 2.75                 |
| makebias                 | 3.0            | 3.0               | 3.0               | 3.0                  |
| tranndf                  | 2.75           | 2.75              | 2.75              | 2.75                 |
| ardmask (KAPPA)          | 4.0            | 4.0               | 4.0               | 4.0                  |

Table 1: Words required per pixel for the largest CCDPACK applications. The memory usage of most CCDPACK applications scales with the number of pixels per image. This table gives the number of (4 byte) words required per pixel when the calculations are being done at _REAL (4 byte) precision. So, for example, debiassing a 4000×4000 frame with variances and a bad pixel mask at _REAL precision requires around $4000 \times 4000 \times 8.25 \times 4$ bytes $\approx$ 500 Mbyte. The values in this table are meant only as a rough indication; for some of the applications memory use is a more complex function of the details of the task than suggested here.
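The worked example in the caption can be checked directly: at 8.25 words per pixel and 4 bytes per word, debias needs 33 bytes per pixel:

```shell
# debias, 4000x4000 frame, variances and a bad-pixel mask:
# 8.25 words/pixel x 4 bytes/word = 33 bytes/pixel
echo $((4000 * 4000 * 33))    # 528000000 bytes, roughly 500 Mbyte
```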

Briefly, the heaviest users of resources are:

memory:
debias; then makebias, flatcor, makeflat and tranndf,
CPU time:
makemos (normalisation); then tranndf, and sometimes findoff,
I/O:
debias; then makeflat, makemos (normalisation).

Elapsed time for a data reduction sequence will usually be dominated by debias or the normalisation part of makemos, or under some circumstances findoff. More detail is given for some of these in the next section.

#### 16.5 Specific tips

The following tricks are applicable when using several of the Starlink applications. To use some of the commands in the examples you will need to start KAPPA (see SUN/95[6]) by typing kappa at the C shell prompt. These commands (erase, ndftrace, parget, settype, ndfcopy, paste, ardmask, compave, compadd and compick) are described fully in SUN/95; but by way of a quick explanation, the ndftrace, parget pair used below queries a single attribute of an NDF: ndftrace examines the file, and parget then reports one of the output parameters which ndftrace set.

Omit variances:
variance information doubles the size of NDFs on disk and for many of the applications substantially increases the CPU and memory usage. Often it is not required, or if it is can be satisfactorily estimated from the data themselves. If you do not need it, then do not generate and/or propagate it. To omit the variance set parameter genvar=false either in ccdsetup or in debias and makebias. If you wish to remove the VARIANCE component from a frame which already contains it, you can use the KAPPA command erase:

% ndftrace frame quiet
% parget variance ndftrace
TRUE
% erase frame.variance
OK - The HDS object FRAME.VARIANCE is to be erased. OK ? /NO/ > yes
% ndftrace frame quiet
% parget variance ndftrace
FALSE

Use an appropriate data type:
possible data types for storage of the pixel values in NDFs are:
| Data type       | Size (bytes) |
|-----------------|--------------|
| _BYTE, _UBYTE   | 1            |
| _WORD, _UWORD   | 2            |
| _INTEGER, _REAL | 4            |
| _DOUBLE         | 8            |

The type _WORD is usually sufficient for storage of most of the intermediate NDFs required in a data reduction sequence (the exception is makeflat, which always generates a master flat field of type _REAL or _DOUBLE). If your data are of type _INTEGER, _REAL or _DOUBLE, therefore, it can be worth reducing them to one of the smaller types. The KAPPA programs ndftrace and settype can be used respectively to determine and modify the type of data in an NDF, as in this example:

% ndftrace frame quiet
% parget type ndftrace
_REAL
% settype frame _WORD
% ndftrace frame quiet
% parget type ndftrace
_WORD

Reducing the size of the data type may increase or reduce the CPU time requirements of the program, but should reduce the memory and I/O requirements. Note, however, that under certain circumstances using a two-byte type can lead to overflow errors, so some caution should be exercised.
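The saving is a straight factor of two in the data array; for an illustrative 2000×2000 frame:

```shell
# 2000x2000 frame: _REAL (4 bytes/pixel) versus _WORD (2 bytes/pixel)
echo $((2000 * 2000 * 4))    # 16000000 bytes as _REAL
echo $((2000 * 2000 * 2))    # 8000000 bytes as _WORD
```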

Compact NDFs:
sometimes applications generate or modify NDFs to contain additional empty space; you can tell if this is the case by examining the file using ndftrace to work out the approximate size it should be and comparing this with the size shown by ls. If disk space is very tight, and you do not want to delete files, such oversized NDFs can be compacted using the KAPPA application ndfcopy. For example, using the file of reduced data type created above:

% ls -s frame.sdf
2047 frame.sdf
% ndfcopy frame compacted_frame
% ls -s compacted_frame.sdf
1027 compacted_frame.sdf
% mv compacted_frame.sdf frame.sdf

Treat images in sections:
when faced with really large images, the only way to process them may be by breaking them up into sections. This can be done using NDF sections as described in SUN/95[6], for example:

% flatcor in=huge"(:,:4000)" flat=master_flat"(:,:4000)" out=bottom
% flatcor in=huge"(:,4001:)" flat=master_flat"(:,4001:)" out=top
% paste bottom top out=huge_flatcor

Reduce image resolution:
in the event that image resolution is better than required, the size of the frames can be reduced by using one of the KAPPA applications compave, compadd or compick. If the averaged pixels are still small enough to sample the image point spread function adequately this approach is acceptable; otherwise it is rather a waste of good data, but may be useful for taking a quick look at oversized frames.
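Block-averaging shrinks storage quadratically in the compression factor; for an illustrative 6000×6000 _REAL frame compressed by a factor of 2 in each axis:

```shell
# 6000x6000 _REAL frame, before and after averaging with a
# compression factor of 2 in each axis (3000x3000 output)
echo $((6000 * 6000 * 4))    # 144000000 bytes before
echo $((3000 * 3000 * 4))    # 36000000 bytes after: a factor of 4 smaller
```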

Finally, we list the most demanding of the CCDPACK applications with some notes about each one.

debias:
debias is the heaviest user of memory and so is where problems are most likely to arise. The following suggestions are possible ways of limiting the resources used:
Mask:
if the mask parameter is set (to the name of an image or ARD file) then more memory is required. It can therefore be more efficient to apply the mask explicitly elsewhere in the reduction sequence, for example to the bias frame prior to de-biassing (the file names here, including the ARD file mask.ard, are illustrative):

% ardmask in=master_bias ardfile=mask.ard out=masked_bias
% debias in="data?" out="debias_*" bias=masked_bias

rather than:

% debias in="data?" out="debias_*" bias=master_bias mask=mask.ard

Variances:
using variances will also have a big impact on the requirements of debias, and so should be avoided (using genvar=false) if possible.
Bias frames:
de-biassing using the bias strips rather than a master bias frame (see Sections 5.2 and 13.1) reduces the work done by debias and also makes it unnecessary to process the bias frames at all. This technique will lead to inferior de-biassing, but can represent significant savings, and using frames from modern CCDs may give quite satisfactory results.
makemos:
the normalisation part of makemos, if performed, is usually the most CPU intensive part of the data reduction process, although this depends on how numerous and how large the regions of overlap between frames are. The process should therefore be omitted if it is not required. Normalisation is performed only if one or both of the parameters scale and zero is true (both default to false): set scale=true only if multiplicative corrections might be required (for example, if the individual input images have differing exposure times) and set zero=true only if additive corrections might be required (for example, if the images have different background levels). If it must be performed, the following measures may decrease execution time, possibly at the expense of accuracy:
• if scale but not zero is being used and the images have variance information then set cmpvar=false,
• if there are many multiply-overlapping frames then set the parameter optov (optimum number of overlaps) to a small number (such as one),
• the normalisation is usually an iterative process, so it is possible to tweak the parameters controlling the iteration (maxit, tols, tolz).
findoff:
this application uses one of two algorithms to match objects between frames in order to determine their relative offsets. The first algorithm scales as $n^2$ (where $n$ is the number of objects found by findobj), but if this fails it normally falls back on a more reliable algorithm which scales as $n^3 \ln n$. In this case, and if there are many objects, findoff can be very slow indeed and come to dominate the whole reduction process. Failure of the fast algorithm is also more likely when there are very many objects. For both these reasons it can be a good idea to limit the number of objects found by findobj – a few tens of objects in the overlap region is about right. You can control the number of objects found by modifying the minpix parameter: the higher this threshold is set, the fewer objects findobj will identify in the image.
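To see how steep the $n^3 \ln n$ scaling is, compare the relative cost at 30 and 300 objects (awk's log is the natural logarithm; the figures are purely illustrative of the scaling):

```shell
# cost ratio of the n^3 ln n algorithm at n=300 versus n=30:
# ten times the objects costs roughly 1700 times the work
awk 'BEGIN { n1 = 30; n2 = 300;
             c1 = n1^3 * log(n1); c2 = n2^3 * log(n2);
             printf "%.0f\n", c2 / c1 }'
```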

More detailed information on each of these applications can be found in SUN/139[10].