This recipe gives some hints about reducing large images with CCDPACK (see SUN/139). Starlink data reduction applications, such as CCDPACK, do not on the whole have formal limits on image size. However, reducing very large sets of data can make heavy demands on system resources, which can lead to long run times, degradation of the performance (especially interactive response time) of the machine being used, failure of the applications, or in extreme cases system crashes. Even if you are of a patient disposition, these effects could make you unpopular with other users, so it is worth giving some additional thought to this sort of work.
What is and is not a problem large enough to require special care will depend on what is being done and on the computer being used. As a very rough indication, images smaller than 1000x1000 in most cases do not count as large, and ones larger than 5000x5000 in most cases (at the time of writing) do; for cases in between it depends very much on the details.
The ‘size’ of a data reduction problem is some ill-defined function of, inter alia:
The principal resources which can fall into short supply during a data reduction process are as follows.
Normally the statistic which will actually concern you is elapsed, or 'wall clock', time: the number of minutes or hours between starting a job and the results becoming available. For a large data reduction job most of this time will typically be spent in I/O, which may or may not include moving data between real memory and swap space. In a multi-user environment, however, it is important to consider how your use of the machine affects the elapsed times of other people's jobs, or other jobs of your own. As a general rule, then, if your data reduction runs fast enough that it does not inconvenience you or other people, you do not have a 'large' problem. Otherwise, the rest of this recipe may provide some useful tips.
The Starlink NDF format is a special case of the HDS (Hierarchical Data System; see SUN/92) format. There is currently a fundamental limitation of HDS which will be corrected in a future release. Until then, there is a problem with HDS files longer than 512 Mbyte. Such files can result either from a user NDF file which is very long (for example, a 9000x9000 type _REAL frame with variances) or, more likely, from a file used as temporary workspace by CCDPACK or other applications.
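To see why the 9000x9000 example above breaks the limit, note that such a frame stores four bytes per pixel for the DATA array and the same again for the VARIANCE array. A rough calculation in plain shell arithmetic (headers and other NDF components are ignored here):

```shell
# Approximate size of a 9000x9000 _REAL NDF with variances:
# DATA + VARIANCE arrays, 4 bytes per _REAL pixel each.
bytes=$((9000 * 9000 * 4 * 2))
mbytes=$((bytes / 1048576))
echo "$bytes bytes (~$mbytes Mbyte)"   # just over 600 Mbyte, beyond the 512 Mbyte limit
```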
This problem may not be reported as such by the software, but often manifests itself as an ‘Object not found’ error, which will cause the application to terminate. In this case there is not much which can be done apart from discussing the matter with the programmer responsible for supporting the package.
A full discussion of maximising performance for large jobs is beyond the scope of this document, but the following are good common-sense rules of thumb.
flatcor). When thinking about disk space requirements, remember that large temporary files can be created by some of the applications. These files have names like t123.sdf and are created in the directory pointed to by the environment variable HDS_SCRATCH, or in the current directory if HDS_SCRATCH is not defined.
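If the current directory is on a full or slow disk, it can therefore help to point HDS_SCRATCH at a roomier one before starting the reduction. A minimal sketch (the directory name here is only an example; Bourne-shell syntax is shown, with the C shell equivalent in a comment):

```shell
# Send HDS temporary files (t123.sdf and friends) to a scratch area
# with plenty of free space.  /tmp/hds_scratch is a hypothetical path.
HDS_SCRATCH=/tmp/hds_scratch
export HDS_SCRATCH
mkdir -p "$HDS_SCRATCH"
# C shell equivalent:  setenv HDS_SCRATCH /tmp/hds_scratch
```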
The at command can be used to start a job at a given time, or there may be other queuing software installed at your site.
The commands nice and renice make a job less demanding of the CPU relative to other processes, and should be used when running CPU-intensive jobs on multi-user machines. In the C shell, typing

   % nice +18 reduce_script

would run the script reduce_script at a 'niceness' of 18. This setting means that the job will be less aggressive in requesting CPU time, thus making it run slower, but causing less disruption to other processes (presumably ones with more moderate requirements). The higher the niceness, the less demanding the job is, with 18 often a sensible maximum. Ask your system manager for more details; there may be locally recommended values for certain kinds of job. Note however that the only resource usage this affects is CPU time, so that even a maximally niced job can cause major disruption through its memory and I/O demands.
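If a job was started without nice, its priority can still be lowered afterwards with renice. A sketch in Bourne-shell syntax, using sleep as a stand-in for a long-running reduction script; raising a job's own niceness needs no special privileges:

```shell
# Start a long-running job (here just "sleep") in the background,
# then lower its priority and confirm the new niceness with ps.
sleep 30 &
pid=$!
renice -n 18 -p "$pid"
ni=$(ps -o ni= -p "$pid")
echo "niceness now: $ni"
kill "$pid"
```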
Some parts of the data reduction process are much more expensive than others, and these are not always the same for large images as for small ones.
The maximum frame size which can be treated is determined mainly by the memory required. Exactly how this limitation manifests itself is quite dependent on the system, but if the size of the process is much bigger than available real memory it is likely to run very slowly. There may also be local guidelines about the largest processes which may be run on given machines. Table 1 gives a guide to how memory use of the most demanding CCDPACK applications scales with frame size.
Briefly, the heaviest users of resources are:
Elapsed time for a data reduction sequence will usually be dominated by debias, by the normalisation performed by makemos, or under some circumstances by findoff. More detail is given for some of these in the notes below.
The following tricks are applicable when using several of the Starlink applications. To use some of the commands in the examples you will need to start KAPPA (see SUN/95) by typing kappa at the C shell prompt. These commands (compick among them) are described fully in SUN/95; but by way of a quick explanation, the ndftrace/parget pair tells you one thing about the NDF being examined.
makebias. If you wish to remove the VARIANCE component from a frame which already contains it, you can use the KAPPA command
|Data Type|Size (bytes)|
|_BYTE, _UBYTE|1|
|_WORD, _UWORD|2|
|_INTEGER, _REAL|4|
|_DOUBLE|8|
The type _WORD is usually sufficient for storage of most of the intermediate NDFs required in a data reduction sequence (the exception is makeflat, which always generates a master flat field of type _REAL or _DOUBLE). If your data type is _INTEGER, _REAL or _DOUBLE, therefore, it can be worth reducing it to one of the smaller types. The KAPPA programs ndftrace and settype can be used to determine and modify respectively the type of data in an NDF.
Reducing the size of the data type may increase or reduce the CPU time requirements of the program, but should reduce the memory and I/O requirements. Under certain circumstances, however, using a two-byte type can lead to overflow errors, so some caution should be exercised.
ndftrace to work out the approximate size it should be, and comparing this with the size shown by ls. If disk space is very tight, and you do not want to delete files, such oversized NDFs can be compacted using the KAPPA application ndfcopy (for example, on the file of reduced data type created above).
compick. If the averaged pixels are still small enough to sample the image point spread function adequately, this approach will be OK; otherwise it is rather a waste of good data, but may be useful for taking a quick look at oversized frames.
Finally, we list the most demanding of the CCDPACK applications with some notes about each one.
debias is the heaviest user of memory, and so is where problems are most likely to arise. The following suggestions are possible ways of limiting the resources used:
If the mask parameter is set (to the name of an image or ARD file) then more memory is required. It can therefore be more efficient to apply the mask explicitly elsewhere in the reduction sequence, for example to the bias frame prior to de-biassing.
Generating variances increases the resources used by debias, and so should be avoided (using genvar=false) if possible.
debias and also makes it unnecessary to process the bias frames at all. This technique will lead to inferior de-biassing, but can represent significant savings, and with frames from modern CCDs may give quite satisfactory results.
makemos, if performed, is usually the most CPU-intensive part of the data reduction process, although this depends on how numerous and how large the regions of overlap between frames are. The process should therefore be omitted if it is not required. Normalisation is performed only if one or both of the parameters scale and zero is set true (both default to false). Set scale=true only if multiplicative corrections might be required (for example, if the individual input images have differing exposure times), and set zero=true only if additive corrections might be required (for example, if the images have different background levels). If it must be performed, the following measures may decrease execution time, possibly at the expense of accuracy:
If zero is being used and the images have variance information, then set optov (the optimum number of overlaps) to a small number (such as one).
findobj), but if this fails it normally falls back on a more reliable algorithm which scales poorly with the number of objects. In this case, and if there are many objects, findoff can be very slow indeed and come to dominate the whole reduction process. Failure of the fast algorithm is also more likely when there are very many objects. For both these reasons it can be a good idea to limit the number of objects found by findobj; a few tens of objects in the overlap region is about right. You can control the number of objects found by modifying the minpix parameter: the higher this threshold is set, the fewer objects findobj will identify in the image.
More detailed information on each of these applications can be found in SUN/139.