8 Using the PISA parameters — PISAPEAK, PISAKNN, PISA2CAT & PISA2ARD

 8.1 PISA data in CURSA and CATPAC
 8.2 Getting the PISA output files into CLUSTAN
 8.3 Object classification
 8.4 Transforming PISA parameters to intensity invariant form
 8.5 Distribution free classification
 8.6 PISA2ARD

Other packages exist which can be used to perform selections, transform values, and generally analyse and categorise the PISAFIND parameterisations. With sufficient effort they may be used to classify objects. Catalogue manipulations can be done using the CURSA [5] and CATPAC [8] packages. More general plotting packages can also be used to investigate the various results; PONGO [12] and SM [13] both plot data points and allow mathematical transformations of columns. Using CLUSTAN you could investigate the natural clustering of objects in parameter space (with the aim of classification in mind).

8.1 PISA data in CURSA and CATPAC

Any of the PISA output files (from PISAFIND and PISAPEAK) can be converted for use by the CURSA [5] and the CATPAC [8] packages. The easiest way to convert a PISA results file into a catalogue is to use the

  # pisa2cat

routine. CURSA and CATPAC can then be used to perform simple manipulations, such as sorting and subsetting according to various criteria.

8.2 Getting the PISA output files into CLUSTAN

It is possible to get PISA format data into CLUSTAN and do analyses based on the ‘natural’ clustering of the RESULTS. The analyses are highly dependent on the similarity measure (the distance in the parameter space), and consequently the clustering which is found is more often than not due to the measure rather than the physical attributes of the objects. Runs of CLUSTAN on the test NDF ‘FRAME’ in the PISA directory seem to select strongly by angle and the sign of the cross moment (SXY), and only very weakly by other variables (ellipticity for one) which one might expect to bear more relevance to the real ‘clustering’. This problem probably requires careful selection of the variables to use, and may well require the production of new hybrid variables (things like intensity/peak) as performed by PISAPEAK.

A description of the entry of PISA data into CLUSTAN is given below, and an analysis using this may be attempted. However, NO significance should be ascribed to the results. If you really want to use this method there is no substitute for a proper understanding of it.

A file called clustan.dat is found in $PISA_DIR; take a copy of it. It shows how to get data into CLUSTAN. To run CLUSTAN in ‘batch’ mode, simply change the file names to those of your PISAFIND data files and type:

  # clustan < clustan.dat

8.3 Object classification

It should, in principle, be possible to classify objects using the PISAFIND parameterisations of a single frame, but only provided that the apparent morphology of the objects allows it (faint noise-limited galaxies are unlikely to be distinguishable from stars, and you shouldn’t be too disappointed when they are not).

However, given a set of distinguishable objects a classification can be performed. Before such a task can be undertaken it is first necessary to remove the intensity dependency of the parameters. This allows the ‘shape’ for a class of objects to remain reasonably constant. The quality ‘shape’, in this context, really refers to a multivariate function. Usually the only objects on any single frame which have a constant shape are the stars. So the approach adopted in PISA is to ‘normalize’ the PISAFIND parameterisations so that they are referenced to an ideal star. This is the first stage of classification using PISA.

8.4 Transforming PISA parameters to intensity invariant form

PISAPEAK transforms the PISAFIND RESULTS parameterisations so that the variables are intensity invariant, assuming a stellar profile. The output from PISAPEAK is intended for use in star-galaxy separation, either by applying direct cuts in variable values (PISACUT) or by discrimination analysis routines such as PISAKNN. A fit to stars on a frame can be performed by PISAFIT, which should be used prior to PISAPEAK. The output from PISAPEAK is a list of four parameters:

The peakedness measure is the ratio of the semi-major axis of the detected object (at the detection threshold) to the radius of the analytic function which has the same peak value (averaged over the central 9 pixels). Thus it specifies how much more extended the object is than a star with the same peak value. This should select strongly between galaxies and stars; indeed, very good separation can be achieved using cuts in this value alone.
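The ratio described above can be illustrated with a short sketch. It is not the PISAPEAK implementation: it assumes, purely for illustration, that the stellar model is a circular Gaussian and that the model radius is measured at the detection threshold, whereas the actual PISA profiling function is not reproduced here. The function names and the parameters `peak`, `threshold` and `sigma` are hypothetical.

```python
import math


def gaussian_threshold_radius(peak, threshold, sigma):
    """Radius at which a circular Gaussian of central height `peak` and
    width `sigma` drops to `threshold` (both measured above sky).

    ASSUMPTION: a pure Gaussian stands in for the real stellar model.
    """
    if peak <= threshold:
        raise ValueError("peak must exceed the detection threshold")
    # Solve peak * exp(-r**2 / (2 * sigma**2)) = threshold for r.
    return sigma * math.sqrt(2.0 * math.log(peak / threshold))


def peakedness(semi_major, peak, threshold, sigma):
    """Ratio of the object's semi-major axis (at the threshold) to the
    radius of the model star with the same peak value."""
    return semi_major / gaussian_threshold_radius(peak, threshold, sigma)
```

Under these assumptions a star-like object gives a value close to one, and an extended object with the same peak gives a larger value.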

The model profile is used to scale the object peakedness and intensity peak ratio to values around one. This only works well if the PISA profiling function is a good fit to the stars on your frame. If it is not a good fit, or you cannot determine the model parameters, then approximate values can be used; the only criterion is that the GSIGM value is about the FWHM seeing of your data (this is important for resampling the peak of the object). The results file will then contain values ‘normalised’ to this ‘imaginary’ object. The stars will still form a group of objects with similar values, although they may not be as tightly clustered as with a good fit.

8.5 Distribution free classification

PISAKNN uses the results of PISAPEAK to discriminate objects into two classes, using KNN (k nearest neighbours) distribution-free multivariate discrimination. The classes are seeded by supplying two files which contain the indices of objects typical of the class in question (more than 5 in each, in approximately equal numbers). Each object then propagates its class to the other objects: every unclassified object is given whichever class is most common among its 2*k nearest neighbours (in the parameter space of the PISAPEAK results). This procedure is iterated until all objects are assigned and have a stable class, or until a maximum number of iterations is exceeded. The results of the discrimination are written to two output files, one for each class.
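The propagation scheme can be sketched as follows. This is an illustrative reconstruction, not the PISAKNN source: it assumes Euclidean distances, that votes are taken among the 2*k nearest objects which already carry a class, that seed objects keep their class, and that ties leave an object unchanged. The function name `knn_propagate` and its arguments are hypothetical.

```python
import numpy as np


def knn_propagate(points, seeds_a, seeds_b, k=3, max_iter=20):
    """Label each row of `points` as class 1 or 2 by iterated voting.

    Returns an integer array: 0 = unclassified, 1 = class A, 2 = class B.
    ASSUMPTIONS: Euclidean metric, fixed seeds, ties keep the old label.
    """
    points = np.asarray(points, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    labels[np.asarray(seeds_a, dtype=int)] = 1
    labels[np.asarray(seeds_b, dtype=int)] = 2
    seeded = labels != 0

    # Pairwise distances in the parameter space.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    for _ in range(max_iter):
        classified = np.flatnonzero(labels != 0)
        new = labels.copy()
        for i in np.flatnonzero(~seeded):
            others = classified[classified != i]
            # The 2*k nearest objects that currently have a class.
            nearest = others[np.argsort(dist[i, others])[: 2 * k]]
            votes_a = np.count_nonzero(labels[nearest] == 1)
            votes_b = np.count_nonzero(labels[nearest] == 2)
            if votes_a > votes_b:
                new[i] = 1
            elif votes_b > votes_a:
                new[i] = 2
        if np.array_equal(new, labels):
            break  # every object has a stable class
        labels = new
    return labels
```

With two well-separated groups and a few seeds in each, the labels settle after a couple of passes. In a real frame the PISAPEAK variables would probably need scaling to comparable ranges before a Euclidean distance is meaningful.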

KNN relies on good seed statistics, as propagation is essentially linear (but remember that this is in a multivariate sense). If the seed objects do not reasonably span the whole of the object parameter space, improper incursion can occur, leading to misclassification. The classification in boundary areas between the classes will depend on the size of the nearest neighbour count; larger values will help the investigation of the ‘fuzzy’ areas. If a very small value is chosen then this will act almost as a thresholding (but of all the parameters, not just one).

KNN has the advantage over classical discriminant analysis that it does not rely on the classes of objects having multinormal distributions. The assumption of normality requires the objects contributing to each class to have a random spread across a particular part of parameter space. It is unlikely that this requirement can be met for small galaxy populations, although classical methods may work well for large statistical samples.

The ellipticity is included in the analysis; however, this may not always help selection. If some smallish round galaxies are present, using this variable will increase the likelihood of selecting them as stars. In this case it may be profitable to switch off the ellipticity. Ellipticity can be used for other purposes. Say you want a complete sample of stars: all stars will have ellipticities below a given threshold and can be selected on that basis. Further refinement can then be applied to the list by thresholding in peakedness to remove objects with large wings.
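The two-stage selection just described amounts to a pair of simple cuts. The sketch below assumes each object is held as a dictionary with hypothetical keys `ellipticity` and `peakedness`; the thresholds are placeholders to be chosen by inspecting your own data, not recommended values.

```python
def select_stars(objects, max_ellipticity, max_peakedness):
    """Keep round objects, then drop those with extended wings.

    ASSUMPTION: `objects` is a list of dicts with 'ellipticity' and
    'peakedness' entries; both thresholds are user-chosen.
    """
    # Stage 1: a complete star sample lies below the ellipticity threshold.
    round_objects = [o for o in objects if o["ellipticity"] < max_ellipticity]
    # Stage 2: remove round but extended objects by a peakedness cut.
    return [o for o in round_objects if o["peakedness"] < max_peakedness]
```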

8.6 PISA2ARD

The PISA2ARD application converts the results file output from PISAFIND into an ‘ARD description’. This description consists of a series of ellipses, one for each detected object. The ARD description can be used by suitable applications to remove or analyse the areas within the ellipses. For instance, to remove all the detected objects from a frame, convert the results file into an ARD description (with the ellipses possibly scaled up by some small amount to cover the outlying parts of the objects) and use an ARD masking routine, such as KAPPA ARDMASK. ARD is also used by the packages ESP [10] and CCDPACK [11].
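To give a feel for what such a description looks like, the sketch below writes one ARD ELLIPSE region per object, with an optional scale factor applied to the axes. It is not the PISA2ARD code and does not read a PISAFIND results file: the input tuples (centre, semi-major axis, semi-minor axis, position angle in degrees), the function name and the number formatting are all assumptions made for illustration, and the argument order of the ELLIPSE keyword should be checked against the ARD documentation.

```python
def ard_ellipses(objects, scale=1.0):
    """Return an ARD description with one ELLIPSE per object.

    ASSUMPTION: each object is (x, y, semi_major, semi_minor, angle_deg);
    `scale` enlarges both axes so the mask covers the faint outer parts.
    """
    lines = []
    for x, y, a, b, angle in objects:
        lines.append(
            f"ELLIPSE( {x:.2f}, {y:.2f}, "
            f"{a * scale:.2f}, {b * scale:.2f}, {angle:.1f} )"
        )
    return "\n".join(lines)
```

The resulting text could be saved to a file and handed to an ARD-aware masking application.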