### 12 Dealing with Files

#### 12.1 Extracting parts of filenames

Occasionally you’ll want to work with parts of a filename, such as the path or the file type. The C-shell provides filename modifiers that select the various portions. A couple are shown in the example below.

set type = $1:e
set name = $1:r
if ( $type == "bdf" ) then
   echo "Use BDF2NDF on a VAX to convert the Interim file $1"
else if ( $type != "dst" ) then
   hdstrace $name
else
   hdstrace @'"$1"'
endif

Suppose the first argument of a script, $1, is a filename called galaxy.bdf. The value of variable type is bdf and name equates to galaxy because of the presence of the filename modifiers :e and :r. The rest of the script uses the file type to control the processing, in this case to provide a listing of the contents of a data file using the Hdstrace utility.

The complete list of modifiers, their meanings, and examples is presented in the table below.

| Modifier | Value returned | Value for filename /star/bin/kappa/comwest.sdf |
|---|---|---|
| :e | Portion of the filename following the last full stop; if the filename does not contain a full stop, it returns a null string | sdf |
| :r | Portion of the filename preceding the last full stop, including any path; if there is no full stop present, it returns the complete filename | /star/bin/kappa/comwest |
| :h | The path of the filename (mnemonic: h for head) | /star/bin/kappa |
| :t | The tail of the file specification, excluding the path | comwest.sdf |

#### 12.2 Process a Series of Files

One of the most common things you’ll want to do, having devised a data-processing path, is to apply those operations to a series of data files. For this you need a foreach...end construct.

convert               # Only need be invoked once per process
foreach file (*.fit)
   stats $file
end

This takes all the FITS files in the current directory and computes the statistics for them using the stats command from KAPPA (SUN/95). file is a shell variable. Each time round the loop, file is assigned the name of the next file in the list between the parentheses. The * is the familiar wildcard which matches any string. Remember that when you want to use the shell variable’s value you prefix it with a $. Thus $file is the filename.

##### 12.2.1 NDFs

Some data formats like the NDF demand that only the file name (i.e. what appears before the last dot) be given in commands. To achieve this you must first strip off the remainder (the file extension or type) with the :r file modifier.

foreach file (*.sdf)
   histogram $file:r accept
end

This processes all the HDS files in the current directory and calculates a histogram for each of them using the histogram command from KAPPA (SUN/95). It assumes that the files are NDFs. The :r instructs the shell to remove the file extension (the part of the name following the rightmost full stop). If we didn’t do this, the histogram task would try to find NDFs called SDF within each of the HDS files.

##### 12.2.2 Wildcarded lists of files

You can give a list of files separated by spaces, each of which can include the various UNIX wildcards. Thus the code below would report the name of each NDF and its standard deviation. The NDFs are called ‘Z’ followed by a single character, ccd1, ccd2, ccd3, and spot.

foreach file (Z?.sdf ccd[1-3].sdf spot.sdf)
   echo "NDF:" $file:r"; sigma: "`stats $file:r | grep "Standard deviation"`
end

echo writes to standard output, so you can write text including values of shell variables to the screen or redirect it to a file. Here the output produced by stats is piped (the | is the pipe) into the UNIX grep utility to search for the string "Standard deviation". The back quotes (` `) invoke the command, and the resulting standard deviation is substituted.

You might just want to provide an arbitrary list of NDFs as arguments to a generic script. Suppose you had a script called splotem, and you have made it executable with chmod +x splotem.

#!/bin/csh
figaro                 # Only need be invoked once per process
foreach file ($*)
   if (-e $file) then
      splot $file:r accept
   endif
end

Notice the -e file-comparison operator. It tests whether the file exists or not. (Section 12.4 has a full list of the file operators.) To plot a series of spectra stored in NDFs, you just invoke it something like this.

% ./splotem myndf.sdf arc[a-z].sdf hd[0-9]*.sdf

See the glossary for a list of the available wildcards such as the [a-z] in the above example.

##### 12.2.3 Exclude the .sdf for NDFs

In the splotem example from the previous section the list of NDFs on the command line required the inclusion of the .sdf file extension. Having to supply the .sdf for an NDF is abnormal. For reasons of familiarity and ease of use, you probably want your relevant scripts to accept a list of NDF names and to append the file extension automatically before the list is passed to foreach. So let’s modify the previous example to do this.

#!/bin/csh
figaro                 # Only need be invoked once per process

# Append the HDS file extension to the supplied arguments.
set ndfs
set i = 1
while ( $i <= $#argv )
   set ndfs = ($ndfs[*] $argv[$i]".sdf")
   @ i = $i + 1
end

#  Plot each 1-dimensional NDF.
foreach file ($ndfs[*])
   if (-e $file) then
      splot $file:r accept
   endif
end

This loops through all the arguments and appends the HDS-file extension to them by using a work array ndfs. The set defines a value for a shell variable; don’t forget the spaces around the =. ndfs[*] means all the elements of variable ndfs. The loop adds elements to ndfs, which is initialised without a value. Note the necessary parentheses around the expression ($ndfs[*] $argv[$i]".sdf").

On the command line the wildcards have to be passed verbatim, because otherwise the shell will try to match them with files that don’t have the .sdf file extension. Thus you must protect the wildcards with quotes. It’s a nuisance, but the advantages of wildcards more than compensate.

% ./splotem myndf 'arc[a-z]' 'hd[0-9]*'
% ./noise myndf 'ccd[a-z]'

If you forget to write the ' ', you’ll receive a No match error.

##### 12.2.4 Examine a series of NDFs

A common need is to browse through several datasets, perhaps to locate a specific one, or to determine which are acceptable for further processing. The following presents images of a series of NDFs using the display task of KAPPA (SUN/95). The title of each plot tells you which NDF is currently displayed.

foreach file (*.sdf)
   display $file:r axes style="'title==$file:r'" accept
   sleep 5
end

sleep pauses the process for a given number of seconds, allowing you to view each image. If this is too inflexible you could add a prompt so the script displays the next image once you press the return key.

set nfiles = `ls *.sdf | wc -w`
set i = 1
foreach file (*.sdf)
   display $file:r axes style="'title==$file:r'" accept

   # Prompt for the next file unless it is the last.
   if ( $i < $nfiles ) then
      echo -n "Next?"
      set next = $<

      # Increment the file counter by one.
      @ i++
   endif
end

The first line shows a quick way of counting the number of files. It uses ls to expand the wildcards, then the command wc to count the number of words. The back quotes cause the instruction between them to be run and the values generated to be assigned to variable nfiles.

You can substitute another visualisation command for display as appropriate. You can also use the graphics database to plot more than one image on the screen or to hardcopy. The script $KAPPA_DIR/multiplot.csh does the latter.

#### 12.3 Filename modification

Thus far the examples have not created a new file. When you want to create an output file, you need a name for it. This could be an explicit name, one derived from the process identification number, one generated by some counter, or from the input filename. Here we deal with all but the trivial first case.

##### 12.3.1 Appending to the input filename

To help identify datasets and to indicate the processing steps used to generate them, their names are often created by appending suffixes to the original file name. This is illustrated below.

foreach file (*.sdf)
   set ndf = $file:r
   block in=$ndf out=$ndf"_sm" accept
end

This uses block from KAPPA (SUN/95) to perform block smoothing on a series of NDFs, creating new NDFs, each of which takes the name of the corresponding input NDF with a _sm suffix. The accept keyword accepts the suggested defaults for parameters that would otherwise be prompted. We use set to assign the NDF name to variable ndf for clarity.

##### 12.3.2 Appending a counter to the input filename

If a counter is preferred, this example

set count = 1
foreach file (*.sdf)
   set ndf = $file:r
   block in=$ndf out=smooth$count accept
   @ count = $count + 1
end

would behave as the previous one except that the output NDFs would be called smooth1, smooth2 and so on.

##### 12.3.3 Replacing part of the input filename

Whilst appending a suffix after each data-processing stage is feasible, it can generate some long names, which are tedious to handle. Instead you might want to replace part of the input name with a new string. The following creates another shell variable, ndfout, by replacing the string _flat in the input NDF name with _sm. The script pipes the input name into the sed editor, which performs the substitution.

foreach file (*_flat.sdf)
   set ndf = $file:r
   set ndfout = `echo $ndf | sed 's#_flat#_sm#'`
   block in=$ndf out=$ndfout accept
end

The # is a delimiter for the strings being substituted; it should be a character that is not present in the strings being altered. Notice the ` ` quotes in the assignment of ndfout. These instruct the shell to process the expression immediately, rather than treating it as a literal string. This is how you can put values output from UNIX commands and other applications into shell variables.

#### 12.4 File operators

There is a special class of C-shell operator that lets you test the properties of a file. A file operator is used in comparison expressions of the form if (file_operator file) then. A list of file operators is tabulated below.

| Operator | True if: |
|---|---|
| -d | file is a directory |
| -e | file exists |
| -f | file is ordinary |
| -o | you are the owner of the file |
| -r | file is readable by you |
| -w | file is writable by you |
| -x | file is executable by you |
| -z | file is empty |

The most common usage is to test for a file’s existence. The following only runs cleanup if the first argument is an existing file.

# Check that the file given by the first
# argument exists before attempting to
# use it.
if ( -e $1 ) then
   cleanup $1
endif

Here are some other examples.

# Remove any empty directories.
if ( -d $file && -z $file ) then
   rmdir $file

# Give execute access to text files with a .csh extension.
else if ( $file:e == "csh" && -f $file ) then
   chmod +x $file
endif
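The operators can be exercised safely in a scratch directory. The sketch below is illustrative only: the directory, the file names and the results variable are assumptions. It also counts directory entries with ls -A, because on many file systems a directory does not have zero size even when it holds no files, so -z alone may not identify an empty directory.

```csh
#!/bin/csh -f
# Build a small scratch area to test against.
set scratch = /tmp/sc4_fileops_$$
mkdir $scratch
mkdir $scratch/emptydir
touch $scratch/empty.txt
echo "some text" > $scratch/full.txt

# Record which tests succeed.
set results = ()
if ( -d $scratch/emptydir ) set results = ($results dir)
if ( -f $scratch/full.txt ) set results = ($results ordinary)
if ( -z $scratch/empty.txt ) set results = ($results zero-size)
if ( ! -z $scratch/full.txt ) set results = ($results non-empty)
if ( ! -e $scratch/missing ) set results = ($results absent)

# Assumption: count the entries rather than rely on -z for a directory.
set nentries = `ls -A $scratch/emptydir | wc -l`
if ( $nentries == 0 ) set results = ($results no-entries)

echo $results
rm -rf $scratch
```

A ! before an operator negates the test, which is handy for reporting a missing input file before any processing starts.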

#### 12.5 Creating text files

A frequent feature of scripts is redirecting the output from tasks to a text file. For instance,

hdstrace $file:r > $file:r.lis

Command ./doubleword reads its standard input from the file mynovel.txt. The <<word obtains the input data from the script file itself until there is a line beginning with word. You may also include variables and commands to execute, as the $, \, and ` ` retain their special meaning. If you want these characters to be treated literally, say to prevent substitution, insert a \ before the delimiting word. The command myprog reads from the script, substituting the value of variable nstars in the second line, and the number of lines in file brightnesses.txt in the third line. The technical term for such input is a here document.

#### 12.9 Discarding text output

The output from some routines is often unwanted in scripts. In these cases redirect the standard output to a null file.

correlate in1=frame1 in2=frame2 out=framec > /dev/null

Here the text output from the task correlate is disposed of to the /dev/null file. Messages from Starlink tasks and usually Fortran channel 6 write to standard output.

#### 12.10 Obtaining dataset attributes

When writing a data-processing pipeline connecting several applications you will often need to know some attribute of the data file, such as its number of dimensions, its shape, whether or not it may contain bad pixels, a variance array or a specified extension. The way to access these data is with the ndftrace command from KAPPA (SUN/95) and the parget command. ndftrace inquires the data, and parget communicates the information to a shell variable.

##### 12.10.1 Obtaining dataset shape

Suppose that you want to process all the two-dimensional NDFs in a directory. You would write something like this in your script.

foreach file (*.sdf)
   ndftrace $file:r > /dev/null
   set nodims = `parget ndim ndftrace`
   if ( $nodims == 2 ) then
      <perform the processing of the two-dimensional datasets>
   endif
end

Note that, although called ndftrace, this function can determine the properties of foreign data formats through the automatic conversion system (SUN/55, SSN/20). Of course, other formats do not have all the facilities of an NDF.

If you want the dimensions of a FITS file supplied as the first argument you need this ingredient.

ndftrace $1 > /dev/null
set dims = `parget dims ndftrace`

Then dims[$i$] will contain the size of the ${i}^{th}$ dimension. Similarly

ndftrace $1 > /dev/null
set lbnd = `parget lbound ndftrace`
set ubnd = `parget ubound ndftrace`

will assign the pixel bounds to arrays lbnd and ubnd.

##### 12.10.2 Available attributes

Below is a complete list of the results parameters from ndftrace. If the parameter is an array, it will have one element per dimension of the data array (given by parameter NDIM), except for EXTNAME and EXTTYPE where there is one element per extension (given by parameter NEXTN). Several of the axis parameters are only set if the ndftrace input keyword fullaxis is set (not the default). To obtain, say, the data type of the axis centres of the current dataset, the code would look like this.

ndftrace fullaxis accept > /dev/null
set axtype = `parget atype ndftrace`

| Name | Array? | Meaning |
|---|---|---|
| AEND | Yes | The axis upper extents of the NDF. For non-monotonic axes, zero is used. See parameter AMONO. This is not assigned if AXIS is FALSE. |
| AFORM | Yes | The storage forms of the axis centres of the NDF. This is only written when parameter FULLAXIS is TRUE and AXIS is TRUE. |
| ALABEL | Yes | The axis labels of the NDF. This is not assigned if AXIS is FALSE. |
| AMONO | Yes | These are TRUE when the axis centres are monotonic, and FALSE otherwise. This is not assigned if AXIS is FALSE. |
| ANORM | Yes | The axis normalisation flags of the NDF. This is only written when FULLAXIS is TRUE and AXIS is TRUE. |
| ASTART | Yes | The axis lower extents of the NDF. For non-monotonic axes, zero is used. See parameter AMONO. This is not assigned if AXIS is FALSE. |
| ATYPE | Yes | The data types of the axis centres of the NDF. This is only written when FULLAXIS is TRUE and AXIS is TRUE. |
| AUNITS | Yes | The axis units of the NDF. This is not assigned if AXIS is FALSE. |
| AVARIANCE | Yes | Whether or not there are axis variance arrays present in the NDF. This is only written when FULLAXIS is TRUE and AXIS is TRUE. |
| AXIS | | Whether or not the NDF has an axis system. |
| BAD | | If TRUE, the NDF’s data array may contain bad values. |
| BADBITS | | The BADBITS mask. This is only valid when QUALITY is TRUE. |
| CURRENT | | The integer Frame index of the current co-ordinate Frame in the WCS component. |
| DIMS | Yes | The dimensions of the NDF. |
| EXTNAME | Yes | The names of the extensions in the NDF. It is only written when NEXTN is positive. |
| EXTTYPE | Yes | The types of the extensions in the NDF. Their order corresponds to the names in EXTNAME. It is only written when NEXTN is positive. |
| FDIM | Yes | The numbers of axes in each co-ordinate Frame stored in the WCS component of the NDF. The elements in this parameter correspond to those in FDOMAIN and FTITLE. The number of elements in each of these parameters is given by NFRAME. |
| FDOMAIN | Yes | The domain of each co-ordinate Frame stored in the WCS component of the NDF. The elements in this parameter correspond to those in FDIM and FTITLE. The number of elements in each of these parameters is given by NFRAME. |
| FLABEL | Yes | The axis labels from the current WCS Frame of the NDF. |
| FLBND | Yes | The lower bounds of the bounding box enclosing the NDF in the current WCS Frame. The number of elements in this parameter is equal to the number of axes in the current WCS Frame (see FDIM). |
| FORM | | The storage form of the NDF’s data array. |
| FTITLE | Yes | The title of each co-ordinate Frame stored in the WCS component of the NDF. The elements in this parameter correspond to those in FDOMAIN and FDIM. The number of elements in each of these parameters is given by NFRAME. |
| FUBND | Yes | The upper bounds of the bounding box enclosing the NDF in the current WCS Frame. The number of elements in this parameter is equal to the number of axes in the current WCS Frame (see FDIM). |
| FUNIT | Yes | The axis units from the current WCS Frame of the NDF. |
| HISTORY | | Whether or not the NDF contains HISTORY records. |
| LABEL | | The label of the NDF. |
| LBOUND | Yes | The lower bounds of the NDF. |
| NDIM | | The number of dimensions of the NDF. |
| NEXTN | | The number of extensions in the NDF. |
| NFRAME | | The number of WCS domains described by FDIM, FDOMAIN and FTITLE. Set to zero if WCS is FALSE. |
| QUALITY | | Whether or not the NDF contains a QUALITY array. |
| TITLE | | The title of the NDF. |
| TYPE | | The data type of the NDF’s data array. |
| UBOUND | Yes | The upper bounds of the NDF. |
| UNITS | | The units of the NDF. |
| VARIANCE | | Whether or not the NDF contains a VARIANCE array. |
| WCS | | Whether or not the NDF has any WCS co-ordinate Frames, over and above the default GRID, PIXEL and AXIS Frames. |
| WIDTH | Yes | Whether or not there are axis width arrays present in the NDF. This is only written when FULLAXIS is TRUE and AXIS is TRUE. |

##### 12.10.3 Does the dataset have variance/quality/axis/history information?

Suppose you have an application which demands that variance information be present, say for optimal extraction of spectra. You could test for the existence of a variance array in your FITS file called dataset.fit like this.

# Enable automatic conversion
convert               # Needs to be invoked only once per process

set file = dataset.fit
ndftrace $file > /dev/null
set varpres = `parget variance ndftrace`
if ( $varpres == "FALSE" ) then
   echo "File $file does not contain variance information"
else
   <process the dataset>
endif

The logical results parameters have values TRUE or FALSE. You merely substitute another component such as quality or axis in the parget command to test for the presence of these components.

##### 12.10.4 Testing for bad pixels

Imagine you have an application which could not process bad pixels. You could test whether a dataset might contain bad pixels, and run some pre-processing task to remove them first. This attribute could be inquired via ndftrace. If you need to know whether or not any were actually present, you should run setbad from KAPPA (SUN/95) first.

setbad $file
ndftrace $file > /dev/null
set badpix = `parget bad ndftrace`
if ( $badpix == "TRUE" ) then
   <remove the bad pixels>
else
   goto tidy
endif
<perform data processing>

tidy:
<tidy any temporary files, windows etc.>
exit

Here we also introduce the goto command—yes there really is one. It is usually reserved for exiting (goto exit), or, as here, moving to a named label. This lets us skip over some code, and move directly to the closedown tidying operations. Notice the colon terminating the label itself, and that it is absent from the goto command.

##### 12.10.5 Testing for a spectral dataset

One recipe for testing for a spectrum is to look at the axis labels (whereas a modern approach might use WCS information). Here is a longer example showing how this might be implemented. Suppose the name of the dataset being probed is stored in variable ndf.

# Get the full attributes.
ndftrace $ndf fullaxis accept > /dev/null

# Assign the axis labels and number of dimensions to variables.
set axlabel = `parget alabel ndftrace`
set nodims = `parget ndim ndftrace`

# Exit the script when there are too many dimensions to handle.
if ( $nodims > 2 ) then
   echo Cannot process a $nodims-dimensional dataset.
   goto exit
endif

# Loop for each dimension or until a spectral axis is detected.
set i = 1
set spectrum = FALSE
while ( $i <= $nodims && $spectrum == FALSE )

# For simplicity the definition of a spectral axis is that
# the axis label is one of a list of acceptable values.  This
# test could be made more sophisticated.  The toupper converts the
# label to uppercase to simplify the comparison.  Note the \ line
# continuation.
   set uaxlabel = `echo $axlabel[$i] | awk '{print toupper($0)}'`
   if ( $uaxlabel == "WAVELENGTH" || $uaxlabel == "FREQUENCY" || \
        $uaxlabel == "VELOCITY" ) then

# Record that the axis is found and which dimension it is.
      set spectrum = TRUE
      set spaxis = $i
   endif
   @ i++
end

# Process the spectrum.
if ( $spectrum == TRUE ) then

# Rotate the dataset to make the spectral axis along the first
# dimension.
   if ( $spaxis == 2 ) then
      irot90 $ndf $ndf"_rot" accept

# Fit the continuum.
      sfit spectrum=$ndf"_rot" order=2 output=$ndf"_fit" accept
   else
      sfit spectrum=$ndf order=2 output=$ndf"_fit" accept
   endif
endif

#### 12.11 FITS Headers

Associated with FITS files and many NDFs is header information stored in 80-character ‘cards’. It is possible to use these ancillary data in your script. Each non-comment header has a keyword, by which you can reference it; a value; and usually a comment. KAPPA (SUN/95) from V0.10 has a few commands for processing FITS header information, described in the following sections.

##### 12.11.1 Testing for the existence of a FITS header value

Suppose that you wanted to determine whether an NDF called image123 contains an AIRMASS keyword in its FITS headers (stored in the FITS extension).

set airpres = `fitsexist image123 airmass`
if ( $airpres == "TRUE" ) then
   <access AIRMASS FITS header>
endif

Variable airpres would be assigned "TRUE" when the AIRMASS card was present, and "FALSE" otherwise. Remember that the ` ` quotes cause the enclosed command to be executed.

##### 12.11.2 Reading a FITS header value

Once we know the named header exists, we can then assign its value to a shell variable.

set airpres = `fitsexist image123 airmass`
if ( $airpres == "TRUE" ) then
   set airmass = `fitsval image123 airmass`
   echo "The airmass for image123 is $airmass."
endif

##### 12.11.3 Writing or modifying a FITS header value

We can also write new headers at specified locations (the default being just before the END card), or revise the value and/or comment of existing headers. As we know the header AIRMASS exists in image123, the following revises the value and comment of the AIRMASS header. It also writes a new header called FILTER immediately preceding the AIRMASS card, assigning it value B and comment Waveband.

fitswrite image123 airmass value=1.062 comment=\"Corrected airmass\"
fitswrite image123 filter position=airmass value=B comment=Waveband

As we want the metacharacters " to be treated literally, each is preceded by a backslash.

#### 12.12 Accessing other objects

You can manipulate data objects in HDS files, such as components of an NDF’s extension. There are several Starlink applications for this purpose including the FIGARO commands copobj, creobj, delobj, renobj, setobj; and the KAPPA (SUN/95) commands setext and erase.

For example, if you wanted to obtain the value of the EPOCH object from an extension called IRAS_ASTROMETRY in an NDF called lmc, you could do it like this.

set year = `setext lmc xname=iras_astrometry option=get \
            cname=epoch noloop`

The noloop prevents prompting for another extension-editing operation. The single backslash is the line continuation.

#### 12.13 Defining NDF sections with variables

If you want to define a subset or superset of a dataset, most Starlink applications recognise NDF sections (see SUN/95’s chapter called “NDF Sections”) appended after the name.

A naïve approach might expect the following to work

set lbnd = 50
set ubnd = 120
linplot $KAPPA_DIR/spectrum"($lbnd:$ubnd)"
display $KAPPA_DIR/comwest"($lbnd:$ubnd,$lbnd~$ubnd)"

however, they generate the Bad : modifier in $ ($). error. That’s because the shell is stupidly looking for a filename modifier :$ (see Section 12.1).

Instead here are some recipes that work.

set lbnd = 50
set ubnd = 120
set lrange = "101:150"

linplot $KAPPA_DIR/spectrum"($lbnd":"$ubnd)"
stats abc"(-20:99,~$ubnd)"
display $KAPPA_DIR/comwest"($lbnd":"$ubnd",$lbnd":"$ubnd")"
histogram hale-bopp.fit'('$lbnd':'$ubnd','$lbnd':'$ubnd')'
ndfcopy $file1.imh"("$lbnd":"$ubnd","$lrange")" $work"1"
splot hd102456'('$ubnd~60')'

An easy-to-remember formula is to enclose the parentheses and colons in quotes.