4 Group Expressions

 4.1 Elements and Delimiters
 4.2 Editing of Names
 4.3 Indirection Elements
 4.4 Modification Elements
 4.5 Nesting Within Group Expressions
 4.6 Flagging a Group Expression
 4.7 Comments Within Group Expressions
 4.8 Escaping the Special Characters in a Group Expression
 4.9 The Order of Names Within a Group

One of the most useful routines within GRP is GRP_GROUP. This routine appends names to the end of a previously created group using a “group expression” obtained from the environment via a named parameter which can be of any type. The routine GRP_GRPEX also performs this function, except that the group expression is provided by the calling application, rather than being obtained through the parameter system.

This section describes the syntax of group expressions.

4.1 Elements and Delimiters

Group expressions may contain several “delimiter” characters (usually a comma although this can be changed, see section 5.7) and the substrings delimited by these characters are referred to as “elements”. If there are no delimiters in a group expression, then the group expression consists of a single element. For instance, the group expression:

  NEW_FILE,A_*2|RAW|FLAT|,^LIST.DAT

consists of the three elements NEW_FILE, A_*2|RAW|FLAT| and ^LIST.DAT. Note, delimiter characters are ignored if they occur within matching “nesting characters” (see section 4.5). For instance, nesting prevents the group expression:

  FLATFIELD(100:200,20:220),OBJECT

being split into three elements instead of two (i.e. the first comma does not act as a delimiter because it occurs within a nest formed by matching parentheses).
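The splitting rule can be sketched in Python. This is a model of the behaviour described above, not part of the GRP library (which is called from Fortran), and the function name `split_elements` is hypothetical. The sketch assumes the default delimiter (comma). It treats both the nesting characters “(” and “)” and the kernel delimiters “{” and “}” (see section 4.2) as protecting any delimiters they enclose, and it does not check that each opening character is matched by a closing character of the same kind.

```python
def split_elements(expr, delim=",", opens="({", closes=")}"):
    """Split a group expression into elements at top-level delimiters (a model only)."""
    elements = []
    current = []
    depth = 0  # number of nesting/kernel characters currently open
    for ch in expr:
        if ch in opens:
            depth += 1
        elif ch in closes and depth > 0:
            depth -= 1
        if ch == delim and depth == 0:
            # A delimiter outside every nest ends the current element.
            elements.append("".join(current))
            current = []
        else:
            current.append(ch)
    elements.append("".join(current))
    return elements


print(split_elements("NEW_FILE,A_*2|RAW|FLAT|,^LIST.DAT"))
# ['NEW_FILE', 'A_*2|RAW|FLAT|', '^LIST.DAT']
print(split_elements("FLATFIELD(100:200,20:220),OBJECT"))
# ['FLATFIELD(100:200,20:220)', 'OBJECT']
```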

Each element of a group expression may be a literal name (eg NEW_FILE in the previous example), an “indirection element” or a “modification element”. An indirection element specifies a text file from which further names are to be read (eg ^LIST.DAT in the previous example). A modification element specifies an existing group of names which are to be used as the basis for the new names (eg A_*2|RAW|FLAT| in the previous example). These are described in more detail below.

4.2 Editing of Names

Each element in a group expression will give rise to one or more names (depending on whether the element consists of a literal name, an indirection element or a modification element). These names may be edited before being stored in a group by including certain “editing strings” within the text of the element. The general format of an element with editing strings included is:

  prefix{kernel}suffix|old|new|

The kernel string can be a single element, or can be a full group expression. Processing of the element proceeds as follows:

(1)
The kernel is first expanded to give a list of literal names. This may involve reading names from files, copying names from another group, etc, depending on the exact nature of the kernel. The characters which mark the start and end of the kernel are known as the opening and closing kernel delimiters. They are usually set to be “{” and “}”, but can be changed if needed.
(2)
Each name is checked to see if it contains the old string. If it does, all occurrences of the old string are replaced by the new string. The character which delimits the old and new strings is known as the “separator” character and is usually a “|” (vertical bar) character, but can be changed if needed. This substitution will be case sensitive if the group to which the names are to be added has been designated as case sensitive (see section 2.1). If no substitutions are to be performed then the old and new strings, together with the three separator characters, should be omitted.
(3)
The prefix string is added to the start of each name, and the suffix string is appended to the end of each name. Either or both of these strings may be null (i.e. of zero length).

The names which result from this processing are then added to a group. If there is no ambiguity about where the kernel starts and finishes (for instance if the prefix and suffix are both omitted, and the kernel consists of a single element) then the kernel does not need to be enclosed within kernel delimiters. The contents of the kernel can be any group expression. In particular, the kernel can contain other nested kernels with their own associated editing strings.
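Steps (2) and (3) can be sketched as a small Python function applied to the names produced by step (1). The name `edit_names` and its arguments are hypothetical. How GRP handles the replaced text in a case-insensitive group is not described here; this sketch simply matches the old string without regard to case and inserts the new string exactly as given.

```python
import re


def edit_names(names, prefix="", suffix="", old="", new="", case_sensitive=True):
    """Apply substitution (step 2), then prefix and suffix (step 3), to each name."""
    edited = []
    for name in names:
        if old:  # an empty old string means no substitution was requested
            if case_sensitive:
                name = name.replace(old, new)  # replaces every occurrence
            else:
                # re.escape makes the old string match literally; the lambda stops
                # backslashes in the new string being read as group references.
                name = re.sub(re.escape(old), lambda m: new, name, flags=re.IGNORECASE)
        edited.append(prefix + name + suffix)
    return edited


print(edit_names(["ONE", "TWO", "THREE"], prefix="B_", old="T", new="Z"))
# ['B_ONE', 'B_ZWO', 'B_ZHREE']
```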

Let’s look at some examples:

  A_{TOM,DICK,HARRY}_B

This will give rise to the three names A_TOM_B, A_DICK_B and A_HARRY_B.

  ^FILE.LIS|_OLD|_NEW|

This will read names from the text file FILE.LIS (see the description of indirection elements below), and replace all occurrences of the string “_OLD” within the names with the string “_NEW”.

  WW,{A,B_{ONE,TWO,THREE}|T|Z|,C}KK|_Z|_Y|

This is a complex example and needs looking at carefully. Looking at it at the highest level, it can be thought of as:

  WW,{kernel}KK|_Z|_Y|

where kernel is the group expression:

  A,B_{ONE,TWO,THREE}|T|Z|,C

The first and third elements in this inner group expression are simple literal names and give rise to the two names A and C. The second element specifies that the three names ONE, TWO and THREE are to be edited by replacement of the letter T by the letter Z, and the addition of the prefix B_. After editing, these three names become B_ONE, B_ZWO and B_ZHREE. So the total group specified by this inner kernel is:

  A
  B_ONE
  B_ZWO
  B_ZHREE
  C

We can now go back and look at the full group expression in the form:

  WW,{kernel}KK|_Z|_Y|

The first element specifies the single name WW. The second element specifies that each of the names arising from the expansion of the inner kernel (i.e. the names listed above) should be edited by replacing _Z with _Y, and then appending the suffix KK. Thus the final group contains:

  WW
  AKK
  B_ONEKK
  B_YWOKK
  B_YHREEKK
  CKK
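The whole procedure can be checked against this example with a short recursive Python sketch. It is a model only, under simplifying assumptions. The default control characters are hard-wired, every kernel is expanded as a list of literal names (indirection and modification elements are not handled), and only the first top-level kernel in an element is recognised. All function names are hypothetical. An editing string with the wrong number of separators is reported as an error, in line with the example in section 4.8.

```python
def _step(depth, ch):
    """Update the nesting depth for one character of a group expression."""
    if ch in "({":
        return depth + 1
    if ch in ")}" and depth > 0:
        return depth - 1
    return depth


def split_elements(expr):
    """Split at commas that are not inside (...) or {...}."""
    parts, current, depth = [], [], 0
    for ch in expr:
        depth = _step(depth, ch)
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_editing(element):
    """Return (body, old, new); old is None when there is no |old|new| string."""
    seps, depth = [], 0
    for i, ch in enumerate(element):
        depth = _step(depth, ch)
        if ch == "|" and depth == 0:
            seps.append(i)
    if not seps:
        return element, None, None
    if len(seps) != 3 or seps[-1] != len(element) - 1:
        raise ValueError(f"incomplete editing specification in {element!r}")
    a, b, c = seps
    return element[:a], element[a + 1:b], element[b + 1:c]


def expand_element(element):
    body, old, new = parse_editing(element)
    start = body.find("{")
    if start == -1:
        prefix, suffix, names = "", "", [body]  # a plain literal name
    else:
        depth = 0
        for end in range(start, len(body)):  # find the matching closing kernel delimiter
            if body[end] == "{":
                depth += 1
            elif body[end] == "}":
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError(f"unmatched kernel delimiter in {element!r}")
        prefix, suffix = body[:start], body[end + 1:]
        names = expand_expression(body[start + 1:end])  # the kernel may be a full group expression
    # Substitution first, then prefix and suffix, as in steps (2) and (3).
    return [prefix + (name.replace(old, new) if old else name) + suffix for name in names]


def expand_expression(expr):
    names = []
    for element in split_elements(expr):
        names.extend(expand_element(element))
    return names


for name in expand_expression("WW,{A,B_{ONE,TWO,THREE}|T|Z|,C}KK|_Z|_Y|"):
    print(name)  # prints the six names listed above, one per line
```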

4.3 Indirection Elements

An indirection element consists of an “indirection character” (usually “^” (up arrow) although this can be changed, see section 5.7) followed by the name of a text file. For instance, the group expression:

  ^raw_data

would cause GRP to search for a file called raw_data.

The specified file is read to obtain further names to be added to the group. Each line in the file is processed as if it were a separate group expression, and so may contain any combination of literal names, modification elements or further indirection elements. It is thus possible to get several levels of indirection, in which a literal name is specified within a text file, which is itself specified within an indirection element contained within another text file, etc. GRP imposes a limit of 7 levels of indirection, primarily to safeguard against “run-away” indirection, which happens (for instance) when a file specifies itself within an indirection element.

Indirection elements are always considered to be case sensitive, even if the group has been designated case insensitive. This is because file names on certain operating systems (eg UNIX) are always case sensitive, and so problems would arise while accessing indirection files if GRP were to treat them as case insensitive.

The file name can contain shell meta-characters (references to environment variables for instance) which will be expanded before the file is used.
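A minimal Python sketch of recursive indirection with a depth limit follows. To keep it self-contained, a hypothetical `files` dictionary stands in for the file system (its keys are case sensitive, like the file names discussed above), elements are split at every comma without regard to nesting or editing strings, and shell meta-characters are not expanded. Counting the outermost file as level 1, the sketch accepts seven nested files and rejects an eighth; this is one reading of the limit, not a statement of exactly how GRP counts.

```python
MAX_LEVELS = 7  # the documented limit on levels of indirection


def expand(expr, files, level=0):
    """Expand a (simplified) group expression, following ^file elements."""
    names = []
    for element in expr.split(","):
        if element.startswith("^"):
            names.extend(read_file(element[1:], files, level + 1))
        else:
            names.append(element)  # literal name
    return names


def read_file(filename, files, level):
    if level > MAX_LEVELS:
        raise RecursionError(f"too many levels of indirection at {filename!r}")
    names = []
    for line in files[filename].splitlines():
        # Each line of the file is processed as a separate group expression.
        names.extend(expand(line, files, level))
    return names


files = {"LIST.DAT": "RED,GREEN\nBLUE", "LOOP.DAT": "^LOOP.DAT"}
print(expand("A,^LIST.DAT", files))  # ['A', 'RED', 'GREEN', 'BLUE']
```

Expanding `^LOOP.DAT`, a file that names itself, raises `RecursionError` instead of looping forever.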

4.4 Modification Elements

A modification element causes GRP to generate a set of names by copying the names from another group. These new names can then be modified using the facilities for editing names described above. The application specifies which group is to be used as the basis for the new names. A special character (usually a “*” (asterisk) character, but this can be changed if required) is used as a token to represent all the names in the basis group. Thus:

  *|_DS|_BK|

would cause all the names in the basis group to be modified by replacing the string _DS with the string _BK. The basis names can also be modified by the addition of a prefix and suffix. Following the description of name editing given above, you may expect the format to be (for instance):

  Hello_{*}_Goodbye

in which the token character takes on the role of the kernel. This does in fact work, but in this case the opening and closing kernel delimiters (“{” and “}”) can be omitted because there is no ambiguity about where the kernel starts and finishes. Thus a simpler form would be:

  Hello_*_Goodbye

The addition of a prefix and suffix can be combined with substitution as usual. For instance, the element:

  A*B|C|D|

would cause all occurrences of the letter C within the names of the basis group to be replaced with D, followed by the addition of the prefix A and the suffix B.

If a “null” group is specified as the basis group (i.e. the group identifier is given as GRP__NOID), then there are no names on which to base the new names and the token character is treated as a literal name. That is, if the user gave the group expression

  A_*2|RAW|FLAT|

and the application had specified a null group as the basis for modification elements, then the specified editing would be applied to the literal name “A_*2”. Since this name does not contain the string RAW, no substitution takes place, and the single literal name “A_*2” is added to the group.
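The behaviour of a modification element, including the null-basis case, can be modelled in Python as follows. The function name `expand_modification` is hypothetical; the basis group is passed as a list of names, with `None` standing in for GRP__NOID. For brevity, the sketch handles a single modification element with the default token, kernel delimiters and separator, and assumes that the editing string, if present, is the only use of “|” in the element.

```python
def expand_modification(element, basis, token="*"):
    """Expand one modification element against a basis group (None = null group)."""
    old = new = ""
    if element.endswith("|") and element.count("|") == 3:
        element, old, new, _ = element.split("|")  # body|old|new|
    # "{*}" and "*" mean the same thing, so drop redundant kernel delimiters.
    element = element.replace("{" + token + "}", token)
    if basis is None or token not in element:
        # No basis names: the whole body (token included) is one literal name.
        prefix, suffix, names = "", "", [element]
    else:
        prefix, _, suffix = element.partition(token)
        names = list(basis)
    # Substitution is applied before the prefix and suffix are added.
    return [prefix + (n.replace(old, new) if old else n) + suffix for n in names]


print(expand_modification("A*B|C|D|", ["CAT", "ACE"]))  # ['ADATB', 'AADEB']
print(expand_modification("A_*2|RAW|FLAT|", None))      # ['A_*2']
```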

4.5 Nesting Within Group Expressions

There is sometimes a clash of interests to be resolved when choosing the character which delimits elements within a group expression. The default delimiter character is the comma, but this character can sometimes be useful within an element, for instance when specifying a set of array indices. For example, if the user gave the group expression:

  A(1,2),B(3,10)

in which each element is a literal name corresponding to an array element, it would be wrong to split this up using the commas as delimiters into the four strings “A(1”, “2)”, “B(3” and “10)”.

To get round this particular problem, GRP ignores delimiters which occur within matching “nesting characters”. There are two nesting characters, the “open nest” character (usually set to “(”) and the “close nest” character (usually set to “)”). Thus in the above example, the commas occurring within the parentheses would not be treated as delimiters, resulting in the group expression being split into the two elements A(1,2) and B(3,10). The characters to use as the opening and closing nest characters may be set by the calling application (see section 5.7).

4.6 Flagging a Group Expression

GRP allows a group expression to be flagged by terminating it with a “flag” character (usually a minus sign although this can be changed, see section 5.7). If the last character in the group expression is a flag character, then the FLAG argument of routine GRP_GROUP is returned true. The flag character is stripped off the group expression before it is split up into elements, so the flag character itself does not get included in any of the names stored in the group.

A typical use of this facility might be to allow the user to request a further prompt for more names. For instance, in the example of section 1.2, the user may wish to specify more input file names than will fit on a single line. To allow this, the call to GRP_GROUP would be replaced with the following:

  *  Loop round, prompting the user for group expressions
  *  until one is found without a minus sign at the end, or an
  *  error occurs.
        FLAG = .TRUE.
        DO WHILE( FLAG .AND. STATUS .EQ. SAI__OK )
           CALL GRP_GROUP( 'IN_FILES', GRP__NOID, IGRP1, SIZE1,
       :                   ADDED, FLAG, STATUS )
  
  *  Cancel the parameter association to get a new group
  *  expression on the next call to GRP_GROUP.
           CALL PAR_CANCL( 'IN_FILES', STATUS )
  
        END DO

The user could then request a further prompt by appending a minus sign to the end of the group expression, as follows:

  NEW_FILE,A_*2|RAW|FLAT|,^LIST.DAT-

The names obtained at each prompt are appended to the end of the group, which expands as necessary.
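The prompting loop above can be modelled in Python, with a list of strings standing in for the user's successive replies. The names `strip_flag` and `collect_names` are hypothetical, and elements are split at every comma without any further processing.

```python
def strip_flag(expr, flag="-"):
    """Return the expression without a trailing flag character, and whether it had one."""
    if expr.endswith(flag):
        return expr[:-len(flag)], True
    return expr, False


def collect_names(replies, flag="-"):
    """Keep accepting group expressions until one arrives without a flag."""
    group = []
    for reply in replies:
        expr, flagged = strip_flag(reply, flag)
        group.extend(expr.split(","))  # new names go on the end of the group
        if not flagged:
            break
    return group


print(collect_names(["RED,GREEN-", "BLUE-", "CYAN", "IGNORED"]))
# ['RED', 'GREEN', 'BLUE', 'CYAN']
```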

Note, if the final element in a group expression is an indirection element, the flag character may be placed at the end of the last record in the indicated text file. For instance, instead of giving:

  ^LIST.DAT-

where the file LIST.DAT contains the single record

  RED,GREEN,BLUE

a user could “hard-wire” a flag character on to the end of LIST.DAT so that it contains:

  RED,GREEN,BLUE-

4.7 Comments Within Group Expressions

It is often useful to mix comments with names, particularly within a text file. All group expressions (whether obtained from the environment or from a text file or as an argument) are truncated if a “comment” character is found (usually “#”, but this can be changed, see section 5.7). Anything occurring after such a character is ignored. In a text file, the comment is assumed to extend from the comment character to the end of the line, so a new group expression may be given on the next line. Note, blank lines are not ignored. Each blank line within a text file will result in a blank name being added to the group.
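A Python sketch of how comments and blank lines affect the names read from a text file is given below. The function name is hypothetical and elements are split at every comma. Whether GRP adds a blank name for a line holding nothing but a comment is not stated above; this sketch treats such a line like a blank one.

```python
def names_from_text(text, comment="#"):
    names = []
    for line in text.splitlines():
        expr = line.split(comment, 1)[0]  # drop the comment character and everything after it
        names.extend(expr.split(","))     # a blank line yields one blank name, ""
    return names


print(names_from_text("RED,GREEN# primary colours\n\nBLUE"))
# ['RED', 'GREEN', '', 'BLUE']
```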

4.8 Escaping the Special Characters in a Group Expression

It is possible to specify that a given character be used as an “escape character” within group expressions. This facility is normally suppressed, but an application can choose to switch it on by assigning a value to the ESCAPE control character associated with a group (see section 5.7). If this is done, any special meaning associated with a character within a group expression is ignored if the character is preceded by an escape character. The escape characters themselves are not included in the resulting names if they precede any of the other “special” control characters. Note, escape characters which do not precede another control character are included in the resulting names.

For instance, the group expression:

  * | A

would normally result in an error because the “|” character would be taken as the start of an incomplete specification for some editing to apply to the preceding text (assuming the application has not changed the default editing behaviour). If, in fact, the user wants this string to be accepted as a literal string (maybe representing a Unix piping operation for instance), then the “|” should be escaped. Assuming the application chooses to use the backslash character “\” as the escape character, then this can be done by entering the following group expression:

  * \| A

The “\” character results in the “|” character being treated as part of the required string, rather than as the start of an editing specification. The string returned to the application is then “* | A” (note, the escape character has been removed). Any escape characters which do not precede special characters are included literally in the returned string. So, for instance, if the group expression was:

  \* \| A

the string “\* | A” would be returned to the application.
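The escape rule can be modelled in Python by recording, for each character, whether it was escaped. The function names are hypothetical, and the set of special characters is an argument rather than a fixed list; passing only the separator “|” is enough to reproduce the two examples above. The sketch does not implement the “<!!” ... “!!>” sections described next.

```python
def scan_escapes(expr, specials, escape="\\"):
    """Return (char, escaped) pairs for a group expression.

    An escape character directly before a special character is dropped, and that
    special character is marked as escaped (to be taken literally). Any other
    escape character is kept as an ordinary character.
    """
    pairs = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == escape and i + 1 < len(expr) and expr[i + 1] in specials:
            pairs.append((expr[i + 1], True))
            i += 2
        else:
            pairs.append((ch, False))
            i += 1
    return pairs


def text_of(pairs):
    return "".join(ch for ch, _ in pairs)


def has_active(pairs, char):
    """True if char occurs without an escape, i.e. keeps its special meaning."""
    return any(ch == char and not escaped for ch, escaped in pairs)


pairs = scan_escapes(r"* \| A", specials="|")
print(text_of(pairs), has_active(pairs, "|"))           # * | A False
print(text_of(scan_escapes(r"\* \| A", specials="|")))  # \* | A
```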

All escape characters within a section of a group expression can be ignored by using the special strings “<!!” and “!!>” to mark the start and end of the section.

4.9 The Order of Names Within a Group

Names are stored within a group in the order in which they are specified in the group expression. For instance, if the file F1.DAT contained the following two records:

  A,^F2
  B,C

and the file F2.DAT contained the following three records:

  D
  E
  F

then the group expression:

  X,^F1.DAT,Y

would result in the names being added to the group in the following order:

   X
   A
   D
   E
   F
   B
   C
   Y

The contents of the two indirection files have been inserted at the position at which the corresponding indirection element occurred. Names resulting from the expansion of modification elements are similarly inserted into the list at the position at which the modification element occurred. The modified names are stored in the same order as the names within the group upon which the modification was based. For example, if the above group is used as the basis for modification, then the group expression:

  U,*_2,V

would give rise to the group:

  U
  X_2
  A_2
  D_2
  E_2
  F_2
  B_2
  C_2
  Y_2
  V
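Both orderings can be reproduced with a short Python sketch in which each expansion is inserted in place as the elements are processed from left to right. It is a simplified model. A hypothetical `files` dictionary stands in for the file system and is keyed by the file name exactly as written after “^” (so the contents of F2.DAT appear under the key F2, as referenced from F1.DAT). Elements are split at every comma, and a modification element supports only a prefix and suffix around the token.

```python
def expand(expr, files, basis=None, token="*"):
    names = []
    for element in expr.split(","):
        if element.startswith("^"):
            # Indirection: the file's names are inserted here, line by line.
            for line in files[element[1:]].splitlines():
                names.extend(expand(line, files, basis, token))
        elif basis is not None and token in element:
            # Modification: one new name per basis name, in basis order.
            prefix, _, suffix = element.partition(token)
            names.extend(prefix + name + suffix for name in basis)
        else:
            names.append(element)
    return names


files = {"F1.DAT": "A,^F2\nB,C", "F2": "D\nE\nF"}
group = expand("X,^F1.DAT,Y", files)
print(group)  # ['X', 'A', 'D', 'E', 'F', 'B', 'C', 'Y']
print(expand("U,*_2,V", files, basis=group))
# ['U', 'X_2', 'A_2', 'D_2', 'E_2', 'F_2', 'B_2', 'C_2', 'Y_2', 'V']
```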