The C-shell has no string-manipulation tools. Instead we mostly use the echo command and awk utility. The latter has its own language for text processing and you can write awk scripts to perform complex processing, normally from files. There are even books devoted to awk such as the succinctly titled sed & awk by Dale Dougherty (O’Reilly & Associates, 1990). Naturally enough a detailed description is far beyond the scope of this cookbook.
Other relevant tools include cut, paste, grep and sed. The last two and awk gain much of their power from regular expressions. A regular expression is a pattern of characters used to match the same characters in a search through text. The pattern can include special characters to refine the search. They include ones to anchor to the beginning or end of a line, select a specific number of characters, specify any characters and so on. If you are going to write lots of scripts which manipulate text, learning about regular expressions is time well spent.
Here we present a few one-line recipes. They can be included inline, but for clarity the values derived via awk are assigned to variables using the set command. Note that these may not be the most concise way of achieving the desired results.
To concatenate strings, you merely abut them.
So object is assigned to "Processing NGC 2345.". Note that spaces must be enclosed in quotes. If you want to embed variable substitutions, these should be double quotes as above.
On occasions the variable name is ambiguous. Then you have to break up the string. Suppose you want text to be "File cde1 is not abc". You can either make two abutted strings or encase the variable name in braces ({ }), as shown below.
Here are some other examples of string concatenation.
Obtaining the length of a string requires either the wc command in an expression (see Section 10), or the awk function length.
Here we determine the number of characters in variable object using both recipes.
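As a minimal sketch of both recipes, the commands below count the characters in a literal string that stands in for the variable object (the literal is an assumption made for illustration). Each pipeline is written so that it behaves the same in the C-shell and in sh; in a script you would normally capture the result with set and back quotes. Because wc -c also counts the newline that echo appends, this sketch writes the string with printf "%s" rather than subtracting one in an expression.

```shell
# Recipe 1: awk's length function counts the characters on the input line.
echo "Processing NGC 2345" | awk '{print length($0)}'     # 19

# Recipe 2: wc -c counts bytes. printf "%s" writes the string without a
# trailing newline, so the count needs no correction.
printf "%s" "Processing NGC 2345" | wc -c                  # 19 (some systems pad the number with blanks)
```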
If the variable is an array, you can either obtain the length of the whole array or just an element. For the whole array, the number of characters is the length of a space-separated list of the elements. The double quotes are delimiters; they are not part of the string, so are not counted.
Finding the position of a substring requires the awk function index. This returns zero if the substring could not be located. Note that comparisons are case sensitive.
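The following sketch shows index at work on an illustrative literal string (an assumption; any variable substituted inside double quotes would do). The first command finds the substring and the second shows the zero returned when the case does not match.

```shell
# index(s, t) gives the position in s of the first character of t.
echo "Processing NGC 2345" | awk '{print index($0, "NGC")}'    # 12

# The comparison is case sensitive, so lower-case "ngc" is not found.
echo "Processing NGC 2345" | awk '{print index($0, "ngc")}'    # 0
```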
One method of extracting a substring uses the awk function substr(s,c,n). This returns the substring from string s starting from character position c up to a maximum length of n characters. If n is not supplied, the rest of the string from c is returned. Let’s see it in action.
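A short sketch of substr, again using an illustrative literal string rather than a shell variable (the string and positions are assumptions chosen for the example):

```shell
# Three characters starting at character position 12.
echo "Processing NGC 2345" | awk '{print substr($0, 12, 3)}'   # NGC

# No length supplied: everything from position 16 to the end of the string.
echo "Processing NGC 2345" | awk '{print substr($0, 16)}'      # 2345
```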
Another method uses the UNIX cut command. It too can specify a range or ranges of characters. It can also extract fields separated by nominated characters. Here are some examples using the same values for the array places.
The -d qualifier specifies the delimiter between associated data (otherwise called fields). Note that the space delimiter must be quoted. The -f qualifier selects the fields. You can also select character columns with the -c qualifier. Both -c and -f can comprise a comma-separated list of individual values and/or ranges of values separated by a hyphen. As you might expect, cut can take its input from files too.
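The qualifiers described above can be sketched as follows. The input strings are hypothetical literals chosen for illustration, not the values of the array places.

```shell
# Second field of a colon-delimited string.
echo "12:34:56.7" | cut -d ':' -f 2        # 34

# First field of a space-delimited string; the space delimiter is quoted.
echo "Eagle Nebula" | cut -d ' ' -f 1      # Eagle

# A range of character columns.
echo "Eagle Nebula" | cut -c 1-5           # Eagle

# A comma-separated list of individual columns.
echo "Eagle Nebula" | cut -c 1,7           # EN

# A list of fields; cut rejoins the selected fields with the delimiter.
echo "12:34:56.7" | cut -d ':' -f 1,3      # 12:56.7
```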
The awk function split(s,a,sep) splits a string s into an awk array a using the delimiter sep.
hms is an array so hms[2] is 34. The last three statements are equivalent, but the last two are more convenient for longer arrays. In the second you can specify the start index and number of elements to print. If, however, the number of values can vary and you want all of them to become array elements, then use the final recipe; here you specify the field separator with awk’s FS built-in variable, and the number of values with the NF built-in variable.
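Two of the approaches described above might look like the sketch below. The time string and the awk array name parts are assumptions made for illustration. The first command uses split directly; the second sets FS and loops over NF fields, so it copes with any number of values.

```shell
# split() returns the number of pieces and fills the awk array parts.
echo "12:34:56.7" | awk '{n = split($0, parts, ":"); print n, parts[2]}'   # 3 34

# Make ":" the field separator and print every field on its own line.
echo "12:34:56.7" | awk 'BEGIN {FS = ":"} {for (i = 1; i <= NF; i++) print $i}'
```

In a C-shell script, enclosing the second pipeline in back quotes within a set statement would turn the printed values into the elements of a shell array.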
Some implementations of awk offer functions to change case.
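Where they are available, the functions in question are toupper and tolower. A brief sketch with illustrative literal strings follows; it assumes your awk provides both functions, which is worth checking on the command line.

```shell
# Convert every letter to upper case.
echo "ngc 2345" | awk '{print toupper($0)}'        # NGC 2345

# Convert every letter to lower case.
echo "Eagle Nebula" | awk '{print tolower($0)}'    # eagle nebula
```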
Some implementations of awk offer substitution functions gsub(e,s) and sub(e,s). The latter substitutes s for the first match with the regular expression e in our supplied text. The former replaces every occurrence.
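The difference between the two functions can be sketched as follows, using a hypothetical string containing two matches. Both functions modify the input line in place, so a bare print shows the result.

```shell
# sub replaces only the first match on the line.
echo "NGC 2345 and NGC 2346" | awk '{sub(/NGC/, "IC"); print}'     # IC 2345 and NGC 2346

# gsub replaces every match.
echo "NGC 2345 and NGC 2346" | awk '{gsub(/NGC/, "IC"); print}'    # IC 2345 and IC 2346
```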
There is also sed. A command of the form sed 's/regexp/replacement/' is equivalent to the first awk example above. Similarly you could replace all occurrences: sed 's/regexp/replacement/g' is equivalent to the second example. The final g requests that the substitution is applied to all occurrences.
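A minimal sketch of the two sed forms, applied to a hypothetical string with two matches:

```shell
# Without the g, only the first match on each line is replaced.
echo "NGC 2345 and NGC 2346" | sed 's/NGC/IC/'     # IC 2345 and NGC 2346

# With the final g, every match is replaced.
echo "NGC 2345 and NGC 2346" | sed 's/NGC/IC/g'    # IC 2345 and IC 2346
```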
A script may process and analyse many datasets, and the results from its calculations will often need presenting in tabular or otherwise aligned form, either for human readability or to be read by some other software.
The UNIX command printf permits formatted output. It is analogous to the C function of the same name. The syntax is printf "format string" followed by a space-separated list of arguments.
The format string may contain text, conversion codes, and interpreted sequences.
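The conversion codes appear in the same order as the arguments they correspond to. A conversion code has the form
%[flag][width][.precision]code
where the items in brackets are optional.
The width is the minimum field width; a value is right justified within it unless the flag is -. An asterisk (*) in place of the width substitutes the next variable in the argument list, allowing the width to be programmable.
For the precision, * likewise substitutes the next variable in the argument list, whose value should be a positive integer.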
The interpreted sequences include: \n for a new line, \" for a double quote, %% for a percentage sign, and \\ for a backslash.
Format codes | |
Code | Interpretation |
c | single character |
s | string |
d , i | signed integer |
o | integer written as unsigned octal |
x , X | integer written as unsigned hexadecimal, the latter using uppercase notation |
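e , E | floating point in exponent form, the latter using an uppercase E |
f | floating point in fixed-point (decimal) form |
g | uses whichever format of e or f is more compact |
G | uses whichever format of E or f is more compact |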
Flags | |
Code | Purpose |
- | left justify |
+ | begin a signed number with a + or - |
blank | add a space before a signed number that does not begin with a + or - |
0 | pad a number on the left with zeroes |
# | use a decimal point for the floating-point conversions, and do not remove trailing zeroes for the g and G codes |
If that’s computer gobbledygook here are some examples to make it clearer. The result follows each printf command, unless it is assigned to a variable through the set mechanism. The commentary after the # is neither part of the output nor should it be entered to replicate the examples. Let us start with some integer values.
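The commands below are a sketch of integer formatting, with the values 42 and 255 chosen arbitrarily. In this sketch the expected result is given in the commentary after the #, and a | is printed after padded fields so that the field width is visible.

```shell
printf "%d\n" 42                   # 42        plain signed integer
printf "%5d|\n" 42                 #    42|    right justified in a field of width 5
printf "%-5d|\n" 42                # 42   |    the - flag left justifies
printf "%05d\n" 42                 # 00042     the 0 flag pads with zeroes
printf "%+d\n" 42                  # +42       the + flag forces a sign
printf "%x %X %o\n" 255 255 255    # ff FF 377 hexadecimal and octal
```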
Note that there are different implementations. While you can check your system’s man pages to confirm that the desired feature is present, a better way is to experiment on the command line.