## Chapter 2Computing

2.1 Unix guides
2.2 Editors

2.3.1 Numerical Recipes

2.4.1 Fortran 77
2.4.2 Fortran 90/95
2.4.3 C

2.4.5 Java
2.4.6 Other languages
2.5 Code topics
2.5.1 Profiling
2.5.2 Optimization
2.5.3 Debugging

2.6 Link farms

Everyone should start with the Starlink User’s Guide.

### 2.1 Unix guides

Before you can do anything, you need to make friends with the machine. The slogan to remember here is ‘unix is user-friendly, it’s just picky about who its friends are’. Keep calm, breath deeply, and make friends with a guru.

Your first source of information might be Starlink’s Unix introduction. This covers the basics of logging on, moving around, issuing commands, creating files, and starting programming. It also includes references to other Starlink documents which can provide more detailed help on various aspects.

There are very good online introductions to Unix at http://unixhelp.ed.ac.uk/ and http://star-www.maps.susx.ac.uk/help/4ltrwrd/unixman.html.

There are numerous books on Unix. Two which seem to be at the right level are “Unix Shells by Example” [?] and “Unix in a Nutshell” [?]. Both cover the Bourne shell (sh) and the C shell (csh), plus other utilities such as sed and awk (see 2.4.6.1). By the way, let me put in a plug for bash as a usable shell, as it includes all the best bits of the Bourne and C shells. The only disadvantage of bash is that Starlink has, to some extent, standardised on csh, so that setup scripts, for example, are designed to work with csh alone; this is not often a real problem, since these scripts are generally simple enough that you can duplicate their effects ‘by hand’.

You will occasionally see references to unix manual pages followed by a number. This indicates which section of the manual the documentation can be found in (section 1 is normal user commands, section 3 is standard library calls, section 5 is file formats, and so on). If you see a reference to sed(1), for example, you’d read the manual page online with the command man sed. A useful variant of the man command is apropos, for example apropos find. This searches the list of man-pages, and lists all those commands which include a particular word in their short description.

### 2.2 Editors

One of the aspects of working on Unix which you’ll have to deal with pretty early in your contact with it, is which editor to use. An editor is the tool you use to create your documents or programs. It’s the tool you’ll use more than any other.

It’s very much a personal decision, which editor you use, and when you’re feeling understimulated, you can discuss the matter with your officemates (don’t get blood on the machines, though, and remember to wash and dry your thumbscrews carefully after use — you’ll need them again when you discuss programming languages — see 2.4).

SUN/170, “Editors on Unix”, is an overview of some of the available options, listing their good and bad points. In addition to this, Brian McIlwrath wrote an overview of available text editors in the September 1998 (issue 21) edition of the Starlink Bulletin.

Emacs is many people’s favourite (and mine). You can do just about anything in emacs — it’s a hugely productive environment — but you may get cricks in your fingers from the odd key combinations. Emacs itself can help you up the long learning curve: there’s a very good tutorial built-in to emacs, and available by typing C-h C-t (that’s control-H, control-T), plus extensive documentation behind C-h C-i. Leave emacs with C-x C-c (obvious, really).

Vi is another favourite. It is indeed very powerful, but rather more of an acquired taste. An advantage of vi is that it’s small, and therefore quick to start up — this makes it useful if you want to make a quick change to a file, or make similar changes to a number of files, and it makes it useful as a editor for Pine, or any other mail program. An even bigger advantage of vi, however, is that it’s available on every Unix system since the Ark, so if you can use vi, you can work on whichever Unix system you find yourself. The main disadvantage is that it is exceedingly unintuitive. There’s quite a lot of help available for vi (it needs it, after all), but not, oddly enough, in the manual page, which describes in elaborate detail how to start up vi, but not what to do once you succeed. You’ll find a good introduction to vi in Chapter 6 of the “Solaris 2.6 Advanced Users’ Guide”, and in Chapter 2 of the old “SunOS 4 Documentation Tools” manual, and probably in the corresponding manual for any other Unix you use (don’t worry about these being out of date, by the way, vi doesn’t change much...). Online introductions to, and references for, vi include the the unixhelp manual, vi101, and even the vi-lovers home page!

If you want to leave vi without saving what you’ve (accidentally?) typed, you can do so by typing :q! (if it beeps at you, or if those characters simply appear in the typing buffer on screen, try pressing escape once or twice first). By the way, it’s pronounced ‘vee-eye’, not ‘vye’: pronounce it the wrong way and you’ll lose all your credibility (of course, if you know the correct way to pronounce it, you’ll lose all credibility with a different set of folk — your choice).

For VMS fans, there’s jed, which is a simple and fairly sane editor which includes an emulation of the EDT editor of old. It’s available on Starlink systems, and documented in SUN/168.

Finally, there’s pico, the editor used internally by the pine mailer. You can’t go wrong with this one, but I’d imagine it could be a little painful if you’re writing much code with it. Along the same lines, the textedit editor which comes with OpenWindows really isn’t up to much. It looks pretty, but has zero support for programming. You’ll be using an editor a lot, so it’s worth investing time to learn to use a powerful one to its full extent.

### 2.3 Numerical analysis

Anyone writing numerical code needs some awareness of numerical analysis, to avoid burning up CPU time with a hideously inappropriate and inaccurate algorithm.

By far the most accessible introduction to numerical methods is Numerical Recipes [?], which comes in different editions, containing code in C, Fortran and Fortran 90. Use the second edition: the first has a significant number of bugs.

A large part of the book’s popularity stems from the fact that its authors are scientists rather than numerical analysts, so that they are more concerned with producing reliable results than with a point of view which sometimes appears to see efficiency, unshakable robustness and algorithmic elegance as ends in themselves. For further discussion, and some caveats, see 2.3.1.

My advice is to use “Numerical Recipes” for your numerical programming until it runs out of steam on your particular problem. Follow the references in there, or look at the booklist in the on-line Numerical Analysis FAQ. Also (perhaps unexpectedly), I suggest you take a look at the Usenet newsgroup comp.lang.fortran, even if you don’t actually use Fortran. This is one of those happy few Usenet newsgroups with a high signal-to-noise ratio, and listening in on this can be profitable when the conversation turns to general numerical analysis. A similar resource is the JISCmail comp-fortran-90 list.

To support more specialised numerical computing, refer to the libraries section below, 4.2.

#### 2.3.1 Numerical Recipes

Numerical Recipes [?] does not claim to be a numerical analysis textbook, and it makes a point of noting that its authors are (astro-)physicists and engineers rather than analysts, and so share the motivations and impatience of the book’s intended audience. The declared premise of the NR authors is that you will come to grief one way or the other if you use numerical routines you do not understand. They attempt to give you enough mathematical detail that you understand the routines they present, in enough depth that you can diagnose problems when they occur, and make more sophisticated choices about replacements when the NR routines run out of steam. Problems will occur because the routines are not written to be bullet-proof, and if you use them thoughtlessly you can break them without much difficulty. Also, the routines will likely prove inadequate if you have a very demanding application which needs a more efficient or more specialised routine than the ones available here.

That is, the NR library is not filled with magic bullets, and if you try to use its contents as such, the only thing you’ll shoot is your foot. However, NAG and SLATEC don’t supply magic bullets either (though ‘any sufficiently advanced library is indistinguishable from magic to the unsophisticated programmer’, as Arthur C Clarke didn’t quite say). You might use the NR routines successfully for years, but if and when they fail you, you’ll use the more sophisticated substitute all the better because you’ll understand why the simpler original was inadequate.

It is a consequence of the book’s aims that the routines will not necessarily be the most intricately and obscurely efficient. Efficiency matters a lot if you are running some huge hydro code taking months of Cray time, but I would claim that if your code takes less than a week of wall-time to run, then the efficiency gains from using opaque library routines is simply not worth the cost in debugging time. No matter how robustly the library routine is written, you will be able to abuse it — you will manage to break it somehow — and when that happens your only recourse is to work through pages of Fortran IV trying to find the overflowing total, or the case you hadn’t realised was marginal.

If you need very high efficiency, then take a course on numerical analysis and spend time understanding the subtleties of the library routines. Otherwise, read NR carefully (there are sometimes important caveats in the text), customise the routines to your problem, cross check the results you obtain, and keep alert. If you have an aversion to documentation, use NAG or another library: using the NR routines as a black box is a numerical recipe for disaster.

Despite NR’s popularity, it has its critics. These are typically that the NR routines do not use the most efficient modern algorithms, and that they sometimes go wrong in borderline situations. While this may be true (and will be more true of the first edition than the second), it is largely a consequence of NR’s declared intention of being accessible and intelligible. Without making the point explicit, as far as I can see, the NR authors privilege intelligibility over high efficiency, and practicality over unabusable robustness; I don’t believe it’s entirely fair, therefore, to criticise them for not being ultra-efficient and bulletproof. There is a collection of criticisms of the books by W Van Snyder at JPL, at http://math.jpl.nasa.gov/nr/, along with some suggestions for alternatives; there is a rebuttal from the NR authors at http://www.nr.com/bug-rebutt.html.

From the NR home pages, you can browse and print out chapters from the books, but you can’t download the source code. Do remember that the NR code is copyrighted and not free; this may affect your ability to redistribute code which uses it.

#### 2.3.2 Floating point representation

As with numerical analysis, the intricacies of how floating-point numbers are represented, and the quirks of their representation on different platforms, are a maelstrom into which you can fall, dazed, confused and unproductive. However (and again as with numerical analysis), if your codes become elaborate enough, then you are going to have to take the plunge.

Even if you have no plans to venture into the deep, there is a minimum awareness of the problems which can help you write more robust code.

What follows here is rather detailed, but it does, I believe, represent the majority of what you might need to know about floating point representation; there is a good alternative introduction in “Numerical Recipes” [?], and an excellent and detailed introduction in Chapter 2 of Sun’s “Numerical Computation Guide” [?] (appendix E of the same book claims to be ‘What every computer scientist should know about floating point arithmetic’, and is an edited reprint of ?. It makes interesting reading, but it’s probably more than many natural scientists need to know). My account here is not intended to supplant either of these sources. Below, I’ll talk exclusively of IEEE base-2 floating point numbers, as defined in IEEE standard IEEE-754; there are other standards, but they’re now of historical interest only, as just about all modern machines other than VAXes and older Crays use IEEE. Most of what I say concerning accuracy will be about single precision; exactly the same issues arise with double precision, but you can sweep them under the carpet for longer.

##### 2.3.2.1 Endianness of floating-point numbers

The IEEE standard leaves it open to chip manufacturers to decide the order in memory of the four or eight bytes of a floating point number (the same is true of integers, though they’re not part of the IEEE spec). This is denoted by the endian-ness (or ‘bytesex’) of the machine. Alpha and Intel chips are little-endian; Sparcs, the Motorola 68k family used in Macs, and Motorola PowerPC chips, are big-endian (there’s also a ‘PGP-endian’ ordering, but that really isn’t likely to trouble anyone any more). This generally does not matter if you stick to a single platform (other than in the context of programs such as those described in A.1 and A.2), but it makes a difference if you want to share raw data files (see 2.5.2.3) between architectures.

##### 2.3.2.2 Accuracy

The central insight is that you cannot represent an infinite number of reals in a finite number of bits without approximation. It is a consequence of this that the result of any calculation will lose precision when it is represented as a floating-point bit pattern, and that the extent and importance of this loss of precision depends on the operands. Both of these seem obvious by themselves, but the consequences can be sufficiently non-obvious to trip you up.

A non-special IEEE floating-point single-precision longword represents a number

 ${\left(-1\right)}^{s}×1.m×{2}^{e-127}$ (2.1)

where $s$ (the sign), $m$ (the mantissa or significand) and $e$ (the exponent) are base-2 numbers represented in 1, 8 and 23 bits respectively, and $0; the offset 127 is called the bias. These are encoded into the 32 bits of the longword as shown in table 2.1.

Table 2.1: IEEE floating-point representations of numbers. This table is adapted from table 2–3 in ? and figure 1.3.1 in ?, which have fuller discussions of the issues here. For discussion of $b$ and $b-1$ see the text.
 $s$ $e$ $m$ $1=$ 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 $2=$ 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 $3=$ 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 $2ϵ=$ 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 $1+2ϵ=$ 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 $b=1.000220=$ 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0 0 1 1 $\left(b-1\right)=2.197266×1{0}^{-4}=$ 0 0 1 1 1 0 0 1 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0

Firstly, it is important to distinguish between the range and the accuracy of floating-point numbers. The smallest and largest positive numbers which can be represented by normal IEEE single-precision numbers are $1.{0}_{2}×{2}^{1-127}\approx 1.175×1{0}^{-38}$ and $1.1\dots {1}_{2}×{2}^{254-127}\approx 3.403×1{0}^{38}$, but this is very different from the accuracy to which these numbers are represented. These extreme values differ from the next representable ones up and down by $1×{2}^{-23}×{2}^{-126}\approx 1.401×1{0}^{-45}$ and $1×{2}^{-23}×{2}^{127}\approx 2.028×1{0}^{31}$ respectively, so that any number which differs from them by less than that amount is unrepresentable. This accuracy limitation is expressed by the machine epsilon, $ϵ={2}^{-24}\approx 5.960×1{0}^{-8}$ (for IEEE single-precision), which is the maximum relative error incurred when a real number is represented in floating-point. It is a consequence of this that $ϵ$ is the smallest number such that $1\oplus 2ϵ\ne 1$, where the operator $\oplus$ represents addition of floating-point numbers. 1

Another way of thinking about this is as follows. When two floating-point numbers are added, the exponent of the smaller one must be increased to match that of the larger, and the mantissa (or significand), $m$, must be shifted rightwards to compensate. If the two numbers differ greatly in magnitude, then the smaller will lose significance in this operation, and if it is less than a factor $2ϵ$ of the larger one, the shift will push the number right off the end of the mantissa, and contribute nothing to the sum. You can easily confirm that ${2}^{24}\oplus 1.0={2}^{24}=16777216.0$.

The immediate practical consequence of this is that if, in the bowels of some calculation, you are adding up many small numbers (doing an inner product for very large vectors, for example), then the final total may be grossly inaccurate if you’ve fallen into this trap. Similarly, if you are subtracting numbers which are almost equal, perhaps in the course of debiasing or removing some DC offset, the subtraction may allow all or most of the leading accurate digits to cancel, leaving the result to be determined by digits possibly heavily contaminated by roundoff. [RW]

Consider the simple calculation $ab-ac$, with $a\approx 1.000244$, $b\approx 1.000220$ and $c\approx 1.000269$. Here, the rounding in the two multiplications conspires to give a difference which has a huge relative error of $2.07×1{0}^{-3}$. We can, however, address the problem by rewriting the calculation in ways which are equivalent for real arithmetic, but inequivalent for machine arithmetic. If we do the subtraction before the multiplication, and calculate $a\otimes \left(b\ominus c\right)$, then we retain a little precision in the subtraction and end up with a relative error of $9.765625×1{0}^{-4}$. Finally, if we instead calculate $a\otimes \left(\left(b-1\right)\ominus \left(c-1\right)\right)$, then we do the subtraction with a full 23 bits of precision (compare 2.1), and lose as little as possible of this in the final multiplication, and end up with a relative error of only $2.532525×1{0}^{-7}$. This is clearly a rather synthetic example, but it is not at all unrealistic, as it is equally clearly closely related to the common problem of calculating the discriminant ${b}^{2}-4ac$ of the quadratic formula, when the quadratic is nearly degenerate.

These problems are demonstrated in the short program fpdemo.c in A.3. Note that the numbers $a=1+1/{2}^{12}$, $b=1+\left(1-\right)/{2}^{12}$ and $c=1+\left(1+\right)/{2}^{12}$ are (a) specifically chosen to demonstrate the effect, and (b) are calculated within the program, rather than being initialised from the decimal representations 1.000244, and so on. If the latter were not true, then the improvement in relative error would disappear, since all precision would have been lost in this initialisation. If you wish to explore the representation of floating-point numbers on your machine, you can do so using the example program fpp.c in A.2.

These problems are unfortunately both easy to run into, and hard to notice if you’re not in the habit of looking for them whilst writing your code. A brute-force way around them is to do sensitive parts of your calculation in double precision. If you are doing the calculation in double precision and still running into the problem, then you will have to rearrange the calculation somehow, to contain the loss of precision. 2 Note, however, that a compiler’s optimizer can frustrate your efforts here, if you’ve given it a sufficiently high optimization level that it starts reorganising your code: in general $\left(a\oplus b\right)\oplus c\ne a\oplus \left(b\oplus c\right)$, and $a\ominus b\oplus \left(b\ominus c\right)\ne a\ominus c$, but a high optimization level may instruct the compiler to ignore this fact, which can be disastrous for the accuracy of your code. Moral: make sure you know what the various optimization levels actually do before you invoke them.

As alluded to above, it is generally possible to sweep accuracy problems under the carpet by using double-precision floats. These occupy eight bytes of storage rather than four, and the analogue of 2.1 is

 ${\left(-1\right)}^{s}×1.m×{2}^{e-1023}$ (2.2)

where $s$, $m$ and $e$ are respectively 1, 52 and 11 bits long, and the bias is 1023. Thus the smallest and largest normal positive double-precision IEEE numbers are $1.{0}_{2}×{2}^{1-1023}\approx 2.225×1{0}^{-308}$ and $1.1\dots {1}_{2}×{2}^{2046-1023}\approx 1.798×1{0}^{308}$, and the machine epsilon for double precision is ${2}^{-53}\approx 1.11×1{0}^{-16}$.

On some platforms such as Solaris and Compaq, with the manufacturer’s compilers, it is possible to use quadruple precision. You should not use double or quadruple precision automatically, however, for a variety of practical and principled reasons.

Firstly, if your program already suffers from rather prodigal use of memory (ie, has very large arrays), then the doubling in the size of real arrays will only make the problem worse and, by probably increasing the amount of swapping, might make your program significantly slower.

Secondly, although there are some issues to do with the relative speed of single- and double-precision calculations, these are probably not significant enough to be worth worrying about in all but the most long-running codes, and in any case would generally be swept up by the compiler (I’d welcome any correction or amplification on this point). 3

Thirdly, if your results change if you switch to double-precision, then there is an accuracy problem in your code, which would probably bear some investigation. If there is some part of your code which genuinely requires the extra precision — say because you have to add numbers of hugely different scale in accumulating a large inner product — then there is not much you can do about it, and that part of your code, at least, must have the extra precision. Failing such an explanation, however, you can regard such behaviour as a miner’s canary, indicating some potential problem with your code, which you might benefit from investigating further.

Fourthly, remember that some algorithms are designed to work with single precision, and will no longer work properly if you search-and-replace double for float or real*8 for real*4. As just one example, one of the Numerical Recipes algorithms (svdcmp) includes a convergence test of, essentially, if (val+err==val). Whatever the merits or otherwise of this as a test, it is a cheap way of establishing whether err has reached machine precision, which will not work in the expected way if val and err are double precision.

For further details on floating point representation, see, for example ?, and informal but very useful discussions (both available on the web) by William Kahan [?] and Jim Demmel [?]. The former is one of the authors of the IEEE-754 standard for floating-point numbers.

##### 2.3.2.3 Other floating-point topics

As well as defining the format in which normal floating-point numbers are stored, the IEEE-754 standard includes the definitions for several other (classes of) special values.

When the exponent $e$ (of a single-precision number) is zero, the longword is taken to represent the number

 ${\left(-1\right)}^{s}×0.m×{2}^{-126}$ (2.3)

(compare 2.1. Unlike the usual floating-point numbers, which have an implied leading 1 in the significand and 23 bits of precision, and which are referred to as ‘normalised’, these have an implied leading 0 and less than 23 bits of precision. These are ‘denormal’ or ‘subnormal’ numbers, and are the result of an underflow, such as dividing the smallest normalised number by two. This behaviour is known as ‘gradual underflow’, and allows the precision in a calculation to degrade gracefully when it underflows. It is distinct from ‘Store-0 underflow’ common before the IEEE standard, in which any expression which underflowed was replaced by zero. This was, and remains, one of the more contentious parts of the IEEE standard. Be warned that older Crays, which use their own floating-point format, have a Store-0 underflow policy, and that the Alpha chip, although it generally implements IEEE floats, has a Store-0 underflow as its default mode, and will neither produce, nor accept as input, denormalised numbers.

If the significand as well as the exponent is zero, the longword represents the number ${\left(-1\right)}^{s}×0$. Note that the IEEE zero is signed, which allows $1/\left(+0\right)$ and $1/\left(-0\right)$ to be positive and negative infinity; this can be important when doing calculations near branch points.

If the exponent is 255 and the significand is zero, the longword represents the value ${\left(-1\right)}^{s}×\infty$. That is, infinity has a special value, distinct from the largest possible normalised number. Positive infinity is generated by, for example $1/0$, or $log0$, where infinity is the mathematically correct, but otherwise unrepresentable, value of a calculation.

If the exponent is 255 and the significand is non-zero, the longword is Not a Number, usually represented by the string ‘NaN’. A NaN is the result of an operation on invalid operands, such as $0/0$ or $log\left(-1\right)$. NaNs have the properties that any operation which has a NaN as an operand has a NaN as its result; and any comparison on a NaN, such as < or ==, evaluates to False, including NaN==NaN. The only exception is that, when x is a NaN, then x != x is true. Why would you want to use such a peculiar number? Generally you don’t, and its appearance in a calculation is an error, which is why processors can be set to dump core when a NaN or an Infinity is produced. However, it can be useful to turn off this behaviour if it is not off by default, and rather than elaborately avoid producing a NaN at each stage, make a check once at the end of a calculation, possibly invoking an alternative (possibly more expensive or special-cased) algorithm if any NaNs are found.

Different compilers handle exceptional values in different ways; see 2.2. Of the three Starlink platforms, only the Alpha traps on exceptional values by default.

Table 2.2: Summary of compiler IEEE traps.
 Compiler traps enabled by default see suppress with Sun cc & f77 none -ftrap Alpha cc overflow, divide-by-zero, invalid operation -fptm, -ieee -ieee Alpha f77 overflow, divide-by-zero, invalid operation -fpe -fpe1 gcc none -mfp-trap-mode (Alpha-gcc)

We have only scratched the surface of a rather intricate topic here. Both Suns and Alphas have extensive f77 and cc man-pages, which hint at the broad range of floating-point options available when compiling and linking your code. See 2.5.2.4 for discussion of the efficiency tradeoffs when using IEEE exceptional values.

### 2.4 Programming languages

I speak Spanish to God, Italian to women, French to men, and German to my horse. Emperor Charles V (attr.)

There is no One True Programming Language, nor even One True Compromise.

The language you use to write code is as much a function of your background, environment and personal tastes, as it is a function of technical merit. At some level or another, all programming languages are equivalent, but just as you’d look askance at anyone who wrote a stellar atmosphere code in TeX 4, it’s clear that some languages are more suited for some tasks than others.

The obvious issue at this point is the choice between Fortran and C. Let me state first of all that, for general scientific computing, I do not believe the difference between the two is substantial enough, or the advantages of each are unqualified enough, that it warrants abandoning the language you’re comfortable with and learning the other.

The languages’ respective strengths only become significant when you are deep within their ‘native territories’: for Fortran this territory is that of numerically intensive codes with week-long runtimes, for C it is intricate manipulation of data structures. Away from these extremes, the choice is at the level of which to choose for a particular application, given that you know both, and an ideal application might be said to be one with a Fortran heart whirring away in a C harness.

Though Fortran and C are both simple languages, Fortran is simple in ways that an optimizing compiler can exploit. This means that for a numerically intensive application, where processing speed is very important, Fortran is the only suitable language.

Why is this? The reason can be illustrated by the two languages’ loop semantics. Fortran’s loop is very simple:

do 10, i=1,n
C  process array element a(i)
10 continue

The equivalent construction in C is

for (i=0; i<n; i++) {
/* process array element a[i] */
}

The difference is what the two language compilers can say about the loop. The Fortran compiler can know that the loop will be performed exactly $n$ times, since neither i nor n may be changed within the loop; also the array a(i) cannot overlap with any other array within the loop body. On the other hand, in C, the loop for (init; test; increment) body  is defined to be equivalent to the loop init; while (test) body; increment; , so that both i and n could change within the loop, and since a could be a pointer, it could point anywhere. The Fortran compiler could rewrite the first loop as

do 10, i=1,n,4
C     process array element a(i)
C     process array element a(i+1)
C     process array element a(i+2)
C     process array element a(i+3)
10 continue

cutting the number end-of-loop tests by a factor of four (this is a simple example of ‘loop unrolling’, and presumes that $n$ is divisible by 4). The loop semantics could also make the elements of the loop candidates for parallelization, if the compiler and hardware support that 5 The C compiler has to do a lot more investigative work before it can make any similar rearrangements.

Even if there were no such fundamental differences, the fact would remain that Fortran vendors have always sold their products on the strength of their optimizers, so that it is both easier and profitable for Fortran compilers to produce very fast code.

Though C’s loop semantics are troublesome, it is really the pointer type which causes a lot of the trouble — since a pointer is little more than an address, the compiler can have little clue what the programmer is aiming to do with it, and so has little choice but to make a literal translation of the C code into assembler.

The pointer type is redeemed, however, because it is this which allows C to code complicated data structures so naturally: not only aggregate types, but linked lists, queues, and all the other ways of holding, reading, parsing and generally juggling data, so beloved of Computer Science 1.

Fortran 90/95 goes some way towards addressing Fortran’s weaknesses by introducing pointers (though the fact that they’re living in a Fortran code doesn’t make them any less troublesome to an optimizer), as well as proper data structures and some modularisation. At the same time as Fortran does that, however, C++ uses object-orientation to further enhance C’s advantages in data manipulation, going so far as to make functions auxiliary features of data structures.

For general discussions of the issues involved in high-performance computing, see the book “High Performance Computing” [?].

Starlink ensures that C, C++, Fortran 77 and Fortran 90/95 compilers are available at all sites.

#### 2.4.1 Fortran 77

Fortran 77 is probably the dominant current Fortran dialect, replacing earlier standards such as Fortran IV and Fortran 666. There are several dialects of Fortran, containing various vendors’ language extensions, but the only extensions which are usually portable are those in VAX Fortran, which includes the enddo statement for terminating do-loops, and the %val() function which is necessary to intermix Fortran and C code (see 2.5.4).

The standards document for Fortran 77 is not exactly bedtime reading, but can be quite useful when you have forgotten details of syntax, especially to make sure what you are writing is correct rather than just allowed by the compiler you are using at the time. As an introduction to Fortarn 77, I’ve heard good things about ?. [MBT]

An important deficiency in Fortran is its lack of any standard way of dynamically allocating memory (as opposed to allocating arrays to be a fixed size at compile time). The Starlink CNF library, SUN/209 is intended to make this reasonably easy, and includes a brief discussion of the underlying mechanism. This is rather a can of worms, but the essential technique is to write a bit of C which obtains a pointer to a block of memory via a call malloc, return that pointer to the Fortran program as an integer, then use %VAL() to supply that integer as an argument to a function which is expecting an array. This is non-standard (and not likely to become standard, now that Fortran 90 includes its own mechanisms for dynamic memory allocation), but it is a well-established technique and therefore probably more portable than you have any right to expect.

There is a large number of introductions to Fortran, but not, I believe, a single preeminent one. The “Starlink application programming standard”, SGP/16, is a collection of programming style guidelines for Fortran, and includes further references.

There is a reasonable amount of online information on Fortran, which is well-covered at the ‘Fortran Market’ (http://www.fortran.com/fortran/info.html), which includes several Fortran FAQs.

#### 2.4.2 Fortran 90/95

Almost immediately after Fortran77 was standardised, work began on its successor. Although this project was named Fortran 8X, the work took long enough that the final standard was named Fortran 90.

The Fortran standard is maintained by ISO committee JTC1/SC22/WG5, and the current version of the standard is ISO/IEC 1539-1 : 1997.

Fortran 90 was an attempt to respond to the numerous developments in language design, seen since Fortran’s last standardisation in the sixties and seventies. In contrast to Fortran 77, which aimed to standardise existing practice, Fortran 90 was an attempt to push forward the development of Fortran as a language. A summary of the new features (adapted from ?) is:

• Free format source code form (no more column-counting).
• A means for the language to evolve by labelling some features as ‘obsolescent’.
• Array operations (for example, X(1:N)=R(1:N)*COS(A(1:N))).
• Pointers.
• Improved facilities for numerical computation including a set of numeric enquiry functions.
• User-defined derived data types composed of arbitrary data structures and operations upon those structures.
• Facilities for defining collections called ‘modules’, useful for global data definitions and for procedure libraries. These support a safe method of encapsulating derived data types. Operator overloading and prototyping.
• Requirements on a compiler to detect the use of constructs that do not conform to the syntax of the language or are obsolescent.
• New control constructs such as the select case construct, an exit and a new form of the do.
• The ability to write internal prodedures and recursive procedures, and to call procedures with optional and keyword arguments.
• Dynamic storage (automatic arrays, allocatable arrays, and pointers).
• Improvements to the input-output facilities, including handling partial records and a standardized namelist facility.
• Many new intrinsic procedures, (date, precision, arrays, ...)

Fortran 90 is backwards compatible with Fortran 77, so that every strictly conformant Fortran 77 program is also a valid Fortran 90 program. However, many of Fortran 77’s odder features are strongly deprecated, and may disappear in the next revision of Fortran, due to appear sometime in the next decade. Fortran 95 is a minor revision of Fortran 90.

There is an increasing number of books on Fortran 90/95; for a booklist, refer to the Fortran Market at the URL above. I am, of course, reluctant to recommend books I have not used myself, so I will simply note that I have heard good things about ?, and that the first author was prominent in the negotiations concerning the development of the Fortran 90 standard.

If you plan to use Fortran 90/95, but avoid the deprecated parts of Fortran 77 altogether, you might be interested in F, which is a subset of Fortran 90 with all the deprecated features removed. You can obtain an F compiler from Imagine1, including a free educational version for Linux.

Several Fortran compilers were reviewed in a short report of January 1997 by the high performance computing project at Liverpool.

It is occasionally necessary to produce code in a mixture of Fortran and C. See 2.5.4.

#### 2.4.3 C

There are hundreds of books on C, but the only essential one is the Kernighan and Ritchie book [?] (also known as just ‘K&R’). This is a very short book, by the authors of the language, which combines a tutorial introduction with a detailed reference to the language and its standard libraries. I believe it’s the only C book you’ll ever need.

The book’s compression makes it possible, and even advisable, to read it from beginning to end. It avoids the bloat found in many other C books by not teaching you how to program, by never saying anything twice, and by never attempting the empty reassurance that ‘it’s all easy, really’; its examples often illustrate more than one point at a time, and culminate in a sample implementation of the library function malloc. Science textbooks don’t talk down to you; I don’t see why computer texts should be excused for doing so.

It follows from this that K&R is not necessarily the ideal recommendation for someone new to programming, who might need something a little more comprehensive. My recommendation in that case is to find an introductory C (or even general programming) book which covers what you want without irritating you, and use that up to the point where you feel comfortable with K&R.

C allows the programmer a considerable degree of freedom in how an algorithm is expressed. The fact that this freedom is easily abused makes style guides more common for C than for other languages. Starlink has adopted “The Elements of C Programming Style” by Jay Renade and Alan Nash as its principal C programming standard, as well as producing a compact collection of programming style suggestions in SGP/4.

One way of making your life easier in this respect is to be consistent about using current standard ANSI-C. See 2.5.5.

The C language as originally designed by Kernighan and Ritchie was standardised as ANSI-C in 1990 (K&R 2nd edition describes the latter). This standard is currently being revised by an ISO working group with the memorable name ISO/IEC JTC1/SC22/WG14 — C. One of the motivations to the work is to provide better support for floating point calculations in C, such as defining a standard interface to the IEEE floating point exceptions.

The excellent C FAQ contains substantially more than you realised you wanted to know about C. It contains detailed discussions of both common and arcane problems.

#### 2.4.4 C++ and object orientation

C++ is an object-oriented version of C, and was finally standardised in 1997. Compiler makers have tracked the developing standards, so that recent C++ compilers should conform pretty closely to the standard even if they formally pre-date it.

Object-orientation is the notion that functions are attached to data, rather than simply operating on them. For example, if a C program were to declare a data type Array, and initialise a variable of that type with Array a; then one can imagine a function to return the determinant of the array, declared as float determinant (Array a);. In C++, the ‘function’ to obtain the determinant could be declared as part of the data type, and obtained via the expression a.determinant(). There are two main points to this. Firstly, the so-called ‘member function’ determinant can have privileged access to the internal representation of the data type Array, so that other parts of the program need not, and may not, manipulate that representation, erroneously or otherwise. Secondly, and consequently, the representation and matching member functions can be changed with the guarantee that the rest of the program will be unable to tell the difference. This is characterised in the remark that new programs have always been able to use old code (in libraries), but object-oriented approaches mean that old programs can use new code (when an implementation changes, while the interface remains the same).

Of course, both of these points are to some extent true of traditional programming languages — indeed I’d doubt that there’s anything you can do in C++ which you couldn’t do with some ingenuity in C — but the point is that they are much easier, and much more natural, in C++.

It may be clear at this point that C++ is not a beginner’s language. The result of grafting a high-level abstraction onto portable assembler is a language with far more syntax than is healthy. Do not feel you need to learn C before C++ — they are closely enough related that the differences can be confusing. My feeling, however, is that you should consider learning Java (2.4.5) first if you have the time: there is relatively little that needs to be unlearned going from Java to C++, and Java’s simplicity makes it easier to come to grips with object-orientation, without drowning in a sea of punctuation.

The excellent C++ FAQ includes book recommendations. The canonical book for C++, with the same status K&R has for C, is “The C++ Programming Language” [?], by the language’s author. However, I find Stroustrup’s book rather irritating: it’s rather badly organised, and the index is dreadful.

Note, by the way, that the C++ compiler on Suns is named CC, on Alphas named cxx, and on Linux named both c++ and g++.

#### 2.4.5 Java

Java was developed by Sun and at present (mid-2002) eclipsed only by XML as the current Big Thing. Source code is compiled to machine-independent bytecode, which is then either interpreted, or compiled to machine code on the fly, by a Java Virtual Machine (JVM) on a particular platform. The main interest to astronomy might be in developing machine- and architecture-independent interfaces either to codes or archives. It’s also, I think, of use as a stepping-stone to C++, since its object-oriented features are less obscured by syntax than they are in C++. A very good textbook, written (you won’t be surprised to guess) by two of the language’s authors, is “The Java Programming Language” [?]; once you’ve mastered the basics, there’s a good deal of helpful advice in “Effective Java” [?]. There are numerous resources at Sun’s Java site at http://java.sun.com.

Java is a mixture of an interpreted and a compiled language. Java source code is compiled by the Java compiler into bytecodes, which are then interpreted by a Java Virtual Machine. These bytecodes are low-level, but architecture-independent so that, in principle, only the virtual machine and its runtime need to be ported to a new machine, whereas the code should be completely portable. This does not work quite as well in practice as it does in principle, but it does mean that Java is not too far off the ideal of run-anywhere code.

Since Java bytecodes are ultimately interpreted, Java programs have the potential to be rather inefficient. However, just-in-time compilers (JIT), which analyse running code and optimize the most heavily-used parts, and similar developments, should help the situation improve in the future. This is possible because JITs have access to both the program source code and the running program, and so can combine the features of both an optimizer and a profiler, and thus optimize more aggressively than a traditional compiler could. This, combined with Java’s good built-in support for different platforms and for resource-discovery, means that, possibly somewhat surprisingly, Java has been suggested as a suitable language for high-end codes. The Java Grande Forum is concerned with investigating and promoting developments in this area.

Note that it is relatively easy to write slow Java (for example, adding Strings is a lot slower than using a StringBuffer; input and output streams are not (extensively) buffered, so if you have a lot of I/O, you would be well advised to use BufferedInput,OutputStream objects wrapping your I/O objects; small-object creation is extensively and increasingly optimised, but it’s still not dirt-cheap, so look out for that in a small loop). That means that using a profiler (see 2.5.1) is particularly important. However, with Java, that’s easy, because there’s one built in to the JVM. Give the option -Xrunhprof:help to find out how to invoke it. For example:

java -Xrunhprof:cpu=samples,file=myprog.hprof myclass

There’s quite a lot of information in here (and the file format is under development), but as with all profilers, you’ll see a list of the number of times various methods were called: the ones at the top of the list are the ones where the program spent most of its time, so work out why, and concentrate on making them faster. Ignore the rest.

Java is still developing. At present (mid-2002) Java 1.3.1 is long in the tooth but stable. Version 1.4 is in beta, and on the point of being released properly; it includes a few changes to the language, most noticeably the inclusion of an assert construct.

#### 2.4.6 Other languages

C and Fortran cover most of the bases for scientific computing, but there are one or two others which come in useful occasionally.

##### 2.4.6.1 awk and sed

Often, you can find yourself performing some repetitive editing task, for example massaging data into a form which a program can conveniently read. Such tasks can conveniently, and reliably, be done by programs such as awk and sed. Neither of these utilities is as well-known as it should be, as they can save a great deal of tedious and error-prone effort.

sed is a version of the very simple editor ed, which is specialised for performing edits on a stream of text. For example, the following rather elaborate sed script prints all the section headings from a LaTeX document:

sed -n ’s/^\\$$sub$$*section{$$.*$$}.*\$/\2/p’ sc13.tex This may look like gibberish, but it is simpler than it looks. The option -n instructs sed not to print out input lines, which it does by default. The sed expression in quotes calls the s command: whenever the ‘regular expression’ between the first pair of slashes matches, the s command replaces it with the expression between the second pair and, because the s command is suffixed with a p, prints out the modified line. The regular expression matches lines which start with a backslash, have zero or more occurrences of the string ‘sub’, which is followed by the string ‘section’, then any sequence of characters, followed by a  then any characters, ending at the end of the line. The caret ̂ matches the beginning of a line, the backslash is a special character, so that it must be ‘escaped’ by prefixing it with another backslash, \\, the grouping operators are $$ and $$, the asterisk indicates that the previous (bracketed) expression may be present zero or more times, the dot matches any character, and the dollar matches the end of the line. As well as grouping, the parentheses save what they match, and the expression \2 in the replacement text refers to what the second pair of parentheses matched, namely the text between the curly braces. The overall result is that the matched string (the whole of the line) is replaced by the contents of the braces and then, because of the p suffix, printed. Another useful tool is awk, named after its designers Aho, Weinberger and Kernighan. Like sed, it works through a text file, executing scraps of code whenever a line matches some condition. Before it does anything with a line, it breaks it into fields, separated by whitespace by default. Consider the following example7 ps u | sed 1d | \ awk ’{print$4, $0; totmem+=$4}; END {printf "total memory: %f\n", totmem}’

This (not terribly useful) line generates a process listing using ps, uses sed to delete the first line (that is, it executes the command d on line number 1), and then passes the result through the awk program contained in quotes. On every line, this prints field number 4 (the %MEM column in the listing) and field 0 (which is awk-speak for the whole input line), and adds the value of the fourth field to a running total; on the line matching the pattern END — that is, the pseudo-line at the end of the file — awk prints out the accumulated total.

You won’t typically generate expressions as complicated as these on the fly (at least, not until you get really good). This example is intended to suggest that you can, in aliases or in scripts, perform quite complicated transformations of text files. For further details you could look at the sed or awk man-pages, which are complete but very compressed, or work through a tutorial in your system’s printed documentation. There are several guides to sed and awk, but you might be best off, initially, using an advanced introduction to Unix, such as ? or ?. The canonical documentation for regular expressions is on the ed(1) manual page.

##### 2.4.6.2 Perl

Perl is a general-purpose scripting language. It started off as a text-reformatting facility, rather like a super-awk, but it has now grown to the point where it really is a programming language in its own right, capable of supporting quite substantial projects. Perl programmers can call on a huge range of supporting code, collected at the Comprehensive Perl Archive Network, CPAN, to do everything from internet programming to database access. Perl’s expressive power makes it ideal for rapid development of all sorts of complex systems — some huge proportion of the web’s CGI scripts, for example, are written in Perl. Unfortunately, the flexibility of Perl’s syntax make it quite possible to write spaghetti, the like of which we have not seen since Fortran IV dropped out of fashion.

The Perl manual pages are reasonably clear. O’Reilly publishes a good book on Perl, written by Larry Wall, its author [?]. This is a good reference book, but ? is possibly a better tutorial introduction. Perl regular expressions are slightly different from the ones used by sed and friends — see the perlre manual page.

Perl is a semi-interpreted language. Somewhat like Java, when the Perl interpreter first processes your Perl script, it compiles it to an internal code, which it then proceeds to interpret. This means that Perl programs have a relatively long startup time, but run reasonably efficiently after that. This is not a big issue in most applications.

The current (end-2001) version of Perl is 5.6 or thereabouts. Perl 6 will be a significant step in the evolution of the language: it’s in the offing, but still some way away.

### 2.5 Code topics

Several of the topics mentioned here are discussed at greater length in the Sun Fortran User’s Guide [?], which is of interest even if you program only in C.

#### 2.5.1 Profiling

Before you start doing any optimization at all, check which parts of your code are slowing the machine down — there’s no point in tweaking, for example, a one-time initialisation routine which takes only a tiny fraction of the program’s runtime. You do this by using a profiler.

Different compilers will invoke a profiler (presuming they have one) in different ways. The Sun and Digital Fortran compilers include profiling code if you give the option -pg to the f77 command. Compile and link the program with this option (if you wish, you can compile only those modules you want to profile with the -pg option, but you must include the option in the final link command), and then run it. This run will create a file gmon.out in your current directory. Then you can run gprof. Taking as example the programs in A.4, we can build and profile them as follows:

% f77 -pg -o timewaster p1.f p2.f p3.f
p1.f:
MAIN silly:
p2.f:
mkidentity:
p3.f:
determinant:
Linking:
% timewaster
1.00000
% ls
gmon.out        p1.o            p2.o            p3.o
p1.f            p2.f            p3.f            timewaster*
% gprof timewaster > timewaster.gprof

The output file timewaster.gprof is many lines long, even though the program is so short! The file contains a great deal of information, but buried amongst it is the following display (from the Sun profiler, trimmed):

called/total       parents
index  %time    self descendents  called+self    name     index
called/total       children

0.05        0.64       1/1           main [2]
[1]     97.2    0.05        0.64       1         MAIN_ [1]
0.64        0.00  100000/100000      mkidentity_ [4]
0.00        0.00       1/1           __s_wsle_nv [167]
0.00        0.00       1/1           determinant_ [8]
0.00        0.00       1/1           __do_l_out [157]
0.00        0.00       1/1           __e_wsle [158]
-----------------------------------------------
0.64        0.00  100000/100000      MAIN_ [1]
4]     90.1    0.64        0.00  100000         mkidentity_ [4]

This indicates that the subroutine mkidentity was called 100000 times from the main program, and that 90.1% of the time — a total of 0.64 seconds — was spent in that routine. Were this a real project, this would indicate that this routine would be a good one to examine for speedups.

This example, and a further one using the tcov utility, were taken from ?, which might be consulted for further details. pixie is the corresponding utility on Compaqs — see its man-page for details [RW].

gprof is not specific to Sun, but is available for other architectures, and other compilers (including gcc) as well.

#### 2.5.2 Optimization

Just say no! Or, if you need authority:

Donald Knuth, in “Literate Programming”

Premature optimization is the root of all evil.

The first thing to ask yourself, when you are considering performance enhancements on your code is ‘do I really need to do this?’. You should only attempt optimizations once you have either identified a particular part of your code which needs improving, or else you know that you can write the efficient version of the code correctly. If the ‘efficient’ version is unnatural, then the savings in runtime you achieve by reordering the code have a good chance of being wiped out by the time spent debugging a convoluted juggling act. Within reason, prefer simplicity and robustness to efficiency8 — get your code working correctly and believably, and only then start worrying about speed. That way, if you later establish that you need to work on part of the code, you at least have a trusted set of results to check the new version’s against. Another way of putting this is that it’s easier to optimize correct code than it is to correct optimized code.

If you’re writing a code which will run for days or more, then it might be worthwhile investing a significant effort in tuning your code for performance. Doing this properly might involve you learning more about your machine’s architecture than you might enjoy, and as as result will make your program less portable to the next generation of whizz-bang hardware, or even the next compiler.

It is generally worthwhile to use hand-optimized routines, if they are available for your compiler. Where this can pay particularly good dividends is in the case of linear algebra, where the speed of a calculation can be significantly improved (‘a factor of several’) by routines carefully written to access memory in an optimal way. There is a standard interface to such routines called the BLAS (Basic Linear Algebra Subroutines), which your compiler documentation may mention. Although it won’t be as fast as a machine-specific implementation, the free BLAS implementation at Netlib (UK mirror) should speed your codes up significantly. [MBT, RW]

The first lesson to learn about optimization is that the compiler can probably do it better than you can.

Most compilers will have a -On option, to allow you to set how aggressive the compiler’s optimization will be. The GNU gcc compiler, for example, allows optimization levels 1, 2 and 3, and Solaris Fortran has five levels, which progressively enable more techniques for improving your program’s execution speed. Compaq compilers have a ‘-tune host’ option, to produce code optimized for a particular machine. [MBT]

If your program starts behaving oddly, at some point you should start worrying about errant optimization, if you have that turned on. Optimization works by the compiler recognising patterns in your code, and possibly rearranging the output assembly language to take advantage of these. If it gets this wrong (because there’s a bug), if your program relies on a particular floating-point behaviour, or if you’re doing something sufficiently weird with your programming language that you manage to confuse the compiler, then you could trick it into doing the wrong thing.

Specifically, if you have occasion to worry about the order in which expressions are evaluated — for example to conserve precision, as described in 2.3.2.2 — then you should be very wary of optimization, as one of the first things an optimizer might do is to reorder expressions which are associative mathematically but not computationally. If you have some code like this, it might be best to isolate a single routine in a module of its own and compile it separately with its own appropriate optimization level.

Parallelization is another type of optimization. Just say no several times to this, but if you have an application which really needs it, then buy a book and some aspirins. See 2.5.2.7.

Although you should generally leave optimization to the compiler, there are some things you can do to at least avoid getting in its way.

##### 2.5.2.1 Avoid using the register keyword in C

This is, historically, supposed to suggest to a compiler that a particular variable might be better stored in a register than in main memory, though the actual semantics defined in the C standard is that the keyword is an assertion that the program nowhere takes the address of the variable so qualified. Such an assertion means, amongst other things, that the variable will never be visible from any other functions, which might trigger other optimizations. [SG]

You do not need to use this keyword to actually suggest to the compiler that it store the variable in a register, since the compiler might well do this anyway, and have a better idea of which (non-obvious, or temporary) variables to choose than you have.

##### 2.5.2.2 Walk through arrays in the correct order

A multi-dimensional array is necessarily stored in memory as a linear array. If you have to process this entire array, it will be fastest if you try to do so in the order in which the array is stored. Fortran arrays are stored with the leftmost index varying fastest. That is, the Fortran array integer i(2,2) will be stored in memory in the order i(1,1), i(2,1), i(1,2) and i(2,2)

Take, for example the Fortran array integer ia(1000,1000). If you run through this array with the first index changing fastest, as in the following fragment

do 10, j=1,1000
do 20, i=1,1000
call some_calc (ia(i,j))
20   continue
10 continue

then you will move through the array in the ‘natural’ order. If, however, you process it in the other order, with the second index changing fastest, then when you move from ia(i,j) to ia(i,j+1), the computer will have to jump 1000 places forward in the stored array, each time round the loop. This would not matter too much if the entire array were stored in physical memory, but this will not be the case for any sizable matrix, so that to find the appropriate array element, the machine may have to reload part of its memory from a saved version on disk. Such input is relatively slow, and if the calculation is laid out in such a way that this happens repeatedly in an inner loop, which is executed many times by an outer loop, then there can be a significant impact on your program’s performance.

This system of swapping memory back and forth to disk is known as ‘virtual memory’, and the machine’s discovery that the memory it needs is swapped out is known as a ‘page fault’, and is one of the important statistics about the run of your program. You can obtain such statistics in copious detail by using the compiler’s profiler (see 2.5.1), or quickly by using the time command (see your system’s time man-page for details — not all versions provide this information by default).

Of course, it’s not really as simple as just that. Even when the entire array is stored in physical memory, or when you have rearranged your program to minimise the number of page faults, it can be important not to hop about in arrays too much. Modern architectures will typically have a small amount (modern PCs around half a Mb, biggish servers around 8 Mb) of memory called ‘cache’ which has much faster access time (typically an order of magnitude or more) than the bulk of the RAM. The relationship between this and core RAM is strongly analogous to the relationship between core RAM and disk, in that cache-lines (which are like memory pages, but typically only a few words long) get swapped in and out of it when one word from the line is required. The upshot of that is that it’s important to avoid hopping about over distances much smaller than an I/O page since, even if you have no page faults, cache faults make a huge difference. And of course it’s not as simple as that either — sometimes there are two levels of cache, some architectures implement some sort of prefetching, and so on. Except with some rather sophisticated profilers (which generally have to be supported by a processor architecture which keeps a record of these things) it’s not really possible to see how many cache faults you’re getting, except by writing the code more efficiently and seeing what the difference is. [MBT]

Unlike Fortran, C arrays are stored with the rightmost index increasing fastest, so that the array int i[2][2] will be stored as i[0][0], i[0][1], i[1][0] and i[1][1].

##### 2.5.2.3 I/O

Input and output are slow, and the faster processors become, the bigger is the cost of I/O relative to CPU cycles. That means you should think twice before writing out intermediate results, for reuse later in the same calculation or another. For example, it’s quite easy to fall into the trap of ‘saving time’ by reading in a precalculated table of values, without realising that it can take more time to read the file than it would take to recalculate the table from scratch each time. Remember that computers don’t get bored, and don’t mind doing the same calculation repeatedly.

If you do need to save values for later use, and don’t mind doing it only semi-portably, you can save time and precision by using raw I/O rather than formatted ASCII [RW]. A C example is:

FILE *ofile; float arr[10];

/* set arr[i] to have sensible values */

ofile = fopen ("output-c.dat", "w");
fwrite ((void*)arr, sizeof(arr[0]), 10, ofile);
fclose (ofile);

Note that we carefully cast the variable arr to type void*, and that we could replace this by (void*)&arr[0] if we thought it was clearer. Note also the minor trick of using sizeof(arr[0]) rather than the more obvious sizeof(float); they are equivalent, but the first will remain correct even if we change our mind about the size of the elements of the array arr. You would read the contents of this file in using the function fread.

The fwrite and fread functions are not portable in general, since they deal with floating point numbers merely as bundles of bits, and pays no attention to how these are interpreted. This means that such files are not portable between machines which interpret these bits differently, so that files written on a little-endian machine (see 2.3.2) will be gibberish if read on a big-endian machine, and vice-versa. However, C’s raw I/O is semi-portable, inasmuch as the fwrite function above does nothing other than copy $10×4$ bytes (in this case) from memory to disk; fread similarly copies bytes from disk to memory without any interpretation.

The corresponding Fortran example is

real arr(10)

C     set arr() to have sensible values
open (unit=99,file=’output-f.dat’,form=’unformatted’)
write (99) arr
close (99)

Note that Fortran unformatted output is not portable in any way. Unlike C raw I/O, the Fortran compiler is free to store the information on disk in any way it likes, and the resulting file will not, in general, have $10×4$ bytes in it. That is Fortran unformatted I/O is machine- and compiler-specific, and the file output-f.dat in this example will only be readable in general using a Fortran program built using the same compiler.

##### 2.5.2.4 Use NaN and Infinity

As described in 2.3.2.3, any operation which has a NaN as input, produces a NaN as its result. Similarly, the special values positive and negative Infinity are legitimate numbers which can appear in a calculation. Consider the following code fragment:

do 10, i=-5000,5000
do 20, j=-5000,5000
if (j .ne. 0) then
t(i,j) = real(i)/real(j)
else
t(i,j) = 0.0
endif
20    continue
10 continue

The if test is to avoid a division-by-zero error. If this loop were buried deep within a hierarchy of other loops, then the test could be the source of a significant fraction of the runtime. It can, however be omitted, allowing the matrix t(i,j) to include numerous Infinitys and one NaN (due to 0/0). Because both of these values are permissable floating-point operands, they can be allowed to percolate through the rest of the calculation without elaborate further tests, to be checked only at the end. This is possible only if the floating-point environment is set up not to generate signals on floating-point exceptions (see also 2.3.2.3).

Note that calculations involving the exceptional values tend to run more slowly than those using normal values, so if your calculation produces a significant number of exceptional values — like the artificial example above — a switch to IEEE semantics might not produce a speed win overall. Note also that the expression (a.lt.b) is not equivalent to .not.(a.gt.b), since both a.lt.b and a.gt.b are false when one of the variables is an IEEE exceptional value: this could produce hard-to-find bugs if you were not alert to this when you were writing the code. [SG]

##### 2.5.2.5 Remove debugging options

It may or may not be obvious that debugging and profiling options, as discussed in 2.5.3 and 2.5.1, will both slow your program down (especially in the latter case), and make it bigger. You should compile your program, or at least the numerically intensive parts of it, without them when you are not debugging it. [MBT]

##### 2.5.2.6 Choose a fast compiler

Random points:

• This is a case where free software is not necessarily better. Both Sun and Compaq expend resources on making their compilers as efficient as possible. If speed is important to your application, you’re probably best off using one of these compilers.
• On the Alpha, it’s worth while using the Fortran 90 compiler even for Fortran 77 code [SG].
##### 2.5.2.7 Further reading

Take a look at the Sun Numerical Computation Guide [?], particularly the end of chapter 5, and at the collection of papers at http://sunsite.doc.ic.ac.uk/sun/Papers/SunPerfOv+references/, particularly the helpful paper about compiler switches: you_and_your_compiler (this is a generally useful collection of Sun papers, by the way).

If you do need to spend significant effort tuning your code, then you should consult a textbook on the matter. A good one is “High Performance Computing” [?]. This reference touches on parallelizing your code, a topic which I have deemed sufficiently arcane not to be discussed at all in this introductory guide.

#### 2.5.3 Debugging

One approach to debugging is the brute-force method: simply scatter tracing statements through your code and watch them fill your screen. There are some types of debugging for which this is ideal — for example, graphing a trace of intermediate values may reveal that an algorithm is showing signs of instability — but it can be very cumbersome.

Better, in many cases, is to use a debugger. A debugger allows you to roam through your code, stopping in troublesome functions, examining data, and stepping through your code line-by-line. There is a great deal you can do with a debugger, but they are not often the easiest tools to master. However, there is a good deal you can do armed only with experience of the foothills of the debugger’s learning curve.

I will describe the Sun debugger dbx, here. The use of the GNU debugger, gdb, and the dbx debugger on Digital Unix, are broadly similar.

First, compile crash.c, listed in A.5, in the usual way (cc -o crash crash.c), run it, and watch it crash when it tries to dereference a zero pointer. This leaves a core file in the current directory9. You can obtain rudimentary post-mortem information from this core file with dbx, as follows:

% dbx crash core
[...]
program terminated by signal SEGV (no mapping at the fault address)
(dbx) where
=>[1] bananas(0x0, 0x1, 0xef691338, 0x10074, 0x2, 0xeffff8d0), at 0x10870
[2] main(0x1, 0xeffff94c, 0xeffff954, 0x20800, 0x1, 0x0), at 0x108fc
(dbx) quit

The where command to dbx shows where the program was when it crashed (the [...] shows where I have omitted some unenlightening chatter from the debugger).

The other information dbx gives you is less useful — the debugger doesn’t know enough about your program to be more helpful. You can, however, tell the compiler to pass on more information to the debugger when it processes your code; do this with the -g flag to the compiler, as in cc -g -o crash crash.c. If you compile several modules to produce your executable, you’ll need to give this option to the compilation of each module you wish to debug. You’ll also have to give this option at the link stage on SunOS4, but not on Solaris or for gdb on Linux.

If we run crash again now, and look at the core file, we see that the debugger can provide us with more information.

% dbx crash core
[...]
program terminated by signal SEGV (no mapping at the fault address)
Current function is bananas
12           printf ("banana split: %d\n", *zp);
(dbx) where
=>[1] bananas(i = 0), line 12 in "crash.c"
[2] main(), line 22 in "crash.c"
(dbx) print i
i = 0
(dbx) print buf
buf = "banana number 0
"
(dbx) print *zp
dbx: reference through nil pointer
(dbx) print zp
zp = (nil)
(dbx) quit

Note that we can examine the values of variables in scope at the point where execution stopped, and that dbx knows enough about C (or Fortran) to know the type of variables it is asked to print. Printing the value of the pointer zp shows us why our program has crashed.

The debugger is not only useful for such post-mortem diagnosis. It is also (for some people, primarily) used for investigating a program’s behaviour when it is running normally, albeit with bugs we wish to track down. For example, start the debugger and tell it to stop when it enters the function bananas:

% dbx crash
[...]
(dbx) stop in bananas
(2) stop in bananas

You can set multiple breakpoints (so called) not only in functions, but also at line numbers, or linenumbers within files (using (dbx) stop at crash.c:12). Then tell the program to start running:

(dbx) run
Running: crash
(process id 29707)
Entering the bananas function: 1
stopped in bananas at line 6 in file "crash.c"
6       sprintf (buf, "banana number %d\n", i);
(dbx) next
stopped in bananas at line 7 in file "crash.c"
7       if (i != 0)
(dbx) print i
i = 1

Note that, as well as output from the debugger itself, we also see output from the running program. If the program required input from the user, it would receive it as normal. The program stops at the first line of code within the bananas function. We can move one line forward in the code, verify that the parameter i has its expected value, then tell the program to resume execution.

(dbx) cont
banana number 1
No bananas left!
stopped in bananas at line 6 in file "crash.c"
6       sprintf (buf, "banana number %d\n", i);
(dbx) print i
i = 0
(dbx) cont
signal SEGV (no mapping at the fault address) in bananas
at line 12 in file "crash.c"
12           printf ("banana split: %d\n", *zp);
(dbx) quit

The Solaris debugger can be used through a GUI: give the command debugger to start this up. The Digital GUI debugger is ladebug. There is a GUI interface to gdb, called xxgdb.

In general, you can’t debug optimized code (that is, code compiled with the option -O), because the optimizer may rearrange lines or remove redundant variables. However, both dbx and gdb do have support for debugging code with mild optimization, though there are some things, such as stepping from line to line, you might not be able to do.

This introduction has merely scratched the surface of what you can do with the debugger, by showing you the bare minimum of commands you need to navigate around your running program. There is a great deal more information available in Sun’s debugging manual [?], or in gdb’s info pages (info gdb).

#### 2.5.4 Intermixing Fortran and C

Because there are so many reliable subroutine libraries written in Fortran, you will sometimes need to call a Fortran routine from C. Likewise, you might need to call a low-level C routine from a Fortran program.

For a detailed overview of such ‘mixed language programming’, see SUN/209, “CNF and F77 Mixed Language Programming”, which gives a detailed introduction to calling each language from the other, as well as a set of C macros to help support this. I will not duplicate the contents of that guide, but instead give a very compressed introduction to the problem. This might be enough to get you going. There is a further discussion of the issues, and a compressed description of the solutions, at the Cambridge High Performance Computing Facility (HPCF), at http://www.hpcf.cam.ac.uk/mixed.html. For a Sun-specific discussion, see Chapter 12 of ?.

The biggest difference between C and Fortran is that ‘C is call-by-value, Fortran is call-by-reference’. What that means is that when a C function is called, it receives the values of its arguments, so that any changes to them disappear when the function finishes, but when a Fortran function is called, it receives a reference to its arguments, so they can be altered easily within the function. The consequence of this is that when you call a Fortran function from C, you should pass arguments using C’s address-of operator,&, and when you call a C function from Fortran, you will typically need to pass it the value of the variable using the %val() function (this is a non-standard VAX Fortran extension, but one which is now so ubiquitous that it’s safe to use). These remarks apply to unstructured types such as characters, integers and floats — arrays and strings present other problems, as described below. It follows from what I’ve said that if a C function is declared as void subr (int *p), it’s expecting (the value of) a pointer to an integer, so that this could be called in the ‘normal’ way from fortran: call subr (ia), where ia is an integer variable.

See A.6 for example programs.

##### 2.5.4.1 Arrays

Fortran’s arrays are simple: an array of any dimension is just a list of locations in memory, stored in the order a(1,1), a(2,1), and so on (see 2.5.2.2); when a Fortran function is given an array argument, what it actually receives is a pointer to the ‘top-left’ element. If you’re calling Fortran from C, you simply have to be aware of the switch on order, and then pass &a[0][0].

If you have a C routine with a one-dimensional array argument (either void func (int a[]) or void func (int *a)), and you want to call it from Fortran, you can call it simply by giving the array name as an argument: call func (a).

Passing a multi-dimensional array to a C function is potentially problematic. However, you’ll almost never need to do that, because Fortran very rarely needs to invoke C to do a numerical calculation. If the C declaration is func (int a[][3]), for example (that is, an array of three-element arrays), then a Fortran array integer a(3,$n$) could be passed simply, as call func (a). If, on the other hand, the C declaration were func (int **a) (that is, a pointer to pointer to integer, with the actual array elements separately allocated), then the above Fortran array could not be passed as shown (and no, smarty, call func (%loc(a)) wouldn’t work, even though it’s the right type). If you do need to call such a C function, you’ll probably have to provide a suitable C wrapper for it, and call that from the Fortran code. C’s array/pointer syntax is elegant, but not ideally suited for numerical work10.

##### 2.5.4.2 Strings

C strings are simple objects: they are by definition arrays of characters, with the end of the string marked by a zero byte. The internal structure of Fortran strings is not defined, so that they could be stored in whichever way is most convenient to the author of the compiler; typically, however, they are an array of characters with a length encoded with them. The practical upshot of this is that you simply cannot pass strings back and forth between Fortran and C code in a portable way, and it is fortunate that you rarely need to do this. On those occasions when you do need such a facility, you can use a library of conversion routines such as those described in SUN/209.

##### 2.5.4.3 Compiling and linking

When a Fortran compiler produces object code,it typically adds an underscore to the end of each function name. That is, the subroutine:

subroutine func1 (i)
integer i
call func2 (i)
end

will produce object code with an external symbol func1_ calling a subroutine named func2_. You must be aware of this when you compile Fortran code which is to be linked with C functions. There are two ways of dealing with this.

First, you can call this subroutine from C by calling it with the correct name:

int i1;
extern void func1_ (int *i);
/* call it */
func1_ (&i1);

So far so good. The problem arises when you want to call C from the Fortran function, since the Fortran function will expect to link against a function with a trailing underscore. If the C function is written by you, then you could provide this, but if it is a library routine, you will have to tell the compiler not to add the underscore to the external name when it generates object code. For Sun’s f77 compiler, you do this with the compiler option -ext_names=plain, for the GNU g77 compiler it is with the -fno-underscoring option, and for f77 on the Alpha, it is -assume nounderscore [RW]. Note that this will apply to all function names in that module. Sun’s compiler also allows you to declare that a function is in a C module, using a pragma, but this is obviously non-portable, and so is not recommended. On this subject, [RW] points out that to get access to main() via f77 on decs, you need to set -nofor_main at link time.

You should, in general, use the Fortran compiler to link the object files into an executable. This calls the linker with all the correct Fortran libraries. It is of course possible to do the same with the C compiler, but requires a much more elaborate call.

#### 2.5.5 Compilers, and other stray remarks on code

Using Fortran’s implicit none keyword is a Good Idea. The tiny amount of typing time you save by relying on Fortran’s automatic declaration of variables, is more than likely wiped out by the debugging time spent clearing up silly problems implicit none would have avoided from the beginning.

Similarly, in C, use ANSI-C rather than older ‘K&R’ C, and use prototypes religiously. That is, if you have a module myfuncs.c, then create a header file myfuncs.h containing the prototypes of the functions within it, and include it (#include "myfuncs.h") both in any modules which use those functions and inside myfuncs.c as well. That way, if you change the interface to any functions in that module, the compiler will prompt you to change the calls to that function everywhere that it is used. This way, the compiler can help you avoid a whole class of embarrassingly silly problems.

Make your code portable. Unless you are specifically targetting your code at a particular (supercomputer) processor, or magic parallelizing compiler, you will save yourself time in the long run by not making assumptions about the machine or compiler environment. Especially given the imminent arrival of 64-bit machines, you are probably not doing yourself any favours by, for example, assuming that integers are four bytes. Probably more pertinently, exploiting compiler-specific features might prevent you running your code on some new faster architecture. If you do decide to use some non-portable features, you can make the future porting effort easier by having second or third thoughts about precisely how to use them, and by isolating them into a few subroutines, rather than scattering them throughout your code. It follows that....

Compiler Warnings Are Your Friend. Try compiling your codes with compiler warnings switched on (with the option -Wall on GNU compilers, +w2 on Sun Workshop compilers, and -w0 on Compaqs). You might be surprised at the number of peculiarities and non-standard features this exposes, each of which is a potential portability problem for the future. Don’t treat compiler warnings as irrelvant nagging — each of them is, to some extent, an error you have made, which has the possibility of turning round and biting you later. I would particularly advise this if you develop using GNU compilers, as these seem particularly liberal about standards: gcc in particular seems happy to take any old nonsense, say ‘I know what you mean’, and produce code from it — in my experience it usually guesses my intentions correctly, but I don’t want to rely on it. You can also adjust the strictness of the C compiler’s conformance to the ANSI-C standard by using compiler options (Sun: -Xa and -Xc; DEC: -std1; GNU: -ansi and -pedantic). Having said all this, I don’t want to overstate the importance of compiler warnings: the main plank of this advice is to spend time now to save time and aggravation later, but this tradeoff is unlikely to be in your favour if you spend time removing every last anomaly from your code.

### 2.6 Link farms

At the end of each section of this cookbook, I’ll include a collection of web-based resources you can use to search for further information. These will typically be either FAQs (lists of ‘frequently asked questions’) or ‘link farms’ (thematic collections of links with little further detail).

#### 2.6.1 Unix documentation and standards

• POSIX is the term for a suite of applications program interface standards to provide for the portability of source code applications where operating systems services are required. POSIX is based on the UNIX (Registered trademark administrated by the Open Group) Operating System, and is the basis for the Single UNIX Specification from The Open Group. [IEEE PASC (Portable Application Standards Committee)]
• This is currently (2001-06-20) undergoing revision by the Austin Group
• The POSIX standard is
(1)
“IEEE Std 1003.1 Standard for Information technology — Portable Operating Systems Interface (POSIX) — Part 1: System Interface [C language binding]”, which is identical to “ISO/IEC JTC1 IS 9945-1 Standard for Information technology — Portable Operating Systems Interface (POSIX) — Part 1: System Interface [C language binding]”
(2)
“IEEE Std 1003.2 Standard for Information technology — Portable Operating Systems Interface (POSIX) — Part 2: Shell & Utilities”, which is identical to “ISO/IEC JTC1 IS 9945-2 Standard for Information technology — Portable Operating Systems Interface (POSIX) — Part 2: Shell & Utilities”.

...and these cost money (lots) to buy on paper.

• Single Unix specification, from the Open Group (other publications). My understanding from the PASC document above is that Single Unix is broader than POSIX, and covers more of the environment (bit vague, here). It feeds into the POSIX/ISO standardisation process, but is not a formal standard itself. Single Unix does, however, have the great advantage of being available online.
• The link with The Unix System is, I fear, rather obscure to me, but it appears that the latter is the public face of the Open Group’s labours.

http://www.geek-girl.com/unix.html: The ‘Unix Reference Desk’. This is a very useful collection of pointers to unix documentation.

http://www.faqs.org/faqs/unix-faq/faq/: the General Unix FAQ, full of arcana. This is one of several FAQs which cover Unix and related issues; for some others, look at http://www.faqs.org/faqs/by-newsgroup/comp/comp.unix.questions.html

http://uk.yahoo.com/Computers_and_Internet/Software/Text_Editors/vi/: Collection of links to vi documentation at Yahoo.

1This is a slightly different convention from that used in, for example, ?, which defines $ϵ$ to be the smallest number for which $1\oplus ϵ\ne 1$.

2For example, appendix E of ? displays ‘Kahan’s summation formula’, which allows you to perform a large sum without losing precision in the manner described above.

3There used to be an issue here in that all of K&R C’s floating-point operations were defined to be done in double-precision — this made things easy for compiler writers, at the expense of runtime. This is no longer true in ANSI C.

4Not impossible. Since someone has already done the hard work of implementing a BASIC interpreter in TeX (honestly!), you’d simply have to port your code to BASIC and let it rip.

5Note that, with pipelining, RISC chips can typically support some degree of on-chip parallelization, even for a single CPU.

6A statement like that can’t be made without some qualification. Depending how you added it up, you could probably make a case that old Fortran dialects probably have more code actually running on CPUs, since many heavily-used libraries were written a long time ago.

7There are two versions of ps on Suns — this example assumes you are using the /usr/ucb/ps version.

8However, there is no excuse for Bubble Sort!

9If you don’t get a core file, it might be that you have your shell set to prevent it — possibly as a (reasonable) precaution to avoid filling up filespace with ‘useless’ core files. The command ulimit -c (sh-type shells only) will show the maximum size of core file: setting this to zero inhibits creating core files, and setting it to a very large number or to unlimited allows core files to be created. On csh-type shells, the corresponding command is limit coredumpsize unlimited

10A little known factoid for C enthusiasts: did you know that C’s array reference syntax is commutative, since a[i] is defined to be equivalent to *(a+i) and is thus equal to i[a]? This means that a[3] is the fourth element of the array a, and so is 3[a]! Bizarrely enough, the latter is a legitimate array reference, but one you’re probably best not including in your own code.