You are currently browsing the tag archive for the ‘r’ tag.

I was linked to a collection of Edsger Dijkstra’s writings today, and spent an interested half hour reading some of them. Djikstra was known for composing his manuscripts in longhand, using a fountain pen. He would then make a dozen or so photocopies, and mail them out to people he knew to be interested in his work. They would then be further disseminated by those people, and so on, until his work had reached a wide audience. Each article was coded with his initials EWD, followed by a three or four digit number — they thus became known as EWDs in the computer science world.

The most interesting article I read today was EWD831, whose full title is Why numbering should start as zero. Djikstra argues persuasively that counting should start from zero, and thus that all arrays should be indexed beginning from zero. This runs counter to the natural intuition (and to what we are taught at school) that counting begins at 1.

In brief, Dijkstra’s argument runs like this. There are four possibilities for notating the range of numbers 1, …, 10 without using ellipsis notation, which can be ambiguous:

  1. 1 ≤ i < 11
  2. 0 < i ≤ 10
  3. 0 < i < 11
  4. 1 ≤ i ≤ 10

There are several desirable properties that we might want our system of notation to have. For example:

  • (a) the difference between the bounds is equal to the number of terms in the sequence
  • (b) two ranges are adjacent if the upper bound of one is equal to the lower bound of the other.

Both 1 and 2 satisfy these properties, but 3 and 4 don’t. If we want to use our notation to specify a range of natural numbers, then it seems sensible to ask for the following two properties:

  • (c) the lower bound is a natural number
  • (d) the upper bound is a natural number, even when defining the empty sequence

Property (c) is only satisfied by 1 and 4 and property (d) is only satisfied by 1 and 3 So we see that it is only notation 1 that satisfied all of the properties we desire.

Now we move on to the question of whether we should start indexing at 0 or at 1. This is simply an issue of aesthetics. If we want to describe the range of an array with N entries, then by starting at 0 we would write the range as 0 ≤ i < N, whereas if we start indexing at 1 then it is written as 1 ≤ i < N+1, which is significantly more ugly. So we see that we should start indexing at 0. Note that programming languages such as C, Java and Python all follow this convention, as do most widely adopted languages. The only commonly used languages I can think of that don’t use it are Fortran and R.


About me

Proto-hacker, ex-mathematician and aspiring flaneur. Now living in London and making my living from algorithmic trading.