I was linked to a collection of Edsger Dijkstra’s writings today, and spent an interested half hour reading some of them. Djikstra was known for composing his manuscripts in longhand, using a fountain pen. He would then make a dozen or so photocopies, and mail them out to people he knew to be interested in his work. They would then be further disseminated by those people, and so on, until his work had reached a wide audience. Each article was coded with his initials *EWD*, followed by a three or four digit number — they thus became known as *EWD*s in the computer science world.

The most interesting article I read today was *EWD831*, whose full title is *Why numbering should start as zero*. Djikstra argues persuasively that counting should start from zero, and thus that all arrays should be indexed beginning from zero. This runs counter to the natural intuition (and to what we are taught at school) that counting begins at 1.

In brief, Dijkstra’s argument runs like this. There are four possibilities for notating the range of numbers 1, …, 10 without using ellipsis notation, which can be ambiguous:

- 1 ≤
*i*< 11 - 0 <
*i*≤ 10 - 0 <
*i*< 11 - 1 ≤
*i*≤ 10

There are several desirable properties that we might want our system of notation to have. For example:

- (a) the difference between the bounds is equal to the number of terms in the sequence
- (b) two ranges are adjacent if the upper bound of one is equal to the lower bound of the other.

Both 1 and 2 satisfy these properties, but 3 and 4 don’t. If we want to use our notation to specify a range of natural numbers, then it seems sensible to ask for the following two properties:

- (c) the lower bound is a natural number
- (d) the upper bound is a natural number, even when defining the empty sequence

Property (c) is only satisfied by 1 and 4 and property (d) is only satisfied by 1 and 3 So we see that it is only notation 1 that satisfied all of the properties we desire.

Now we move on to the question of whether we should start indexing at 0 or at 1. This is simply an issue of aesthetics. If we want to describe the range of an array with *N* entries, then by starting at 0 we would write the range as 0 ≤ *i* < *N*, whereas if we start indexing at 1 then it is written as 1 ≤ *i* < *N*+1, which is significantly more ugly. So we see that we should start indexing at 0. Note that programming languages such as C, Java and Python all follow this convention, as do most widely adopted languages. The only commonly used languages I can think of that don’t use it are Fortran and R.

## 3 comments

Comments feed for this article

June 24, 2011 at 12:38 pm

Bill the LizardLanguages that use 1 as a base index are certainly in the minority. COBOL is the only other (formerly) commonly used language that I can think of that uses it. There’s an array system cross-reference list maintained on Wikipedia where I found a few others.

June 24, 2011 at 1:33 pm

Chris TaylorInteresting that ALGOL 68 based arrays at 1, given that Dijkstra himself was one of the designers of ALGOL 60.

June 25, 2011 at 9:12 am

commasterIf you know a little about assembly language and the underlying hardware, you will realize that zero-based index are more desirable.