This is based on a neat little post that I saw on Simon Frankau‘s blog that I thought I’d provide a few more details on, as well as bringing it to a wider audience.

Some higher-level data structures in object-oriented programming languages have dynamic memory allocation. This is used to create objects (usually things that behave like lists) that can grow and shrink as the program executes.

When initializing a regular array in C or Java, you have to specify the size of the array on creation, for example with

int myArray[] = new int[10];

in Java (actually Java has a wide variety of syntaxes for array creation, which I won’t go into). Importantly, your array has the size you initially specify for the rest of time. When it’s created, you are allocated a sequential block of memory that’s exactly big enough to hold the elements of the array (it’s sequential so that looping through the elements of the array is fast). If you find that you need more storage space at any point in time, you’re out of luck.

Fortunately most high-level languages like provide a data structure which is able to grow, for example C++’s std::vector or Java’s ArrayList and Vector. When these are created they are allocated a block of memory. If it ever turns out that you need more space, then they allocate a new block of memory to allow you to extend the array: for example, if you have an array of size 10 and then try to allocate something to the 11th position:

ArrayList myArray = new ArrayList(10);
myArray.set(10, "foo");

But how much new memory should they allocate?

A little bit of investigation reveals that this varies across languages and across data structures. Commonly, a multiplicative growth factor is applied. For example, with a growth factor of two, you would double the size of the allocated block each time. To ensure that the array elements remain sequential in memory (so that it’s still fast) you probably want to allocate a new block and copy across the elements in the old block – so your strategy works like this:

  • Allocate a new block of memory that is bigger than the current block by some factor
  • Copy the old data into the new memory block
  • Start writing the new elements here

As Simon points out, a natural strategy is to double the size of the array each time, but this is probably a bad choice. This is because at each stage you assign a new memory block of size 2^n, but the total size of the memory locations used so far (assuming your initial array size is rescaled to be 1) is 1 + 2 + \cdots + 2^{n-1} = 2^n - 1, so you can’t reuse any of the old memory. Since you’re now only dealing with the newly allocated memory, all this old memory goes to waste.

What would be a better factor? Let’s say we used x. Then initially we’d have a block of (rescaled) length 1. We allocate a new block of length x. When we extend again we allocate a block of length x^2. We need to keep the old block of length x around to copy over the old data, and we can’t fit our block of size x^2 in the space of size 1, so we allocate this as a new block too. But when we allocate the next block of size x^3, we might be able to fit it in the two old blocks, of size 1 and x. This works if

1 + x = x^3

which you can solve numerically and find that there is a solution x\approx 1.32. What if you were willing to wait for one more allocation before reusing the memory? Then we’d try and fit a block of size x^4 into a block of size 1 + x + x^2, so we solve

1 + x + x^2 = x^4

for a solution of x \approx 1.47. How about if we were willing to wait for n allocations? Then we’d be trying to solve

1 + x + \cdots + x^{n-2} = x^n

which is more tricky. However, if we notice that the left-hand side is a geometric series, then we can sum it up and multiply through to get

x^{n-1} - 1 = x^{n+1} - x^n

This still looks a but hopeless, but when $n$ is large the three terms involving x will dominate the ‘-1’ and we can neglect it. This corresponds to an allocation strategy where we know that we will be able to reuse a block of memory at some point in the future. We get:

x^{n-1} = x^{n+1} - x^n

This is looking more hopeful. Now we can divide by x^{n-1} and rearrange, giving a familiar-looking equation:

x^2 - x - 1 = 0

Why does this look so familiar? It’s positive solution is exactly x = \tfrac{1}{2}(1 + \sqrt{5}) \approx 1.618\dots, the well-known golden ratio! Now, you wouldn’t want to use exactly this value to do your memory reallocation, because you’d end up waiting forever to be able to reuse your old memory. But Simon suggests that a strategy of using a number a bit less than this, say 1.5, might work out.

And guess what? Java’s ArrayList uses exactly this factor to allocate new memory. I don’t know if it reuses its previously allocated memory in the style described here – that’s something to find out for the future.

A few more exercises – one that looks at the way Scheme interprets expressions and a brief glance at special forms, and a couple of numerical ones.

Exercise 1.6

Alyssa P. Hacker doesn’t see why if needs to be provided as a special form. Can’t we just define it in terms of cond? Alyssa defines a new version of if:

(define (new-if predicate then-clause else-clause)
  (cond (predicate then-clause)
        (else else-clause)))

This seems to work:

> (new-if (= 0 1) 0 5)

> (new-if (= 1 1) 0 5)

Alyssa now uses new-if to rewrite the square root program:

(define (sqrt-iter guess x)
  (new-if (good-enough? guess x)
          (sqrt-iter (improve guess x)

What happens when Alyssa tries to use this program to compute square roots? Explain.

The program will enter into an infinite loop. When you call new-if the interpreter attempts to evaluate all the inputs, and then passes them into cond. In the square root program, the third input to new-if calls sqrt-iter, which involves another call to new-if, which calls sqrt-iter again, and so on.

In contrast, the special form if evaluates its first argument, and then evaluates only one of the remaining two arguments depending on whether the first argument is true or false — therefore the problem of entering into an infinite loop never arises, as eventually the first argument returns true, which means that the second argument (rather than the third) is the one that gets evaluated.

Exercise 1.7

The good-enough? test used to compute square roots will not be very effective for finding the square roots of very small numbers. Also, in real computers, arithmetic operations are almost always performed with limited precision. This makes our test inaccurate for very large numbers. Explain these statements, with examples showing how the test fails for small and large numbers. An alternative strategy for implementing good-enough? is to watch how guess changes from one iteration to the next and stop when the change is a very small fraction of the guess. Design a square root procedure that uses this kind of test. Does it work better for small and large numbers?

The procedure fails for very small numbers because it is not accurate enough. For example, say we want to compute the square root of 0.000001 (which is 0.001) using this procedure. If our initial guess was 0.02 (too big by a factor of 20) then our test squares it to get 0.0004, which is certainly within 0.001 of the input — so this guess passes the test, and is reported as the true number. Here’s an example, using 1.0 as an initial guess:

> (sqrt 0.000001)

It fails for large numbers because the computer only stores floating point numbers to a limited precision — it uses a fixed number of bits to store a number of significant digits, and the rest of the bits to store an exponent. The limit of precision is often called machine epsilon. For significantly large numbers, machine epsilon is bigger than 0.001, and thus the difference between any two different, sufficiently large numbers will be greater than 0.001, and the process will not terminate. For example, trying to computer

> (sqrt 1000000000000000)

(that’s 1 followed by 15 zeros) causes the interpreter to hang.

To get around this, we can redefine good-enough?. We supply it with the previous guess and the new guess, and it returns true when the difference in guesses is smaller than the new guess by some fixed ratio (let’s say 0.000001, or 10^{-6}). Our sqrt-iter process becomes:

(define (sqrt-iter old-guess new-guess)
  (if (good-enough? old-guess new-guess)
      (sqrt-iter new-guess (improve new-guess))))

with good-enough? now given by:

(define (good-enough? old-guess new-guess)
  (< (abs (- old-guess new-guess)) (* 0.000001 new-guess)))

We can now try computing the square roots of very small and large numbers:

> (sqrt 0.000001)

> (sqrt 1000000000000000)

and see that everything works as we would expect it to.

Exercise 1.8

Netwon’s method for cube roots is based on the fact that if y is an approximation to the cube root of x, then a better approximation is given by the value

\dfrac{x/y^2 + 2y}{3}.

Use this formula to implement a cube root procedure analogous to the square root procedure.

We only need to change a few of our procedures. Our newly improved good-enough? procedure is already fine. If we implement a procedure improve-cbrt as

(define (improve-cbrt guess x)
  (/ (+ (/ x (square guess))
        (* 2 guess))

and the iterative cube root as essentially identical to the iterative cube root:

(define (cbrt-iter old-guess new-guess x)
  (if (good-enough? old-guess new-guess)
      (cbrt-iter new-guess (improve-cbrt new-guess) x)))

and finally

(define (cbrt x)
  (cbrt-iter 0.0 1.0  x))

then we can use our new cube root procedure as we’d expect:

> (cbrt 8)

> (cbrt 100)

Next we’ll look at a way to hide our abstractions from the end user, by defining procedures inside other procedures.

Procedures are like mathematical functions, with one important difference: a mathematical function can provide declarative definitions, whereas a procedure must provide imperative definitions. A declarative statement is a ‘this is true’ statement. An imperative statement is a ‘do this’ statement. We’ll examine the difference by looking at how the square root function is defined in mathematics and in computer science.

1.1.7 Example: Square roots by Newton’s method

When we write sqrt(x) in mathematics, what do we mean? One possible definition is

sqrt(x) is the unique non-negative real number whose square is equal to x

or, expressed in mathematical notation,

\sqrt{x} is the unique y\in\mathbb{R} such that y\geq 0 and y^2=x.

This is a perfectly valid definition, and you can prove that it is well-formed: that is, you can prove that for non-negative numbers x, there really is a unique non-negative number y such that the square of y is x, and therefore that the definition of the ‘sqrt’ function makes sense.

However, just knowing that we have a sensibly defined function is not much help in computer science. We need to know how to computer the function. That is, we need a sequence of statements that that lets us deduce the value of the output of the function, given an input (actually we will only compute an approximation to the value of the function, but that’s okay – we can make our approximations really very accurate).

This is what we mean by declarative vs imperative knowledge. According to SICP, mathematics is concerned primarily with declarative (what is) statements, whereas computer science is primarily concered with imperative (how to) statements. I’d take issue with this, given the existence of a whole subfield of mathematics, numerical analysis, which concerns itself with how best to find solutions to equations. But I’m not here to debate semantics – I’m here to learn about computer science.

So how do we compute square roots?

The most common way is to use Newton’s method of successive approximations (tip: if you’re interviewing for a job that involves mathematical and programming literacy, learn how to compute square roots using Newton’s method. It’s a favourite interview question). Newton’s method is a formula for finding roots of general functions $f$, but here we only need it in a very simple form. To find the solution of y = sqrt(x), we make an initial guess (say y = 1) and check if it’s accurate enough. If not, then we refine our guess by averaging y and x/y, and check again. We continue iterating in this way until we eventually converge on the solution. You can prove that for any positive initial guess, this method converges to the right solution. Moreover, you can prove that you double the number of digits of accuracy with each iteration — this is known as quadratic convergence.

Let’s formalize this in terms of procedures. This is just a matter of translating our description of the formula above into Lisp code:

(define (sqrt-iter guess x)
  (if (good-enough? guess x)
      (sqrt-iter (improve guess x)

This procedure simply says that, given a guess for the solution, we first check if the guess is good enough; if it is, then we report the guess as our solution, if not, then we improve the guess and apply the procedure again. Note that we haven’t yet defined the procedures good-enough? or improve yet. We’re using a programming strategy called wishful thinking — we write the program as if the sub-procedures we need exist, and then we write them later.

We already know how to improve our guess: we average the old guess y with x/y, like this:

(define (improve guess x)
  (average guess (/ x guess)))

where the procedure average is defined to be

(define (average x y)
  (/ (+ x y) 2))

We also have to define what we mean by ‘good enough’. We’ll say that our guess is good enough if the square of the guess is within a small distance (say 0.001) of the input. This isn’t a very good test, for several reasons, but it will do for now:

(define (good-enough? guess x)
  (< (abs (- (square guess) x)) 0.001))

Finally, we need a starting guess. We can always guess that the square root of a number is 1:

(define (sqrt x)
  (sqrt-iter 1.0 x))

Note that because Scheme has a rational data type, we need to put 1.0 as our initial guess, rather than 1. If x was an integer and we guessed 1, then the interpreter would use rational arithmetic, and all subsequence operations would result in rational numbers. By guessing 1.0 instead, we force the result of subsequent operations to be floating point numbers.

We can now use our new procedure as we’d use any other:

> (sqrt 9)

> (sqrt 100)

> (square (sqrt 1000))

This demonstrates that the simple procedural language we have introduced so far is suffient for writing numerical programs. Notice that we haven’t introduced any iterative (looping) constructes yet. Instead, we relied on the simple ability of a procedure to call itself.

Now we get onto some more interesting exercises, that explore the way that the Lisp interpreter works.

Exercise 1.4

Our model of evaluation allows for combinations whose operators are compound expressions. Use this observation to describe the behaviour of the following procedure:

(define (a-plus-abs-b a b)
  ((if (> b 0) + -) a b))

This is cool! We want the function to return a + |b|, which we can do by making it return a + b if b>0 and a - b otherwise. Because procedures are like any other data type in the language, they can be the return value of an expression. We use that to our advantage here, by using an if procedure that returns the procedure + in the case that b > 0 and - otherwise. Try doing that in Java!

Exercise 1.5

Ben Bitdiddle has invented a test to determine whether the interpreter he is faced with is using applicative-order or normal-order evaluation. He defines the following two procedures:

> (define (p) (p))

> (define (text x y)
    (if (= x 0)

Then he evaluates the expression

> (test 0 (p))

What behaviour will Ben observe with an interpreter that uses applicative-order evaluation> What about normal-order?

The procedure p is defined in terms of itself, so if we ever try to evaluate p we will end up in an infinite loop. If we use applicative order application, then both arguments to test are evaluated before we call test with those arguments. Since this involves evaluating p, the interpreter will hang.

On the other hand, if we use normal-order application then we expand out the arguments without evaluating them, to get

(if (= 0 0)

now when we evaluate this, we first test if (= 0 0), which returns true, so we execute the first statement after the predicate: the second statement, which is p, is never evaluated, and the program happily returns 0.

The first exercise is merely to evaluate some Lisp expressions manually, and then check them in the interpreter. Easy and not too interesting, but I did say that I’d do every exercise, so I guess I’ll plough ahead.

This is good time to talk about the interpreter I’m using. I initially toyed with downloading the MIT Scheme interpreter, which is a minimalist product that’s designed to be used alongside this course. I also thought about downloading Emacs – you can get implementations with a built-in Lisp interpreter. But I’ve been getting along quite happily with a combination of Vim and TextMate (for Mac OS X) for the past few months, and I’m not sure that learning a new text editor is necessary right now. I also wanted to do some more advanced coding in Lisp (what’s the point of learning a new language if I’m not going to use it?) so I wanted something a bit meatier than MIT Scheme.



I eventually settled on Racket, a bells-and-whistles implementation of Scheme that includes a lot of extra features out of the box – lots of nice data structures, the ability to create GUIs and manipulate graphics, and lots more. It comes with its own interpreter, DrRacket, which is functional and pretty easy to use. A nice touch is that you can load any definitions file into the interpreter to give you a whole load of pre-defined functions – and if the first line of the definitions file is #lang <language-name> then you will only use the features of that language. For example, the first line of the definitions file I’m using for this project is #lang scheme so I don’t have access to all the funky procedures defined in Racket – just the core Scheme language.

On with the exercises.

Exercise 1.1

What is the result printed by the interpreter in response to each of the following expressions?

> 10 

Numerals have a value equal to the number they represent.

> (+ 5 3 4)

This is equivalent to 5 + 3 + 4. Note that the + procedure can take multiple arguments. I haven’t learned how to define a function with a variable number of arguments yet, but hopefully I will soon!

> (- 9 1)

This is equivalent to 9 – 1.

> (/ 6 2)

Equivalent to 6 / 2. Note that we get an integer result.

> (+ (* 2 4) (- 4 6))

Equivalent to (2 * 4) + (4 – 6) = 8 + (-2). Note that there is no need for operator precedence in Scheme, because we always include the parentheses! I wonder if a syntax for Scheme could be defined that allowed you to leave out parentheses when the meaning was clear, and had rules of operator precedence instead. For example, the K programming language that I use at work has simple right-to-left order of evaluation. This can lead to very code and lets you easily write one-line expressions that are very powerful, but has many gotchas: for example, 2*4+5 evaluates to 18 in K, rather than 13.

> (define a 3)

Whether or not define statements return a value is implementation-dependent. My interpreter doesn’t give a return value. However, we have now bound the value 3 to the variable a.

> (define b (+ a 1))

Again, there is no return value, but we have bound the value of (+ a 1) to the variable b, so that b now has the value 4.

> (+ a b (* a b))

This is equivalent to a + b + (a * b) where a = 3 and b = 4.

> (if (and (> b a) (< b (* a b)))

This procedure says “if (b > a) and (b < a * b) then return b, else return a”, which becomes “if (4 > 3) and (4 < 12) then return 4, else return 3” so it returns 4.

> (+ 2 (if (> b a) b a))

First evaluate the if statement. It says “if (b > a) return b, else return a” which becomes (by substitution) “if 4 > 3 then return 4, else return 3”, so it returns 4. We then evaluate (+ 2 4), which returns 6.

> (* (cond ((> a b) a)
           ((< a b) b)
           (else -1))
     (+ a 1))

We first evaluate the cond statement. It evaluate to 4, since the first predicate is false but the second one is true. The second argument (+ a 1) evaluates to 4 also, and finally (* 4 4) evaluates to 16.

Exercise 1.2

Translate the following expression into prefix form:

\dfrac{5 + 4 + (2 - (3 - (6 + 4/5)))}{3(6 - 2)(2 - 7)}

I reckon we have:

> (/ (+ 5
        (- 2
           (- 3
              (+ 6
                 (/ 4 5)))))
     (* 3
        (- 6 2)
        (- 2 7)))

That return value was a surprise to me! I had assumed that when applying the division operator to two integers, Scheme would either perform integer division (like Java) or return a floating point number (like Python). Apparently it has a built-in rational data type. Which is nice.

Exercise 1.3

Define a procedure that takes three numbers as arguments and returns the sum of the squares of the two larger numbers.

Here’s a pretty ugly answer, using the built-in min function:

> (define (f a b c)
    (- (+ (* a a) (* b b) (* c c))
       (* (min a b c) (min a b c))))

Why is it ugly? Well, it needlessly applies min twice, and it does four multiplications, two additions and a subtraction, when all that’s needed is one addition and two multiplications. How about:

> (define (g a b c)
    (cond ((and (<= a b)
                (<= a c)) (+ (* b b) (* c c)))
          ((<= b c) (+ (* a a) (* c c)))
          (else (+ (* a a) (* b b)))))

This looks uglier, but it’s more efficient: it never performs more than two multiplications and one addition, and never more than three comparisons (the minimum possible, since you have to sort the arguments so that you know which is the smallest, and this takes three comparisons in the worst case. If anyone has a prettier and efficient solution (perhaps one that works for arbitrary lists of arguments?) then I’d like to see it.

We’re going to introduce a simple model for how the interpreter evaluates expression that you type in, and then go on to look at an important feature of Lisp (well, an important feature of any programming language…): conditionals.

1.1.5 The substitution model for procedure application

A model for how the interpreter applies primitive procedures to arguments is as follows:

To apply a compound procedure to arguments, evaluate the body of the procedure with each formal parameter replaced by the corresponding argument.

Let’s illustrate this by evaluating

(sum-of-squares 3 4)

which expands to

(+ (square 3) (square 4))

Now, (square 3) evaluates to 9 and (square 4) evaluates to 16, so we have

(+ 9 16)

and finally we get the result


This is called the substitution model for procedure application. This is not how the interpreter really works, but it is a simple model that captures the spirit.

Applicative order and normal order

We said that the interpreter first evaluates the operator and operands, and then applies the procedure to the resulting arguments. This is not the only way to evaluate compound procedures. An alternative model does not evaluate any expressions until their results are needed. Instead, it substitutes expressions for parameters until it obtains an expression involving only primitives, and then performs the whole evaluation.

For example, (sum-of-squares 3 4) would be evaluated first to

(+ (square 3) (square 4))

and then to

(+ (* 3 3) (* 4 4))

and now that we only have primitives, we can begin evaluating the whole expression:

(+ 9 16)

and finally


This is called normal-order evaluation, as opposed to applicative-order evaluation which is what the compiler actually uses. Normal-order evaluation is less efficient (since you can end up evaluating the same expression multiple times) and becomes problematic when we leave the realm of procedures that can be modeled by substitution.

1.1.6 Conditional expressions and predicates

There is a special construct in Lisp for doing case analysis, called cond which stands for “conditional”. It is used as follows:

> (define (abs x)
    (cond ((> x 0) x)
          ((= x 0) 0)
          ((< x 0) (- x))))

The general form of a conditional expression is

(cond (<p1> <e1>)
      (<p2> <e2>)
      (<pn> <en>))

i.e. it consists of the symbol cond followed by pairs of expressions in parentheses:

(<p> <e>)

called clauses. The first expression in each pair must be a predicate, i.e. an expression that evaluates to true or false.

Conditional expressions are evaluated as follows. The predicate <p1> is evaluated first. If its value is false then <p2> is evaluated, and so on. This continues until a predicate is found whose value is true, in which case the interpreter returns the value of the appropriate consequent <e> as the value of the cond expression. If none of the predicates evaluate to true, then the value of the cond is undefined.

There is a special symbol else which can be used in place of the final predicate. This causes the cond to return the value of the final expression <e>. We could also write the absolute value procedure as:

> (define (abs x)
    (cond ((< x 0) (- x))
          (else x)))

There is also the special form if, a restricted form of conditional that can be used when there are exactly two mutually exclusive cases. This gives another way to write the absolute value procedure:

> (define (abs x)
    (if (< x 0)
        (- x)

The general form of an if expression is

(if <predicate> <consequent> <alternative>)

To evaluate the expression, the interpreter first evaluates the <predicate> part. If it is true, then it returns the value of <consequent> as the value of the if, otherwise it returns the value of <alternative>. It is important to realise that only one of the <consequent> and the <alternative> are ever evaluated.

In addition to primitive predicates like and = there are also logical composition operators, enabling us to construct compound predicates. The three most common are

(and <e1> ... <en>)

which returns false if any of the <e> evaluate to false, and otherwise returns true,

(or <e1> ... <en>)

which returns true if any of the <e> evaluate to true, and otherwise returns false, and

(not <e>)

which returns true if <e> evaluates to false, and false if evaluates to true.

As an example of use, we could define a predicate to test whether one number is greater than or equal to another by

> (define (>= x y)
    (not (< x y)))

and we can use it:

> (>= 10 2)
> (>= 2 10)

Next I’ll look at some of the exercises.

Pressing on with SICP, these sections look at how we can combine simple expressions to make more complicated ones.

1.1.3 Evaluating combinations

To evaluate a combination, the interpreter performs the following steps:

  1. Evaluate the subexpressions
  2. Apply the procedure that is the value of the leftmost subexpression to the arguments that are the values of the other subexpressions

The evaluation procedure is recursive in nature – one of its steps is to invoke itself. The base case for this recursion comes when the interpreter encounters a primitive element. We handle the base cases by defining:

  1. The values of numerals are the numbers that they represent
  2. The values of built-in operators are the sequence of machine instructions that carry out the corresponding operations
  3. The values of other names are the objects associated with them in the environment.

It is key to notice the role of the environment: simply writing (+ x y) makes no sense without specifying the environment that provides a meaning for the symbols x and y (or even +).

This evaluation rule doesn’t handle definitions. For example, the line of code (define x 3) does not apply the procedure define to the values of the arguments x and 3 – rather it assigns the value 3 to the variable x.

Such exceptions are called special forms. We will meet more of them shortly.

1.1.4 Compound procedures

We’ve identified some of the elements that must appear in any powerful programming language:

  1. Numbers and arithmetic operations: primitive expressions
  2. Nesting S-expressions: a means of combination
  3. Definitions that associate names with values: a means of abstraction

Procedure definitions provide a more powerful abstraction, whereby a compound operation can be given a name and then referred to as if it were a primitive. For example, we could define a procedure that squares a number:

> (define (square x) (* x x))

This is a compound procedure that has been given the name square, which evaluates to the procedure ‘multiply the argument x by itself’.

The general form of a procedure definition is

(define (<name> <formal parameters>) <body>)

I’m following the convention of SICP that text inside angle brackets is a placeholder for some real code. The is a symbol to be associated with the new procedure in the current environment. The give names to the arguments of the procedure, and the is the description of the procedure.

Having defined square, we can use it:

> (square 14)

We can also use it as a building block in other procedure definitions, for example:

> (define (sum-of-squares x y)
    (+ (square x) (square y)))
> (sum-of-squares 3 4)

Compound procedures are used in the same way as primitive procedures – you can’t tell by looking at the definition of sum-of-squares whether square is built in, or defined as a compound procedure.

Pretty basic stuff in these sections, but it explains some very important concepts and syntax, namely the idea of abstraction via the define statement.

1.1.1 Expressions

A programming language allows you to combine simple expressions to form more complex ideas. To do this it has three important mechanisms:

  1. Primitive expressions which represent the simplest entities
  2. Means of combination to build compound elements from simpler ones
  3. Means of abstraction to name compound elements and treat them as primitives

One kind of primitive expression in Scheme is a number. When you type an expression into the Scheme interpreter, the interpreter responds by evaluating the expression and printing its value, for example

> 486

I’ll use the convention that an expression prefixed by > is something that you type at the interpreter prompt, and the next line is the interpreter’s response. We form compound expressions using prefix notation, for example

> (+ 5 2)
> (- 10 7)
> (* 6 9)

These expressions are called combinations or S-expressions. The leftmost element is the operator, and the other elements are the operands. The value of the expression is obtained by applying the procedure specified by the operator to the arguments, which are the values of the operands.

Scheme used prefix notation, which looks unfamiliar at first but is just as valid as the standard infix notation, written 5 +2 or 10 - 7. In fact, it can have advantages, for example when procedures take an arbitrary number of arguments:

> (+ 10 7 3 5)

We can also easily nest combinations:

> (+ (* 4 2) (- 10 7))

1.1.2 Naming and the environment

None of this would be useful without the ability to give names to complex expressions, and refer to them by name later. For example, we can define the name radius to refer to a specific value by using the keyword define

> (define radius 10.0)

We could also define the value of a useful mathematical constant:

> (define pi 3.14159)

We can combine our names to make more complex expressions:

> (define area (* pi radius radius))
> area

To implement this ability to name expressions and refer to them later, the interpreter must have some way of associating names to values. This mapping is called the environment, or more accurately an environment, since we will see later that a computation may involve many different environments.

Out of a desire to (i) learn a functionalStructure and Interpretation of Computer Programs programming language, and (ii) generally become a better programmer, I’ve decided to work my way through Abelson and Sussman’s Structure and Interpretation of Computer Programs. It’s a bit of a beast, but I’ve read a chunk of it already and its style appears to me. I’ve also been intending to learn a dialect of Lisp for a while, and this seems like a good opportunity.

This is the book which forms the core of the borderline-legendary MIT course 6.001, which provides as good an introduction to the foundations of computer programming as is available anywhere in the world. It will be difficult, but rigorous.

The idea is that I’ll read every word, and do all of the programming exercises. As I finish eash section, I’ll post what I’ve learnt here, together with the source code I used to solve the exercises, and anything else I think will be useful.

I fully expect it to take me a long time (fitting it in around a day job will be tricky) but I’m feeling confident. If anyone has read it (or even better, is reading it at the moment) I’d love to hear from you.

I’ve been thinking a little bit about covariance matrices recently, particularly how to estimate them from sample data. A typical problem in finance is to estimate a covariance matrix for a universe of stocks from a finite sample of returns data. The problem is that the number of parameters quickly becomes large, and can completely dwarf the number of samples you have available.

For example, consider trying to estimate the correlation of the stocks in the S&P 500 from a year of daily returns data — overall that’s about 500 \times 250 = 125,000 pieces of information, to estimate a total of \tfrac{1}{2} \times 500 \times 501 = 125,250 parameters. Oops. As if that wasn’t bad enough, you are building a matrix with 500 rows out of only 250 columns — this means that your matrix is going to be of rank 250 at best, and possibly lower. In particular, it won’t be invertible. If your application requires an invertible covariance matrix — say, if you’re building a Markowitz portfolio — then you’re already out of luck.

Fortunately there are several techniques that can be employed to improve the estimation of the covariance matrix, and ensure that it has full rank and is invertible. The simplest of these is called shrinkage, which is a fancy name for something simple: we transform to the correlation matrix, scale that toward the identity matrix, and then transform back into correlation space:

\Sigma_{\rm new} = \lambda \Sigma_{\rm sample} + (1-\lambda) I

for some 0\leq\lambda\leq 1. This ensures that the correlation matrix, and hence the covariance matrix, is invertible. Over the next few posts I’m going to look into some more advanced techniques, including techniques that allow practitioners to incorporate their prior beliefs about market structure into the model.