You are currently browsing the monthly archive for June 2011.

I’ve been thinking a little bit about covariance matrices recently, particularly how to estimate them from sample data. A typical problem in finance is to estimate a covariance matrix for a universe of stocks from a finite sample of returns data. The problem is that the number of parameters quickly becomes large, and can completely dwarf the number of samples you have available.

For example, consider trying to estimate the covariance matrix of the stocks in the S&P 500 from a year of daily returns data. Overall that’s about 500 \times 250 = 125,000 pieces of information, to estimate a total of \tfrac{1}{2} \times 500 \times 501 = 125,250 parameters. Oops. As if that wasn’t bad enough, you are building a 500 \times 500 matrix out of a data matrix with 500 rows and only 250 columns. This means that your matrix is going to be of rank 250 at best, and possibly lower. In particular, it won’t be invertible. If your application requires an invertible covariance matrix (say, if you’re building a Markowitz portfolio) then you’re already out of luck.

Fortunately there are several techniques that can be employed to improve the estimation of the covariance matrix, and ensure that it has full rank and is invertible. The simplest of these is called shrinkage, which is a fancy name for something simple: we transform to the correlation matrix, scale that toward the identity matrix, and then transform back into covariance space:

\Sigma_{\rm new} = \lambda \Sigma_{\rm sample} + (1-\lambda) I

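for some 0\leq\lambda<1 (taking \lambda=1 simply returns the sample matrix). The sample correlation matrix is positive semidefinite and the identity is positive definite, so the blend is positive definite. This ensures that the correlation matrix, and hence the covariance matrix, is invertible. Over the next few posts I’m going to look into some more advanced techniques, including techniques that allow practitioners to incorporate their prior beliefs about market structure into the model.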
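
As a rough illustration, here is a minimal NumPy sketch of the shrinkage recipe. The function name `shrink_covariance`, the choice \lambda = 0.5 and the simulated returns are my own assumptions for the example; choosing \lambda well is a separate problem that this sketch does not address.

```python
import numpy as np


def shrink_covariance(returns, lam):
    """Shrink the sample correlation matrix toward the identity.

    returns: array of shape (n_samples, n_assets)
    lam:     weight on the sample correlation matrix, 0 <= lam < 1
    """
    cov = np.cov(returns, rowvar=False)       # sample covariance (assets x assets)
    std = np.sqrt(np.diag(cov))               # per-asset standard deviations
    scale = np.outer(std, std)
    corr = cov / scale                        # transform to correlation space
    corr_new = lam * corr + (1.0 - lam) * np.eye(len(std))
    return corr_new * scale                   # transform back to covariance space


# Simulated data standing in for a year of daily returns on 500 stocks.
rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=(250, 500))

sample_cov = np.cov(returns, rowvar=False)
shrunk_cov = shrink_covariance(returns, lam=0.5)

sample_rank = np.linalg.matrix_rank(sample_cov)   # well below 500
min_eig = np.linalg.eigvalsh(shrunk_cov).min()    # strictly positive after shrinkage
```

Note that the variances on the diagonal are left untouched; only the off-diagonal correlations are pulled toward zero.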


I was linked to a collection of Edsger Dijkstra’s writings today, and spent an interesting half hour reading some of them. Dijkstra was known for composing his manuscripts in longhand, using a fountain pen. He would then make a dozen or so photocopies, and mail them out to people he knew to be interested in his work. They would then be further disseminated by those people, and so on, until his work had reached a wide audience. Each article was coded with his initials EWD, followed by a three or four digit number; they thus became known as EWDs in the computer science world.

The most interesting article I read today was EWD831, whose full title is Why numbering should start at zero. Dijkstra argues persuasively that counting should start from zero, and thus that all arrays should be indexed beginning from zero. This runs counter to the natural intuition (and to what we are taught at school) that counting begins at 1.

In brief, Dijkstra’s argument runs like this. There are four possibilities for notating the range of numbers 1, …, 10 without using ellipsis notation, which can be ambiguous:

  1. 1 ≤ i < 11
  2. 0 < i ≤ 10
  3. 0 < i < 11
  4. 1 ≤ i ≤ 10

There are several desirable properties that we might want our system of notation to have. For example:

  • (a) the difference between the bounds is equal to the number of terms in the sequence
  • (b) two ranges are adjacent if the upper bound of one is equal to the lower bound of the other.

Both 1 and 2 satisfy these properties, but 3 and 4 don’t. If we want to use our notation to specify a range of natural numbers, then it seems sensible to ask for the following two properties:

  • (c) the lower bound is a natural number
  • (d) the upper bound is a natural number, even when defining the empty sequence

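Property (c) is only satisfied by 1 and 4: with an exclusive lower bound, a range starting at 0 would need a lower bound of -1. Property (d) is only satisfied by 1 and 3: with an inclusive upper bound, the empty range starting at 0 would need an upper bound of -1. So we see that only notation 1 satisfies all of the properties we desire.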
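
Python's built-in `range` happens to use notation 1, so the properties are easy to check directly. This is just a small sketch; the helper `half_open` is a name I made up for the example.

```python
def half_open(lo, hi):
    """Return the integers i with lo <= i < hi as a list."""
    return list(range(lo, hi))


# (a) the difference between the bounds is the number of terms
one_to_ten = half_open(1, 11)
count = len(one_to_ten)          # same as 11 - 1

# (b) adjacent ranges share a bound: the upper bound of one is the lower
#     bound of the next, with nothing skipped or repeated
joined = half_open(1, 11) + half_open(11, 21)

# (d) the empty range starting at 0 needs no negative bound
empty = half_open(0, 0)
```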

Now we move on to the question of whether we should start indexing at 0 or at 1. This is simply an issue of aesthetics. If we want to describe the range of an array with N entries, then by starting at 0 we would write the range as 0 ≤ i < N, whereas if we start indexing at 1 then it is written as 1 ≤ i < N+1, which is significantly uglier. So we see that we should start indexing at 0. Note that programming languages such as C, Java and Python all follow this convention, as do most widely adopted languages. The only commonly used languages I can think of that don’t use it are Fortran and R.
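
To make the comparison concrete, here is a tiny sketch that walks over the same array under both conventions. The list `entries` is made up for the example, and since Python itself is 0-based, the 1-based version has to be emulated by subtracting 1.

```python
entries = ["a", "b", "c", "d"]
N = len(entries)

# Indexing from 0: the valid indices are exactly 0 <= i < N.
zero_based = [entries[i] for i in range(0, N)]

# Indexing from 1: the range becomes 1 <= i < N + 1, and each access
# needs an offset of 1 to reach the same entry.
one_based = [entries[i - 1] for i in range(1, N + 1)]
```

Both loops visit the same entries; the 0-based one just does it without the stray N + 1 and the extra offset.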

Quick post to mention some of the more useful Mac OS X keyboard shortcuts in Google Chrome, to make your browsing simpler, faster and more effective. More as a reminder to myself than anything else, because I keep reading these, swearing that I will remember them, and then not doing it. Many of these come down to making Chrome behave more like a decent text editor, à la emacs or vim.

Tabs and windows

⌘+T opens a new tab

⌘+W closes the current tab

⌘+shift+T reopens the last tab you closed (Chrome remembers the last ten tabs you closed)

⌘+alt+left and ⌘+alt+right move left and right along your open tabs

⌘+H hides Chrome


⌘+L jumps to the address bar

⌘+alt+F jumps to the address bar and prefixes a ‘?’ so that whatever text you enter will be interpreted as a Google search (on Windows you can do this by pressing Ctrl+K; I wonder why it’s more complicated on a Mac?)

⌘+D bookmarks the current page

Webpage shortcuts

⌘+ and ⌘- change the zoom level of the page; use ⌘+0 (zero) to return to default zoom

⌘+F finds text within the current page

⌘+E uses the currently highlighted text to find matches within the page

Text editing

alt+left and alt+right move the cursor left and right by words

⌘+left and ⌘+right move the cursor to the beginning or end of the line

Hold down shift to highlight the characters that the cursor passes through.

About me

Proto-hacker, ex-mathematician and aspiring flaneur. Now living in London and making my living from algorithmic trading.

