Sunday, November 27, 2011

Dealing with Non-Positive Definite Matrices in R

Last time we looked at the Matrix package and dug a little into chol(), the Cholesky decomposition function.  I noted that in finance we often do not have a positive definite (PD) matrix.  The chol() function in both base R and the Matrix package requires a PD matrix.

In this post, we will look at why we don't always have a PD matrix, why standard Cholesky implementations require a PD matrix, and what we can do about it.

So why do we not often have a PD matrix?  In finance we often deal with a large number of variables.  If the number of variables is larger than the number of observations, then the determinant of the correlation matrix will necessarily be 0.  A PD matrix is defined as a symmetric matrix S (n x n) such that X` * S * X > 0 for every nonzero vector X (n x 1).  Without going through the proof, this can only happen with a full rank matrix, which by definition has a nonzero (here, positive) determinant.
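A quick numeric illustration of that claim (the seed and sizes here are arbitrary, just small enough to see):

set.seed(42)
obs = matrix(rnorm(3 * 5), 3, 5)   # 3 observations of 5 variables
cm = cor(obs)                      # 5x5 correlation matrix
det(cm)                            # 0, up to floating point noise
qr(cm)$rank                        # rank 2 -- less than the 5 variables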

Relaxing the strict inequality to X` * S * X >= 0 gives us what we call a positive semidefinite (PSD) matrix.  Luckily, a well formed correlation matrix, even one calculated as above where the number of observations is less than the number of variables, will be PSD.  A gotcha is that sometimes floating point error or missing values in our time series can cause the resulting correlation matrix to not be PSD.  We will talk about this in a later post.  Back on topic, we need a way to either "fix" the PSD matrix to make it PD or a way to decompose a PSD matrix.

First, let's look at the algorithm(s) used to calculate a Cholesky decomposition.  Most modern Cholesky implementations take an iterative or recursive block approach.  This block approach works well when you have implemented vectorization in your matrix multiplication; the block size is usually chosen so that the inner Cholesky call can operate inside the CPU cache.  Wikipedia gives a good description of the single threaded method (these are really the same thing, the single threaded method just chooses a block size of 1).
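For reference, here is the block recursion written out in the post's own notation (this is the textbook scheme, not any particular library's code).  Partition ∑ into a leading square block A11, the panel A21 below it, and the trailing block A22.  Then:

L11 = chol(A11)
L21 = A21 * inv(L11`)
S22 = A22 - L21 * L21`

and the algorithm recurses on S22.  With a block size of 1, A11 is a scalar and this collapses to the single threaded method.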

Why does this require a PD matrix?  If you notice, in both the block scheme and the single threaded scheme, there is an inverse required.  The inverse of a matrix is not defined if that matrix is not of full rank.  If a matrix is PSD but not PD, then some eigenvalues are 0, which forces some diagonal elements of the root to 0, and each entry in the column below the diagonal is divided by that diagonal element.  You cannot take the inverse of 0.
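To make that concrete, here is the single threaded update written out in the same notation (L is the lower triangular root being built, row by row):

L[i,i] = sqrt( A[i,i] - sum over k<i of L[i,k]^2 )
L[j,i] = ( A[j,i] - sum over k<i of L[j,k] * L[i,k] ) / L[i,i]   for j > i

When the matrix is only PSD, some L[i,i] comes out 0 and the second line divides by 0.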

So let's look at a single threaded function for taking the Cholesky of a PSD matrix:

my_chol_psd = function(a){
   n = dim(a)[1];
   root = matrix(0,n,n);

   for (i in 1:n){
      # diagonal element: a[i,i] minus the squared entries already in row i
      s = 0;
      if (i > 1){
         s = sum(root[i,1:(i-1)]^2);
      }

      x = a[i,i] - s;

      # clamp small negative values caused by floating point error
      if (x < 0){
         x = 0;
      }

      root[i,i] = sqrt(x);

      # fill in the column below the diagonal
      if (i < n){
         for (j in (i+1):n){
            if (root[i,i] == 0){
               # zero pivot: the matrix is PSD but not PD, so zero the column
               x = 0;
            }
            else{
               s = 0;
               if (i > 1){
                  s = root[i,1:(i-1)] %*% t(t(root[j,1:(i-1)]));
               }
               x = (a[i,j] - s)/root[i,i];
            }
            root[j,i] = x;
         }
      }
   }
   return(root);
}
EDIT: I added an additional t() call in the function from what I originally had.  The reason t(root[1,1:2]) returns a 1x2 matrix is that subsetting a single row drops the result to a plain vector, and t() of a vector in R always produces a row matrix; the second t() turns it back into the column we need.

You will notice that this is the Wikipedia implementation, but with checks for the conditions that would cause an error. If a diagonal element is <= 0, then we set it and the column below it to 0. Let's do a quick check to see if it works.
> x = matrix(.9,3,3)
> x[1,1] = 1
> x[2,2] = 1
> x[3,3] = 1
> x[1,2] = 1
> x[2,1] = 1
> x
     [,1] [,2] [,3]
[1,]  1.0  1.0  0.9
[2,]  1.0  1.0  0.9
[3,]  0.9  0.9  1.0
> det(x)
[1] 0
> L = my_chol_psd(x)
> L
     [,1] [,2]      [,3]
[1,]  1.0    0 0.0000000
[2,]  1.0    0 0.0000000
[3,]  0.9    0 0.4358899
> xx = L %*% t(L)
> xx
     [,1] [,2] [,3]
[1,]  1.0  1.0  0.9
[2,]  1.0  1.0  0.9
[3,]  0.9  0.9  1.0
This seems to work. It handles a PSD matrix and returns the lower triangle, and L * L` does in fact give us back the original matrix.
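For contrast, base chol() should reject this same matrix, since rows 1 and 2 are identical and it is therefore only PSD, not PD.  Wrapping the call in try() keeps the session alive (the expected error text is from memory, so treat it as approximate):

try(chol(x))   # expect something like: the leading minor of order 2 is not positive definite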

We can use this as the base of the block method.  We have one other issue in the block method we need to address: the inverse matrix call.  Luckily, the MASS package provides a pseudo-inverse routine, ginv(), that we can use.
library(MASS)   # for ginv(), the pseudo-inverse

my_chol = function(a,b){
   n = dim(a)[1];
   out = a;
   x = floor(n/b-1)*b;   # where the blocked sweep stops; the tail is finished single threaded

   i = 1
   while(i < x){
       # Cholesky of the diagonal block
       out[i:(i+b-1),i:(i+b-1)] = my_chol_psd(out[i:(i+b-1),i:(i+b-1)]);
       # panel below it, using the pseudo-inverse instead of solve()
       out[(i+b):n,i:(i+b-1)] = out[(i+b):n,i:(i+b-1)] %*% ginv(t(out[i:(i+b-1),i:(i+b-1)]));
       # downdate the trailing block
       out[(i+b):n,(i+b):n] = out[(i+b):n,(i+b):n] - out[(i+b):n,i:(i+b-1)] %*% t(out[(i+b):n,i:(i+b-1)]);

       i = i + b;
   }
   # whatever is left over gets the single threaded routine
   out[(x+1):n,(x+1):n] = my_chol_psd(out[(x+1):n,(x+1):n]);
   # zero out the upper triangle
   for (i in 1:(n-1)){
      out[i,(i+1):n] = rep(0,n-i);
   }
   return(out);
}
This function requires us to pass a parameter b, which is the block size.  You will have to tune this to your computer for performance.  Let's try out this new function as before:
> x
     [,1] [,2] [,3]
[1,]  1.0  1.0  0.9
[2,]  1.0  1.0  0.9
[3,]  0.9  0.9  1.0
> L = my_chol(x,2)
> L
     [,1] [,2]      [,3]
[1,]  1.0    0 0.0000000
[2,]  1.0    0 0.0000000
[3,]  0.9    0 0.4358899
> xx = L%*%t(L)
> xx
     [,1] [,2] [,3]
[1,]  1.0  1.0  0.9
[2,]  1.0  1.0  0.9
[3,]  0.9  0.9  1.0
>
Good. Performance-wise, this will not do as well as the chol() function. The single threaded method is not going to be as fast because we are not operating in a vectorized way, and ginv() runs an order of magnitude slower than the true inverse (solve()). But it allows us to use a PSD matrix in situations where we need a Cholesky root.
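To see why a PSD-capable root is worth the trouble, here is a minimal simulation sketch using the identity L * u ~ N(0, ∑) from the Matrix Performance post below (the sample size is arbitrary):

set.seed(1)
L = my_chol_psd(x)
u = matrix(rnorm(3 * 100000), 3)   # independent N(0,1) draws
z = L %*% u                        # correlated draws
cov(t(z))                          # should come out close to x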

Next time we will look at ways to deal with ill-formed (i.e., non-PSD) matrices: we will examine the nearPD() function and implement a nearPSD() function of our own.

Thursday, November 24, 2011

Matrix Package Doodling

Trying not to fall into a Thanksgiving Day football coma, I started looking at the Matrix package.

Started out by changing my code from before to create a matrix using the Matrix() function from the Matrix package.
library(Matrix)   # provides the Matrix() constructor and its S4 classes

n = 4000
c = Matrix(.9,n,n)
for(i in 1:n){
   c[i,i] = 1;   # set the diagonal to 1, one element at a time
}
That took... a LONG time. What the hell?  Reading the intro document, it looks like there are multiple internal classes that hold matrices.  I'm guessing that every time I update the matrix, it tries to decide what class should be used.  That could make doing matrix operations painful.

This, however, works MUCH faster -- probably because the matrix is only checked once, on creation.
n = 4000
c = matrix(.9,n,n)
for(i in 1:n){
   c[i,i] = 1
}
c = Matrix(c)
What does the internal structure of this look like?
str(c)

Formal class 'dsyMatrix' [package "Matrix"] with 5 slots
  ..@ x       : num [1:16000000] 1 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
  ..@ Dim     : int [1:2] 4000 4000
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ uplo    : chr "U"
  ..@ factors : list()
The document referenced above lists a dsyMatrix as "Symmetric real matrices in non-packed storage." There is also a type dpoMatrix: "Positive semi-definite symmetric real matrices in non-packed storage." Technically, this matrix is PSD. Why is it not a dpoMatrix?
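One guess at forcing the class, assuming the standard S4 coercion method exists for it (I have not verified this, so treat it as a sketch):

cpo = as(c, "dpoMatrix")   # should check positive definiteness on the way in
class(cpo)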

Let's recalculate the Cholesky, look at the performance, and re-check the class:
gflops = function(ops,time){
   fps = ops/time
   return(fps/1000000000)
}

start = Sys.time()
root = chol(c)
end = Sys.time()
gflops(n^3/3,as.numeric(end - start))
[1] 1.584472
str(c)
Formal class 'dsyMatrix' [package "Matrix"] with 5 slots
  ..@ x       : num [1:16000000] 1 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
  ..@ Dim     : int [1:2] 4000 4000
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ uplo    : chr "U"
  ..@ factors :List of 1
  .. ..$ Cholesky:Formal class 'Cholesky' [package "Matrix"] with 5 slots
  .. .. .. ..@ x       : num [1:16000000] 1 0 0 0 0 0 0 0 0 0 ...
  .. .. .. ..@ Dim     : int [1:2] 4000 4000
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : NULL
  .. .. .. ..@ uplo    : chr "U"
  .. .. .. ..@ diag    : chr "N"
str(root)
Formal class 'Cholesky' [package "Matrix"] with 5 slots
  ..@ x       : num [1:16000000] 1 0 0 0 0 0 0 0 0 0 ...
  ..@ Dim     : int [1:2] 4000 4000
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ uplo    : chr "U"
  ..@ diag    : chr "N"

The matrix class didn't change, and the performance is worse than with the base class.

Notice how the Cholesky root is stored on the c matrix? This means we can get back to it if needed. That's pretty cool.
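Judging by the str() output above, the cached factor should be reachable through the factors slot (the slot and element names are read straight off that output):

root2 = c@factors$Cholesky   # the factor that chol() stashed on c
identical(root2, root)       # I would expect TRUE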

Unfortunately, the chol() function in Matrix and base requires a positive definite (PD) matrix.  Often in finance, I only have positive semi-definite (PSD) matrices.  We'll talk about that next time.

Sunday, November 20, 2011

Matrix Performance in R

I've been working on an example of the new Graph Template Language from SAS.  As I don't have direct access to SAS 9.2, I've been developing via email with a friend who does.

In the meantime, I thought I would start to investigate some of the performance properties of R.  I work in the financial risk industry, and I often find myself worrying about the performance of linear algebra operations.  Often I find myself setting up a simulation environment where two things dominate the performance -- the Cholesky root of the covariance matrix and simple matrix multiplication.

The Cholesky root (L) of a matrix (∑) is a decomposition scheme where

∑ = L * L`

We can think of this as a loose analog to the square root of a matrix.  This is useful for simulation purposes because if

u ~ N(0, I(n))

where I(n) is the identity matrix of size n x n and ~ N( ) represents a multivariate normal distribution, and if ∑ = L * L`, then

L * u ~ N(0, ∑)

So how fast can R calculate the Cholesky decomposition of ∑?  

First let's create a correlation matrix of size 4000.
n = 4000
corr = matrix(.9,n,n)
for(i in 1:n){
   corr[i,i] = 1;
}
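As an aside, the same matrix can be built without the loop; a sketch of the more idiomatic version:

corr = matrix(.9, n, n)
diag(corr) = 1   # replaces the whole diagonal in one assignment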
This creates a 4000x4000 correlation matrix with .9 correlation between all members.  So let's calculate the Cholesky root using the chol() function and see how long it takes.  Let's also see what gets returned.
start = Sys.time()
root = chol(corr)
end = Sys.time()
end - start
root[1:5,1:5]
That gives us the following output:
Time difference of 12.383 secs

     [,1]      [,2]      [,3]      [,4]       [,5]
[1,]    1 0.9000000 0.9000000 0.9000000 0.90000000
[2,]    0 0.4358899 0.2064742 0.2064742 0.20647416
[3,]    0 0.0000000 0.3838859 0.1233919 0.12339191
[4,]    0 0.0000000 0.0000000 0.3635146 0.08842247
[5,]    0 0.0000000 0.0000000 0.0000000 0.35259655
Unfortunately, this is the L` matrix, and we need the L matrix.  So we can transpose the result with the t() function.
start = Sys.time()
root = t(chol(corr))
end = Sys.time()
end - start
Time difference of 12.42 secs
So the transpose function is quick.  How many gigaflops is that?  Let's create a function for that.
gflops = function(ops,time){
   fps = ops/time
   return(fps/1000000000)
}
 It takes approximately N^3 / 3 operations to calculate a Cholesky.  So...
gflops(n^3/3,as.numeric(end-start))
[1] 1.71766 
That's pretty good; I'm running on a fairly new Intel Core i5 laptop. Note that we have to convert the time difference to a numeric value before doing math on it.

For matrix multiplication, we will deal with square matrices.  If A and B are square (n x n) matrices, then computing A * B takes 2*n^3 - n^2 operations.  Let's create two random 4000 x 4000 matrices, multiply them together, and see what the time and gflops are.
a = matrix(rnorm(n^2),n,n)
b = matrix(rnorm(n^2),n,n)
start = Sys.time()
xx = a %*% b
end = Sys.time()
end - start
gflops(2*4000^3 - 4000^2,as.numeric(end-start))
Time difference of 1.034383 mins

[1] 123.7298

That seems a tad high.  What does as.numeric() return if the time difference is in minutes?
as.numeric(end - start)
[1] 1.034383 
That's annoying.  If the time difference is greater than 1 minute, then you get the time in minutes, not seconds.  Really annoying.  Dividing out the units, 123.7298 / 60 = 2.0622 gflops.
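One way to sidestep the units problem entirely is difftime(), which lets you pin the units to seconds explicitly:

elapsed = as.numeric(difftime(end, start, units = "secs"))
gflops(2*n^3 - n^2, elapsed)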

Overall not too bad.  There is a package called Matrix that extends the base matrix functionality.  I haven't looked into that yet.  Next post I will explore that package and see if we can find ways to speed up these benchmarks.

Wednesday, November 2, 2011

First thoughts on R

Having worked just a little with R, I have some first impressions to share.  I'll also give you some links to resources I found helpful when writing the previous project.

First, the documentation is not very good.  I struggled on previous attempts to figure things out, and I still find it a crap shoot when I Google for an answer.  Luckily I found a book with a good primer on things.

Second, really, foo.bar is an acceptable variable name?!?  For the longest time I thought I was missing something when looking at examples, until I realized that "bar" was not a property on an object "foo."  foo.bar is just a variable/object name, and if you have another object named "foo" the two are not related.  Still makes my head hurt.

2.5 -- I can access properties on an object with $.  Seems like an odd choice.

3 Data frames are neat.  This took me longer to understand because of #2 than it should have.  To me, a data frame really looks like a hash object containing arrays of equal length.  It helped to think of them as plain "object" types from ActionScript -- accessing columns in multiple ways like DF$column or DF[,"column"] is very similar to obj.property or obj["property"] in AS (there is a quick sketch of this after the list).

4 I still don't understand the mechanics behind the scenes, though.  Why, when I accessed properties in the Time Series object, did I have to use obj[,"property"] instead of obj$property? When I look at the structure (str() function -- I like it) the types are different.  Something is going on, but I don't know what; the sketch after this list hints at why.

4.1 How does scoping work with variables in functions? If I don't delete variables inside the function, do they get garbage collected?

5 Many ways to do things.  The two packages I used relied on two different types of time series collections.  That threw me for a loop.  Luckily I got it to work; in one iteration of the code I was casting things back and forth using the as.<type>() functions.

6 The head() and str() functions are great.  Wish I knew about them when I started.  I saw them in some examples about halfway through the project.

7 R needs better default output and publishing tools.  I can easily get information overload from SAS and have it in nicely formatted HTML, PDF, RTF, XML, Excel, etc.  I would love to see an equivalent in R.
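Before I wrap up the rant, here is the quick sketch promised in items 3 and 4 (the toy data is made up; the behavior is the point):

df = data.frame(price = c(101, 102, 99), ticker = c("A", "B", "C"))
df$price       # column by name, like obj.property in AS
df[,"price"]   # the same column, matrix-style indexing

# A multivariate time series is a matrix underneath, not a list, which
# would explain why $ fails on it while [,"name"] works.
z = ts(matrix(rnorm(20), 10, 2, dimnames = list(NULL, c("a","b"))))
z[,"a"]        # works
is.list(df)    # TRUE  -- $ needs a list-like object
is.matrix(z)   # TRUE  -- plain matrices do not support $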

Well, I ranted enough on that.

Here is the honor roll of helpful resources:

  1. Option Pricing and Estimation of Financial Models with R, Stefano M. Iacus.  I actually read this cover to cover on my plane trip last week.  The appendix was the most helpful tool I had for the project.
  2. Time Series Analysis with R - Part I, Walter Zucchini, Oleg Nenadić.  I need to reread this free PDF.  I skimmed it, and its examples helped me learn how charting works.
  3. Producing Simple Graphs with R, Frank McCown.  Another helpful site for graphics.
  4. Quick-R.  Excerpts from the book R in Action.  I plan on buying this in the near future.
  5. I also have The R Book by Michael J. Crawley.  The sheer size of it is daunting, but I am hoping that it will become a go-to resource (and maybe answer some of my questions).  It didn't make the trip with me because of its size.