Sunday, November 27, 2011

Dealing with Non-Positive Definite Matrices in R

Last time we looked at the Matrix package and dug a little into the chol(), Cholesky Decomposition, function.  I noted that often in finance we do not have a positive definite (PD) matrix.  The chol() function in both the Base and Matrix package requires a PD matrix.

In this post, we will look at why we don't always have a PD matrix, why are standard Cholesky implementations requiring a PD matrix, and what can we do?

So what do we not often have a PD matrix?  In finance we often deal with a large number of variables.  If the number of variables is larger than the number of observations, then the determinant of the correlation matrix will necessarily be 0.  A PD matrix is defined as a symmetric matrix S (n x n) such that all possible values of a vector X (n x 1) results in X`* S * X > 0.  Without going through the proof, this can only happen with a full rank matrix, which by definition has a determinant > 0.

Relaxing the strict inequality to X` * S * X >= 0 gives us what we call a positive semidefinite (PSD) matrix.  Luckily, a well formed correlation matrix, even is calculated above where the number of observations is less than the number of variables, will be PSD.  A gotcha is that sometimes floating point error or missing values in our time series can cause the resulting correlation to not be PSD.  We will talk about this in a later post.  Back on topic, we need a way to either "fix" the PSD matrix to make it PD or a way to decompose a PSD matrix.

First, let's look at the algorithm(s) used to calculate a Cholesky Decomposition.  Most modern Cholesky implementations take an iterative or recursive approach.  This does a good job describing it.  This block approach works well when you have implemented vectorization in your matrix multiplication.  The block size is usually chosen such that the inner Cholesky call can operate inside the CPU cache.  Wikipedia gives a good description of a single threaded method (these are really the same thing, the single threaded method just chooses a block size of 1).

Why does this require a PD matrix?  If you notice, in both the block scheme or the single threaded scheme, there is an inverse required.  The inverse of a matrix is not defined if that matrix is not of full rank.  If a matrix is PSD, then some eigenvalues are 0.  The  diagonal elements are a function of the eigenvalue and the column below the diagonal is a function of the inverse of the diagonal.  You cannot take the inverse of 0.

So let's look at a single threaded function for taking the Cholesky of a PSD matrix:

my_chol_psd = function(a){
   n = dim(a)[1];
   root = matrix(0,n,n);
 
   for (i in 1:n){
      sum = 0;
      if (i>1){
         sum = sum(root[i,1:(i-1)]^2);
      }
   
      x = a[i,i] - sum;
   
      if (x<0){
      x = 0;
      }
   
      root[i,i] = sqrt(x);
   
      if (i < n){
         for (j in (i+1):n){
       
            if (root[i,i] == 0){
               x=0;
            }
            else{
               sum = 0;
               if (i>1) {
         sum = root[i,1:(i-1)] %*% t(t(root[j,1:(i-1)]))
               }
               x = (a[i,j] - sum)/root[i,i];
            }
       
            root[j,i] = x;
         }
      }
   }
   return(root);
}
EDIT: I added an additional t() call in the function from what I originally had.  I do not know why t(root[1,1:2]) returns a 1x2 matrix.  Mind boggling.

You will notice that this is the Wikipedia implementation but checking for conditions that would cause an error. If that diagonal is <=0 then we set it and the column below it to 0. Let's do a quick check to see if it works.
> x = matrix(.9,3,3)
> x[1,1] = 1
> x[2,2] = 1
> x[3,3] = 1
> x[1,2] = 1
> x[2,1] = 1
> x
     [,1] [,2] [,3]
[1,]  1.0  1.0  0.9
[2,]  1.0  1.0  0.9
[3,]  0.9  0.9  1.0
> det(x)
[1] 0
> L = my_chol_psd(x)
> L
     [,1] [,2]      [,3]
[1,]  1.0    0 0.0000000
[2,]  1.0    0 0.0000000
[3,]  0.9    0 0.4358899
> xx = L %*% t(L)
> xx
     [,1] [,2] [,3]
[1,]  1.0  1.0  0.9
[2,]  1.0  1.0  0.9
[3,]  0.9  0.9  1.0
This seems to work. It handles a PSD matrix and returns the lower triangle. The L*L`, in fact, gives us back the original matrix.

We can use this as the base of the block method.  We have one other issue in the block method we need to address; the inverse matrix call.  Luckily there is a psuedo-inverse routine that we can use (here is another good discussion) in the MASS package.
my_chol = function(a,b){
   n = dim(a)[1];
   out = a;
   x = floor(n/b-1)*b;

   i=1
   while(i<x){
       out[i:(i+b-1),i:(i+b-1)] = my_chol_psd(out[i:(i+b-1),i:(i+b-1)]);
       out[(i+b):n,i:(i+b-1)] = out[(i+b):n,i:(i+b-1)] %*% ginv(t(out[i:(i+b-1),i:(i+b-1)]));
       out[(i+b):n,(i+b):n] = out[(i+b):n,(i+b):n] - out[(i+b):n,i:(i+b-1)]%*%t(out[(i+b):n,i:(i+b-1)]);
 
       i = i + b;
   }
   out[(x+1):n,(x+1):n] =my_chol_psd(out[(x+1):n,(x+1):n]);
   for (i in 1:(n-1)){
      out[i,(i+1):n] = rep(0,n-i);
   }
   return(out);
}
This function requires us to pass a parameter b, which is the block size.  You will have to tune this to your computer for performance.  Let's try out this new function as before:
> x
     [,1] [,2] [,3]
[1,]  1.0  1.0  0.9
[2,]  1.0  1.0  0.9
[3,]  0.9  0.9  1.0
> L = my_chol(x,2)
> L
     [,1] [,2]      [,3]
[1,]  1.0    0 0.0000000
[2,]  1.0    0 0.0000000
[3,]  0.9    0 0.4358899
> xx = L%*%t(L)
> xx
     [,1] [,2] [,3]
[1,]  1.0  1.0  0.9
[2,]  1.0  1.0  0.9
[3,]  0.9  0.9  1.0
>
Good. Performance wise, this will not perform as well as the chol() function. The single threaded method is not going to be as fast because we are not operating in a vectorized way. The ginv() method performs an order of magnitude slower than the real inverse (solve()). But it allows us to use a PSD matrix in situations where we need a Cholesky matrix.

We will look at ways to deal with ill-formed matrices (i.e. non-PSD) next time.  We will look at the nearPD function and implement a nearPSD() function.

No comments:

Post a Comment