In the meantime, I thought I would start to investigate some of the performance properties of R. I work in the financial risk industry, and I often find myself worrying about the performance of linear algebra routines. When I set up a simulation environment, two things usually dominate the run time: the Cholesky root of the covariance matrix and plain matrix multiplication.

The Cholesky root (L) of a symmetric positive-definite matrix (Σ) is a decomposition scheme where

Σ = L * L`

We can think of this as a loose analog to the square root of a matrix. It is useful for simulation purposes because if

u ~ N(0, I(n))

where I(n) is the n x n identity matrix and ~ N( ) denotes a multivariate normal distribution, and if Σ = L * L`, then

L * u ~ N(0, Σ)
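As a small sketch of that recipe (with a hypothetical 3 x 3 Σ of my own choosing, not the matrix benchmarked below), using R's chol() to get L:

```r
set.seed(42)
n = 3
Sigma = matrix(0.5, n, n)             # hypothetical covariance (here, correlation) matrix
diag(Sigma) = 1
L = t(chol(Sigma))                    # chol() returns the upper triangle, so transpose to get L
nsim = 100000
u = matrix(rnorm(n * nsim), n, nsim)  # each column is a draw from N(0, I(n))
x = L %*% u                           # each column is now a draw from N(0, Sigma)
round(cov(t(x)), 2)                   # sample covariance should land close to Sigma
```

With enough simulations the sample covariance of the columns of x recovers Σ, which is the whole point of the trick.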

So how fast can R calculate the Cholesky decomposition of Σ?

First let's create a correlation matrix of size 4000.

n = 4000
corr = matrix(.9, n, n)
for(i in 1:n){
  corr[i,i] = 1
}

This creates a 4000x4000 correlation matrix with .9 correlation between all members. So let's calculate the Cholesky root using the chol() function and see how long it takes. Let's also see what gets returned.
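A quick aside: chol() requires the matrix to be symmetric positive definite. A sanity check along these lines (the smaller size and the eigen() check are my own additions, used here purely so it runs fast):

```r
m = 1000                            # smaller than the n = 4000 above, just for speed
corr_small = matrix(0.9, m, m)      # same equicorrelation construction as above
diag(corr_small) = 1
ev = eigen(corr_small, symmetric = TRUE, only.values = TRUE)$values
min(ev) > 0                         # TRUE means positive definite, so chol() will succeed
```

For this equicorrelation matrix the smallest eigenvalue is 1 - 0.9 = 0.1, so the decomposition is safe.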

start = Sys.time()
root = chol(corr)
end = Sys.time()
end - start
root[1:5,1:5]

That gives us the return values

Time difference of 12.383 secs

     [,1]      [,2]      [,3]      [,4]       [,5]
[1,]    1 0.9000000 0.9000000 0.9000000 0.90000000
[2,]    0 0.4358899 0.2064742 0.2064742 0.20647416
[3,]    0 0.0000000 0.3838859 0.1233919 0.12339191
[4,]    0 0.0000000 0.0000000 0.3635146 0.08842247
[5,]    0 0.0000000 0.0000000 0.0000000 0.35259655

Unfortunately, this is the L` matrix, and we need the L matrix, so we can transpose the result with the t() function.

start = Sys.time()
root = t(chol(corr))
end = Sys.time()
end - start

Time difference of 12.42 secs

So the transpose adds almost nothing to the run time. How many gigaflops is that? Let's create a function for that.
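As another aside, base R's system.time() wraps this start/end bookkeeping for you and always reports in seconds. A sketch (the 500 x 500 size is my choice, just so it runs quickly, not part of the benchmark):

```r
m = 500
small = matrix(0.9, m, m)
diag(small) = 1
tm = system.time(root_small <- t(chol(small)))  # user/system/elapsed, all in seconds
tm["elapsed"]                                   # elapsed wall-clock time
```

Note the <- inside the system.time() call; an = there would be parsed as a named argument rather than an assignment.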

It takes approximately n^3 / 3 floating-point operations to calculate a Cholesky decomposition. So let's write a helper that converts an operation count and a time into gigaflops.

gflops = function(ops, time){
  fps = ops/time
  return(fps/1000000000)
}

gflops(n^3/3, as.numeric(end - start))

[1] 1.71766

Note that we have to convert the time difference to a numeric value with as.numeric(). I'm running on a fairly new Intel Core i5 laptop, so roughly 1.7 gflops is pretty good.

For matrix multiplication, we will stick with square matrices. If A and B are square n x n matrices, then computing A %*% B takes 2*n^3 - n^2 floating-point operations (n multiplications and n - 1 additions for each of the n^2 entries). Let's create two random 4000 x 4000 matrices, multiply them together, and see what the time and gflops are.

a = matrix(rnorm(n^2), n, n)
b = matrix(rnorm(n^2), n, n)

start = Sys.time()
xx = a %*% b
end = Sys.time()
end - start

Time difference of 1.034383 mins

gflops(2*4000^3 - 4000^2, as.numeric(end - start))

[1] 123.7298

That seems a tad high. What does as.numeric() return if the time difference is in minutes?

as.numeric(end - start)

[1] 1.034383

That's annoying. If the time difference is greater than one minute, as.numeric() gives you the elapsed time in minutes, not seconds. Really annoying. Correcting for that, the real figure is 123.7298 / 60 = 2.0622 gflops.
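One way to sidestep this pitfall is to call base R's difftime() directly and pin the units, rather than letting the subtraction auto-choose them (Sys.sleep() here is just a stand-in for a long computation):

```r
start = Sys.time()
Sys.sleep(2)                                          # stand-in for a long computation
end = Sys.time()
elapsed = as.numeric(difftime(end, start, units = "secs"))  # always seconds, even past a minute
elapsed
```

With units = "secs" forced, the gflops calculation above would have come out right on the first try.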

Overall, not too bad. There is a package called Matrix that extends the base matrix functionality, but I haven't looked into it yet. In the next post I will explore that package and see if we can find ways to speed up these benchmarks.
