Started out by changing my code from before to create a matrix using the Matrix() function from the Matrix package.
n = 4000That took... a LONG time. What the hell? Reading the intro document, it looks like there are multiple internal classes that hold matrices. I'm guessing that every time I update the matrix it tries to decide what class should be used. That could make doing matrix operations, painful.
c = Matrix(.9,n,n)
for(i in 1:n){
c[i,i] = 1;
}
This however works MUCH faster -- probably because the matrix is only checked once on creation.
n = 4000What does the internal structure of this look like?
c = matrix(.9,n,n)
for(i in 1:n){
c[i,i] = 1
}
c = Matrix(c)
str(c)The document referenced above lists a dsyMatrix as "Symmetric real matrices in non-packed storage." There is a type "dpoMatrix Positive semi-de nite symmetric real matrices in non-packed storage." Technically, this matrix is PSD. Why is it not a dpoMatrix?
Formal class 'dsyMatrix' [package "Matrix"] with 5 slots
..@ x : num [1:16000000] 1 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
..@ Dim : int [1:2] 4000 4000
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : NULL
..@ uplo : chr "U"
..@ factors : list()
Let's recalculate the Cholesky, look at the performance, and re-check the class:
gflops = function(ops,time){
fps = ops/time
return(fps/1000000000)
}
start = Sys.time()
root = chol(c)
end = Sys.time()
gflops(n^3/3,as.numeric(end - start))
[1] 1.584472
str(c)
Formal class 'dsyMatrix' [package "Matrix"] with 5 slots
..@ x : num [1:16000000] 1 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
..@ Dim : int [1:2] 4000 4000
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : NULL
..@ uplo : chr "U"
..@ factors :List of 1
.. ..$ Cholesky:Formal class 'Cholesky' [package "Matrix"] with 5 slots
.. .. .. ..@ x : num [1:16000000] 1 0 0 0 0 0 0 0 0 0 ...
.. .. .. ..@ Dim : int [1:2] 4000 4000
.. .. .. ..@ Dimnames:List of 2
.. .. .. .. ..$ : NULL
.. .. .. .. ..$ : NULL
.. .. .. ..@ uplo : chr "U"
.. .. .. ..@ diag : chr "N"
str(root)
Formal class 'Cholesky' [package "Matrix"] with 5 slots
..@ x : num [1:16000000] 1 0 0 0 0 0 0 0 0 0 ...
..@ Dim : int [1:2] 4000 4000
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : NULL
..@ uplo : chr "U"
..@ diag : chr "N"
Matrix class didn't change and the performance is worse than the base class.
Notice how the Cholesky root is stored on the c matrix? This means we can get back to it if needed. That's pretty cool.
Unfortunately, the chol() function in Matrix and base requires a positive definite (PD) matrix. Often in finance, I only have positive semi-definite (PSD) matrices. We'll talk about that next time.
No comments:
Post a Comment