I haven't been here in a while -- previous job did not allow blogging.
Over beers the other night with some friends and alumni from SAS, someone mentioned IML is now multithreaded. Someone else pipes up with, "yeah but the Cholesky root() function is not and that's all I really use it for."
To which I respond, "A -- I didn't know that about multithreading in SAS 9.4. Very cool. B -- I bet I can write a faster root() function for you." What hubris! I blame the beer.
So having thrown down the gauntlet, I went back to my hotel room, dug out old posts (here), and got to work.
Here's what I came up with:
n = nrow(a);
out = a;
x = floor(n/b-1)*b;
out[i:(i+b-1),i:(i+b-1)] = root(out[i:(i+b-1),i:(i+b-1)])`;
out[(i+b):n,i:(i+b-1)] = out[(i+b):n,i:(i+b-1)] * inv((out[i:(i+b-1),i:(i+b-1)])`);
out[(i+b):n,(i+b):n] = out[(i+b):n,(i+b):n] - out[(i+b):n,i:(i+b-1)]*(out[(i+b):n,i:(i+b-1)])`;
i = i + b;
do i=1 to n-1;
out[i,(i+1):n] = J(1,n-i,0);
This takes 2 parameters.
- A – The matrix to be decomposed.
- B – A block size. This trades off the multithreading in matrix multiplication with single threaded root() and inv() functions. You need to play with this to see what works best on your system. 100-150 seems to be the sweet spot for my laptop.
Running this on a 5000x5000 matrix (B=100), I get:
- IML root() – 21.08 seconds
- my_chol() – 13.98 seconds
I’ll collect my free beer later this week J.