I haven't been here in a while -- previous job did not allow
blogging.
Over beers the other night with some friends and alumni from
SAS, someone mentioned IML is now multithreaded. Someone else pipes up with,
"yeah but the Cholesky root() function is not and that's all I really use
it for."
To which I respond, "A -- I didn't know that about
multithreading in SAS 9.4. Very cool. B -- I bet I can write
a faster root() function for you." What hubris! I blame the
beer.
So having thrown down the gauntlet, I went back to my hotel
room, dug out old posts (here), and got to work.
Here's what I came up with:
start my_chol(a,b);
n =
nrow(a);
out =
a;
x =
floor(n/b-1)*b;
i=1;
do
while(i<x);
out[i:(i+b-1),i:(i+b-1)]
= root(out[i:(i+b-1),i:(i+b-1)])`;
out[(i+b):n,i:(i+b-1)]
= out[(i+b):n,i:(i+b-1)] * inv((out[i:(i+b-1),i:(i+b-1)])`);
out[(i+b):n,(i+b):n] = out[(i+b):n,(i+b):n]
- out[(i+b):n,i:(i+b-1)]*(out[(i+b):n,i:(i+b-1)])`;
i
= i + b;
end;
out[(x+1):n,(x+1):n]
=root(out[(x+1):n,(x+1):n])`;
do
i=1 to n-1;
out[i,(i+1):n]
= J(1,n-i,0);
end;
return(out);
finish my_chol;
This takes 2 parameters.
- A – The matrix to be decomposed.
- B – A block size. This trades off the multithreading in matrix multiplication with single threaded root() and inv() functions. You need to play with this to see what works best on your system. 100-150 seems to be the sweet spot for my laptop.
Running this on a 5000x5000 matrix (B=100), I get:
- IML root() – 21.08 seconds
- my_chol() – 13.98 seconds
I’ll collect my free beer later
this week J.