Instead of:
Use this:function f(x) {sum = 0;for (i in seq(1,x)) sum = sum + ireturn(sum)}
If I time it, we see:f2 = function(x){return(sum(seq(x)))}
Nice! Spread that across 3 CPUs and we can bring it down a bit:system.time( (out = apply(as.array(seq(10000)),1,f2)))user system elapsed0.35 0.05 0.39
Not too shabby. How fast can we do this in SAS:system.time( (out2 = foreach(i=seq(0,9),.combine='c') %dopar% {apply(as.array(seq(i*1000+1,(i+1)*1000)),1,f2)}))user system elapsed0.02 0.00 0.26
SAS on a single CPU is just as fast as R on 3. It's not worth attempting to multi-thread this in SAS. The overhead would be too much as SAS/CONNECT is made for bigger problems.options cmplib=work.fns;proc fcmp outlib=work.fns.fns;function csum(x);sum = 0;do i=1 to x;sum = sum+i;end;return (sum);endsub;run;data _null_;do i=1 to 10000;x = csum(i);end;run;NOTE: DATA statement used (Total process time):real time 0.24 secondscpu time 0.25 seconds
So what about NumPY in Python? If we use the version compiled with MKL we ought to be able to do reduction in blazing fast time. MKL should use the SSE registers on the processor. Further, we'll use the "fromfunction" method that lets us pass a lambda to the array creation method.
import numpy as npimport time as timedef f(x,y):x = x +1return(np.cumsum(x))s = time.time()y = np.fromfunction(f,(10000,1))el = time.time() - sprint "%0.6f" % el
0.002000
proc fcmp compiles the function. try compiling the R funciton as well. SAS might still be faster though but the margin might narrow.
ReplyDeleteI'll give it a shot and report back. It is worth noting that FCMP doesn't compile until run time. It is stored as XML in a data set -- in the case of above WORK.FNS.
Deleteusing the compiler package I see:
Delete> cf = cmpfun(f)
> cf2 = cmpfun(f2)
> system.time( (out = apply(as.array(seq(10000)),1,f)))
user system elapsed
25.38 0.00 25.46
> system.time( (out = apply(as.array(seq(10000)),1,cf)))
user system elapsed
5.67 0.00 5.68
> system.time( (out = apply(as.array(seq(10000)),1,f2)))
user system elapsed
0.33 0.05 0.37
> system.time( (out = apply(as.array(seq(10000)),1,cf2)))
user system elapsed
0.39 0.00 0.39
Compiling speeds up the inefficient f function. It does nothing for the f2 version. I expect this is because the sum and seq functions are already compiled.
How about:
ReplyDeletef3 <- function(x) tail(cumsum(1:x), 1)
'cumsum' is C-compiled and fast...
Greets,
Andrej
Contrived benchmarks are contrived.
ReplyDeleteIf you must compare this way use equivalent functions in both languages.
The raw cumsum() function has speed comparable to numpy.cumsum().
Other things:
Use 1:x, which is faster than seq(x)
Use sapply() rather than apply (saving the as.array function).
Here the slow part is apply
ReplyDeletesystem.time( cumsum(1:10000))
user system elapsed
0.001 0.000 0.001
This has to be one of the worst blogs I've seen. For goodness sake, as others have said: cumsum() is a basic function in R. Suggest withdrawing the article until you learn to check your facts before blogging?
ReplyDeleteHow about read the first post where I'm new to R, am trying to learn, and am using this blog as I pick my way through it. I appreciate the constructive comments above as they help me and they help those that read them. If you are not a fan, please feel free not to read.
Deleteyeah..dont worry about detractors like the guy above...keep doing what you are doing. you make mistakes and then people help you figure out how to get better.
Deletegood luck!
Andrej, Shap, and Victor, thank you for the comments. I appreciate them.
ReplyDeleteAndrej's function using an apply is comparable to the SAS and Python versions which both are looping over the 10,000 values.
> f3 <- function(x) tail(cumsum(1:x), 1)
> system.time( (out=sapply(1:10000,f3)))
user system elapsed
0.61 0.00 0.61
> f4 = function(x) sum(1:x)
> system.time( (out=sapply(1:10000,f4)))
user system elapsed
0.19 0.00 0.19
And yes it is contrived, but it was contrived to see what I could see. The cumsum() is obvious choice as it solves the problem very quickly (thanks Victor for pointing it out).
Thanks for reading.
I appreciate that you are new to R, but it is important not to try to present what you are doing as a performance comparison. But I also think you are wrong in your statement about loops: the Python version does not loop through. Since np.cumsum() (and your f) takes an array argument for x, Python happily passes through the full array, and does a single calculation step. Note that the returned array does not actually have the shape you specified of (10000, 1), but rather is a single dimensional array of shape (10000, )
DeleteAgree, it is apples and oranges. But that happens because of the vectorization through the SSE registers in MKL, not Python. The Python interpreter is single threaded and if you tried this on a non-optimized version, it would take about the same time as the R version. I tried to make that clear, but agree it could have been better stated.
DeleteRIght, but my point is that what you did in python was exactly equivalent to cumsum(1:10000) in R. You put a lot of extra syntax around what is really a single call of np.cumsum(range(10001)).
DeletePython:
Delete>>> def testcumsum(x):
... t = time.time()
... y = np.cumsum(range(x + 1))
... print "%0.6f" % (time.time() - t)
...
>>> testcumsum(1000000)
0.179632
R:
> testcumsum <- function(x){system.time(cumsum(as.numeric(1:x)))}
> testcumsum(1000000)
user system elapsed
0.072 0.001 0.074
The point of using fromfunction is that the f function is applied to each index duruing the array creation. That is why it is analogous to *apply() in R.
DeleteFor what it's worth; MKL makes the fromfunction more efficient that what you show:
>>> def testcumsum(x):
... t = time.time()
... y = np.cumsum(range(x+1))
... print "%0.6f" % (time.time() - t)
...
>>> testcumsum(10000)
0.000000
>>> testcumsum(1000000)
0.170000
>>> t = time.time(); y = np.fromfunction(f,(1000000,1)); el = time.time() - t
>>> el
0.029999971389770508
> testcumsum <- function(x){system.time(cumsum(as.numeric(1:x)))}
> testcumsum(1000000)
user system elapsed
0.04 0.00 0.05
Again, np.cumsum is being called only once in your code. Turns out range(x) is a lot slower than np.indices((x,)) (used internally by from function) for this use case.
ReplyDelete>>> t = time.time(); y = np.cumsum(np.indices((1000000,))); el = time.time()-t
>>> el
0.01994800567626953
So yes, python is still faster for this*, but not for the reasons you are suggesting.
*This is why I don't write performance comparisons when learning a language...
I stand corrected.
DeleteI actually find it useful to do these exercises. Learning the basics is one thing, learning to code efficently is another. Discussions like this are helpful; they provide a learning tool for myself and other new users.
I really do appreciate the time you spent to prove me wrong. I gained insight from it and hope others do as well.
SAS also has a matrix language. The SAS/IML syntax is similar to your R example:
ReplyDeleteproc iml;
start f2(n);
return(sum(1:n));
finish;
t0 = time();
do i = 1 to 10000;
x = f2(i);
end;
t = time()-t0;
print t;
Funny, I link to your book on the post before this one!
Delete