Sunday, June 17, 2012

Comparing performance in R, foreach/doSNOW, SAS, and NumPY (MKL)

This is a follow up to my previous post.  There is a quicker way to compute the function I created (basic cumulative sum) in R.

Instead of:
function f(x) {
   sum = 0;
   for (i in seq(1,x)) sum = sum + i
   return(sum)
}
Use this:
f2 = function(x){
   return(sum(seq(x)))
}
If I time it, we see:
system.time( (out = apply(as.array(seq(10000)),1,f2)))
   user  system elapsed
   0.35    0.05    0.39
Nice!  Spread that across 3 CPUs and we can bring it down a bit:
system.time( (out2 =  foreach(i=seq(0,9),.combine='c') %dopar% {
 apply(as.array(seq(i*1000+1,(i+1)*1000)),1,f2)
}))
   user  system elapsed
   0.02    0.00    0.26
Not too shabby.  How fast can we do this in SAS:
options cmplib=work.fns;
proc fcmp outlib=work.fns.fns;
function csum(x);
  sum = 0;
  do i=1 to x;
     sum = sum+i;
  end;
  return (sum);
endsub;
run;

data _null_;
do i=1 to 10000;
      x = csum(i);
end;
run;

NOTE: DATA statement used (Total process time):
      real time           0.24 seconds
      cpu time            0.25 seconds
SAS on a single CPU is just as fast as R on 3.  It's not worth attempting to multi-thread this in SAS.  The overhead would be too much as SAS/CONNECT is made for bigger problems.

So what about NumPY in Python?  If we use the version compiled with MKL we ought to be able to do reduction in blazing fast time.  MKL should use the SSE registers on the processor.  Further, we'll use the "fromfunction" method that lets us pass a lambda to the array creation method.
import numpy as np
import time as time

def f(x,y):
   x = x +1
   return(np.cumsum(x))

s = time.time()

y = np.fromfunction(f,(10000,1))
  
el = time.time() - s

print "%0.6f" % el
 0.002000
FAST.

19 comments:

  1. proc fcmp compiles the function. try compiling the R funciton as well. SAS might still be faster though but the margin might narrow.

    ReplyDelete
    Replies
    1. I'll give it a shot and report back. It is worth noting that FCMP doesn't compile until run time. It is stored as XML in a data set -- in the case of above WORK.FNS.

      Delete
    2. using the compiler package I see:

      > cf = cmpfun(f)
      > cf2 = cmpfun(f2)
      > system.time( (out = apply(as.array(seq(10000)),1,f)))
      user system elapsed
      25.38 0.00 25.46
      > system.time( (out = apply(as.array(seq(10000)),1,cf)))
      user system elapsed
      5.67 0.00 5.68
      > system.time( (out = apply(as.array(seq(10000)),1,f2)))
      user system elapsed
      0.33 0.05 0.37
      > system.time( (out = apply(as.array(seq(10000)),1,cf2)))
      user system elapsed
      0.39 0.00 0.39

      Compiling speeds up the inefficient f function. It does nothing for the f2 version. I expect this is because the sum and seq functions are already compiled.

      Delete
  2. How about:

    f3 <- function(x) tail(cumsum(1:x), 1)

    'cumsum' is C-compiled and fast...

    Greets,
    Andrej

    ReplyDelete
  3. Contrived benchmarks are contrived.
    If you must compare this way use equivalent functions in both languages.
    The raw cumsum() function has speed comparable to numpy.cumsum().

    Other things:
    Use 1:x, which is faster than seq(x)
    Use sapply() rather than apply (saving the as.array function).

    ReplyDelete
  4. Here the slow part is apply

    system.time( cumsum(1:10000))
    user system elapsed
    0.001 0.000 0.001

    ReplyDelete
  5. This has to be one of the worst blogs I've seen. For goodness sake, as others have said: cumsum() is a basic function in R. Suggest withdrawing the article until you learn to check your facts before blogging?

    ReplyDelete
    Replies
    1. How about read the first post where I'm new to R, am trying to learn, and am using this blog as I pick my way through it. I appreciate the constructive comments above as they help me and they help those that read them. If you are not a fan, please feel free not to read.

      Delete
    2. yeah..dont worry about detractors like the guy above...keep doing what you are doing. you make mistakes and then people help you figure out how to get better.

      good luck!

      Delete
  6. Andrej, Shap, and Victor, thank you for the comments. I appreciate them.

    Andrej's function using an apply is comparable to the SAS and Python versions which both are looping over the 10,000 values.
    > f3 <- function(x) tail(cumsum(1:x), 1)
    > system.time( (out=sapply(1:10000,f3)))
    user system elapsed
    0.61 0.00 0.61
    > f4 = function(x) sum(1:x)
    > system.time( (out=sapply(1:10000,f4)))
    user system elapsed
    0.19 0.00 0.19

    And yes it is contrived, but it was contrived to see what I could see. The cumsum() is obvious choice as it solves the problem very quickly (thanks Victor for pointing it out).

    Thanks for reading.

    ReplyDelete
    Replies
    1. I appreciate that you are new to R, but it is important not to try to present what you are doing as a performance comparison. But I also think you are wrong in your statement about loops: the Python version does not loop through. Since np.cumsum() (and your f) takes an array argument for x, Python happily passes through the full array, and does a single calculation step. Note that the returned array does not actually have the shape you specified of (10000, 1), but rather is a single dimensional array of shape (10000, )

      Delete
    2. Agree, it is apples and oranges. But that happens because of the vectorization through the SSE registers in MKL, not Python. The Python interpreter is single threaded and if you tried this on a non-optimized version, it would take about the same time as the R version. I tried to make that clear, but agree it could have been better stated.

      Delete
    3. RIght, but my point is that what you did in python was exactly equivalent to cumsum(1:10000) in R. You put a lot of extra syntax around what is really a single call of np.cumsum(range(10001)).

      Delete
    4. Python:
      >>> def testcumsum(x):
      ... t = time.time()
      ... y = np.cumsum(range(x + 1))
      ... print "%0.6f" % (time.time() - t)
      ...
      >>> testcumsum(1000000)
      0.179632


      R:

      > testcumsum <- function(x){system.time(cumsum(as.numeric(1:x)))}
      > testcumsum(1000000)
      user system elapsed
      0.072 0.001 0.074

      Delete
    5. The point of using fromfunction is that the f function is applied to each index duruing the array creation. That is why it is analogous to *apply() in R.

      For what it's worth; MKL makes the fromfunction more efficient that what you show:

      >>> def testcumsum(x):
      ... t = time.time()
      ... y = np.cumsum(range(x+1))
      ... print "%0.6f" % (time.time() - t)
      ...
      >>> testcumsum(10000)
      0.000000
      >>> testcumsum(1000000)
      0.170000
      >>> t = time.time(); y = np.fromfunction(f,(1000000,1)); el = time.time() - t
      >>> el
      0.029999971389770508

      > testcumsum <- function(x){system.time(cumsum(as.numeric(1:x)))}
      > testcumsum(1000000)
      user system elapsed
      0.04 0.00 0.05

      Delete
  7. Again, np.cumsum is being called only once in your code. Turns out range(x) is a lot slower than np.indices((x,)) (used internally by from function) for this use case.

    >>> t = time.time(); y = np.cumsum(np.indices((1000000,))); el = time.time()-t
    >>> el
    0.01994800567626953

    So yes, python is still faster for this*, but not for the reasons you are suggesting.

    *This is why I don't write performance comparisons when learning a language...

    ReplyDelete
    Replies
    1. I stand corrected.

      I actually find it useful to do these exercises. Learning the basics is one thing, learning to code efficently is another. Discussions like this are helpful; they provide a learning tool for myself and other new users.

      I really do appreciate the time you spent to prove me wrong. I gained insight from it and hope others do as well.

      Delete
  8. SAS also has a matrix language. The SAS/IML syntax is similar to your R example:

    proc iml;
    start f2(n);
    return(sum(1:n));
    finish;

    t0 = time();
    do i = 1 to 10000;
    x = f2(i);
    end;
    t = time()-t0;
    print t;

    ReplyDelete
    Replies
    1. Funny, I link to your book on the post before this one!

      Delete