Wednesday, June 13, 2012

Performance with foreach, doSNOW, and snowfall

Is it just me, or does the performance of the foreach package with a doSNOW backend operating on a socket grid suck?

Here at work, I am helping to set up a cluster of Windows machines for distributed R processing.  We have lots of researchers running code that takes hours to complete and consists essentially of large for loops with lots of analysis in between.  These guys and gals are not hard-core programmers, so there is lots of interest in foreach (as opposed to something like Rmpi).

I have successfully set up a proof-of-concept grid between multiple machines using sockets and public key authentication.  Assuming we use this, I'll post a how-to, as there is not much on the web on how to get it working on Windows.

In the meantime, I am testing performance.  There is something going on with foreach that I do not understand.  Performance numbers are really bad.

Can anyone explain what is going on here?
> require(doSNOW)
Loading required package: doSNOW
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: iterators
Loading required package: snow
> require(snowfall)
Loading required package: snowfall
>
> sfInit(parallel=TRUE,socketHosts=rep("localhost",3))
R Version:  R version 2.15.0 (2012-03-30)
snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 3 CPUs.
> cl = sfGetCluster()
>
> f = function(x) {
+    sum = 0
+    for (i in seq(1,x)) sum = sum + i
+    return(sum)
+ }
>
> registerDoSNOW(cl)
>
> out = vector("logical",length=10000)
> system.time( (for (i in seq(1,10000)) out[i]=f(i) ))
   user  system elapsed
  25.99    0.00   25.99
>
> system.time( (out = lapply(seq(1,10000),f) ))
   user  system elapsed
  26.55    0.00   26.55
>
> system.time( (out = parLapply(cl,seq(1,10000),f) ))
   user  system elapsed
   0.02    0.00   15.85
>
> system.time( (out = foreach(i=seq(1,10000)) %dopar% f(i) ))
   user  system elapsed
   6.64    0.42   98.31
>
> getDoParWorkers()
[1] 3
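
EDIT: HA!  Figured it out.  foreach is not very efficient at communicating tasks to the workers as compared to par*apply().  As far as I can tell, it sends each of the 10,000 iterations out as its own task, whereas parLapply() splits the input into one chunk per worker.  With a function this cheap, the time spent communicating the tasks overwhelmed the actual processing time.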

When I change the code to send ten chunks of 1,000 iterations each instead of 10,000 single iterations, it runs fast (about the same as parLapply()):


> system.time( (out = foreach(i=seq(0,9),.combine='c') %dopar% {
+    apply(as.array(seq(i*1000+1,(i+1)*1000)),1,f)
+ }))
   user  system elapsed
   0.00    0.00   14.03
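
The same chunking idea can be written a little more generally. What follows is only a minimal sketch, not a benchmark: it assumes three local socket workers created directly with snow's makeCluster() rather than through snowfall, and the names n, chunks, and idx are made up for illustration. It splits the index range into one contiguous chunk per registered worker and lets each foreach task process a whole chunk.

```r
library(foreach)
library(doSNOW)   # also attaches snow

# Same toy workload as in the post: sum the integers 1..x in a loop
f <- function(x) {
  total <- 0
  for (i in seq_len(x)) total <- total + i
  total
}

# Assumption: three socket workers on the local machine
cl <- makeCluster(3, type = "SOCK")
registerDoSNOW(cl)

n <- 10000

# Split 1..n into one contiguous chunk per registered worker
chunks <- split(seq_len(n),
                cut(seq_len(n), getDoParWorkers(), labels = FALSE))

# Each task now handles a whole chunk, so only a handful of tasks
# cross the sockets instead of n tiny ones
out <- foreach(idx = chunks, .combine = c) %dopar% {
  vapply(idx, f, numeric(1))
}

stopCluster(cl)
```

Because f(i) gets more expensive as i grows, contiguous chunks of equal length do not carry equal work. Using more chunks than workers, as in the ten-chunk version above, is one way to trade a little extra communication for better load balance.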
