High-Performance Computing: An overview
Parallel computing in R
Extended examples
Loosely, from R’s perspective, we can think of HPC in terms of two, maybe three things:
Big data: How to work with data that doesn't fit in your computer's memory
Parallel computing: How to take advantage of multiple core systems
Compiled code: Write your own low-level code (if R doesn't have it yet…)
(Check out the CRAN Task View on HPC)
Buy a bigger computer/more RAM (not the best solution!)
Use out-of-memory storage, i.e., don't load all your data into RAM, e.g., the bigmemory, data.table, and HadoopStreaming R packages
Store it more efficiently, e.g., as sparse matrices (take a look at the dgCMatrix objects from the Matrix R package)
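Sparse storage pays off whenever most entries are zero. A minimal sketch using the Matrix package (the sizes in the comments are approximate):

```r
library(Matrix)

# A 1,000 x 1,000 matrix with only 100 non-zero entries
n <- 1000
dense <- matrix(0, nrow = n, ncol = n)
dense[sample(length(dense), 100)] <- 1

# The same data stored as a sparse dgCMatrix
sparse <- Matrix(dense, sparse = TRUE)
class(sparse)

# The sparse version stores only the non-zero entries
object.size(dense)  # ~8 MB (n * n doubles)
object.size(sparse) # a few KB
```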
Flynn’s Classical Taxonomy (Blaise Barney, Introduction to Parallel Computing, Lawrence Livermore National Laboratory)
We will be focusing on the Single Instruction stream, Multiple Data stream (SIMD) model
source: Blaise Barney, Introduction to Parallel Computing, Lawrence Livermore National Laboratory
In raw terms
Supercomputer: A single big machine with thousands of cores/GPUs.
High Performance Computing (HPC): Multiple machines within a single network.
High Throughput Computing (HTC): Multiple machines across multiple networks.
You may not have access to a supercomputer, but HPC/HTC clusters are certainly more accessible these days, e.g., AWS provides a service to create HPC clusters at low cost (allegedly, since nobody understands how pricing works)
Ask yourself these questions before jumping into HPC!
While there are several alternatives (just take a look at the High-Performance Computing Task View), we’ll focus on the following R-packages for explicit parallelism:
parallel: R package that provides ‘[s]upport for parallel computation, including random-number generation’.
future: ‘[A] lightweight and unified Future API for sequential and parallel processing of R expression via futures.’
Rcpp + OpenMP: Rcpp is an R package for integrating R with C++, and OpenMP is a library for high-level parallelism for C/C++ and FORTRAN.
Others exist, but we won't be using them here
And tools for implicit parallelism (out-of-the-box tools that allow the programmer not to worry about parallelization):
gpuR for Matrix manipulation using GPU
tensorflow an R interface to TensorFlow.
A ton of other types of resources, notably the tools for working with batch schedulers such as Slurm and HTCondor.
U ready for speed?!?!?!
(Usually) We do the following:
1. Create a PSOCK/FORK (or other) cluster using makePSOCKcluster/makeForkCluster (or makeCluster).
2. Copy/prepare each R session (if you are using a PSOCK cluster):
   a. Copy objects with clusterExport.
   b. Pass expressions with clusterEvalQ.
   c. Set a seed.
3. Do your call: parApply, parLapply, etc.
4. Stop the cluster with stopCluster.
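The steps above can be sketched end-to-end (a minimal example; `x` and `f` are placeholder names for your data and function):

```r
library(parallel)

# 1. Create the cluster
cl <- makePSOCKcluster(2L)

# 2. Prepare each R session
x <- 1:10
f <- function(i) sqrt(i)
clusterExport(cl, c("x", "f"))   # copy objects to the workers
clusterEvalQ(cl, library(stats)) # run an expression on each worker
clusterSetRNGStream(cl, 123)     # set a parallel-safe seed

# 3. Do your call (the workers find `f` because we exported it)
ans <- parSapply(cl, x, function(i) f(i))

# 4. Stop the cluster
stopCluster(cl)
```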
Can be created with makePSOCKcluster
Creates brand new R Sessions (so nothing is inherited from the master), e.g.
# This creates a cluster with 4 R sessions
cl <- makePSOCKCluster(4)
Child sessions are connected to the master session via Socket connections
Can be created outside of the current computer, i.e. across multiple computers!
Fork Cluster (makeForkCluster):
Uses OS forking.
Copies the current R session locally (so everything is inherited from the master up to that point).
Data is only duplicated if it is altered (the OS uses copy-on-write).
Not available on Windows.
Other (makeCluster): passed to snow (Simple Network of Workstations)
makePSOCKcluster
# 1. CREATING A CLUSTER
library(parallel)
nnodes <- 4L
cl <- makePSOCKcluster(nnodes)
# 2. PREPARING THE CLUSTER
clusterSetRNGStream(cl, 123) # Sets a parallel-safe seed (the analog of `set.seed(123)`)
# 3. DO YOUR CALL
ans <- parSapply(cl, 1:nnodes, function(x) runif(1e3))
(ans0 <- var(ans))
## [,1] [,2] [,3] [,4]
## [1,] 0.0861888293 -0.0001633431 5.939143e-04 -3.672845e-04
## [2,] -0.0001633431 0.0853841838 2.390790e-03 -1.462154e-04
## [3,] 0.0005939143 0.0023907904 8.114219e-02 -4.714618e-06
## [4,] -0.0003672845 -0.0001462154 -4.714618e-06 8.467722e-02
Making sure it is reproducible
# I want to get the same!
clusterSetRNGStream(cl, 123)
ans1 <- var(parSapply(cl, 1:nnodes, function(x) runif(1e3)))
# 4. STOP THE CLUSTER
stopCluster(cl)
all.equal(ans0, ans1) # All equal!
## [1] TRUE
makeForkCluster
In the case of makeForkCluster:
# 1. CREATING A CLUSTER
library(parallel)
# The fork cluster will copy the -nsims- object
nsims <- 1e3
nnodes <- 4L
cl <- makeForkCluster(nnodes)
# 2. PREPARING THE CLUSTER
clusterSetRNGStream(cl, 123)
# 3. DO YOUR CALL
ans <- do.call(cbind, parLapply(cl, 1:nnodes, function(x) {
runif(nsims) # Look! we use the nsims object!
# This would have failed with makePSOCKcluster
# if we didn't copy -nsims- first.
}))
(ans0 <- var(ans))
## [,1] [,2] [,3] [,4]
## [1,] 0.0861888293 -0.0001633431 5.939143e-04 -3.672845e-04
## [2,] -0.0001633431 0.0853841838 2.390790e-03 -1.462154e-04
## [3,] 0.0005939143 0.0023907904 8.114219e-02 -4.714618e-06
## [4,] -0.0003672845 -0.0001462154 -4.714618e-06 8.467722e-02
Again, we want to make sure this is reproducible
# Same sequence with same seed
clusterSetRNGStream(cl, 123)
ans1 <- var(do.call(cbind, parLapply(cl, 1:nnodes, function(x) runif(nsims))))
ans0 - ans1 # A matrix of zeros
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 0
## [4,] 0 0 0 0
# 4. STOP THE CLUSTER
stopCluster(cl)
mclapply (forking on the fly)
In the case of mclapply, the forking (cluster creation) is done on the fly!
# 1. CREATING A CLUSTER
library(parallel)
# The fork cluster will copy the -nsims- object
nsims <- 1e3
nnodes <- 4L
# cl <- makeForkCluster(nnodes) # mclapply does it on the fly
# 2. PREPARING THE CLUSTER
set.seed(123)
# 3. DO YOUR CALL
ans <- do.call(cbind, mclapply(1:nnodes, function(x) runif(nsims)))
(ans0 <- var(ans))
## [,1] [,2] [,3] [,4]
## [1,] 0.085384184 0.002390790 0.006576204 -0.003998278
## [2,] 0.002390790 0.081142190 0.001846963 0.001476244
## [3,] 0.006576204 0.001846963 0.085175347 -0.002807348
## [4,] -0.003998278 0.001476244 -0.002807348 0.082425477
Once more, we want to make sure this is reproducible
# Same sequence with same seed
set.seed(123)
ans1 <- var(do.call(cbind, mclapply(1:nnodes, function(x) runif(nsims))))
ans0 - ans1 # A matrix of zeros
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 0
## [4,] 0 0 0 0
# 4. STOP THE CLUSTER
# stopCluster(cl) # no need to do this anymore
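One caveat worth flagging: with the default Mersenne-Twister generator, the reproducibility of mclapply streams is not guaranteed across runs; the parallel package documentation recommends switching to the "L'Ecuyer-CMRG" RNG for parallel work. A sketch (not on Windows, where mc.cores must be 1):

```r
library(parallel)

# Use the RNG kind designed for multiple parallel streams
RNGkind("L'Ecuyer-CMRG")

set.seed(123)
ans0 <- mclapply(1:4, function(x) runif(5), mc.cores = 2L)

set.seed(123)
ans1 <- mclapply(1:4, function(x) runif(5), mc.cores = 2L)

identical(ans0, ans1) # TRUE: the streams are reproducible
```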
RcppArmadillo + OpenMP