A Brief Introduction to
Using R for High-Performance Computing

George G. Vega Yon


University of Southern California
Department of Preventive Medicine

August 27th, 2019

High-Performance Computing: An overview

Loosely, from R’s perspective, we can think of HPC in terms of two, maybe three things:

  1. Big data: How to work with data that doesn’t fit in your computer

  2. Parallel computing: How to take advantage of multi-core systems

  3. Compiled code: Write your own low-level code (if R doesn’t have it yet…)

(Check out the CRAN Task View on HPC)

Some vocabulary for HPC

In raw terms

You may not have access to a supercomputer, but HPC/HTC clusters are certainly more accessible these days; e.g., AWS provides a service to create HPC clusters at a low cost (allegedly, since nobody understands how the pricing works).

What’s “a core”?

Taxonomy of CPUs (downloaded from https://slurm.schedmd.com/mc_support.html)

Now, how many cores does your computer have? The parallel package can tell you:

parallel::detectCores()
## [1] 4
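
By default, detectCores() counts logical cores, which can be more than the number of physical cores on CPUs with hyper-threading. The sketch below compares the two counts. It assumes a platform where the logical argument is honoured; on others it may be ignored or NA may be returned, so treat the numbers as a hint rather than a guarantee.

```r
library(parallel)

# Logical cores (the default): includes hyper-threaded "virtual" cores
n_logical <- detectCores(logical = TRUE)

# Physical cores: only honoured on some platforms; elsewhere the
# argument may be ignored, or NA may be returned
n_physical <- detectCores(logical = FALSE)

cat("Logical cores: ", n_logical, "\n")
cat("Physical cores:", n_physical, "\n")
```

When choosing how many workers to start, it is common to leave at least one core free for the rest of the system.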

What is parallel computing, anyway?

f <- function(n) n*2
f(1:4)
Here we are using a single core. The function is applied one element at a time, leaving the other 3 cores idle.

What is parallel computing, anyway? (cont’d)

f <- function(n) n*2
f(1:4)
In this more intelligent way of computing, we take full advantage of our computer by using all 4 cores at the same time. This translates into reduced computation time which, in the case of complicated or long calculations, can be an important speed gain.
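
One way to get this behavior in R is with a socket cluster from the parallel package. The following is a minimal sketch, not the only approach; the number of workers (2) is an arbitrary assumption chosen so the example runs on most machines.

```r
library(parallel)

f <- function(n) n * 2

# Start a small cluster of worker R sessions
# (2 workers is an arbitrary choice for illustration)
cl <- makeCluster(2)

# Apply f to each element, spreading the calls across the workers
ans <- parSapply(cl, 1:4, f)

# Always release the workers when done
stopCluster(cl)

ans
## [1] 2 4 6 8
```

For a function as cheap as this one, the cost of starting the workers and sending data back and forth is likely to outweigh any gain; the pattern pays off when each call to f is expensive.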

Let’s think before we start…