George G. Vega Yon
vegayon@usc.edu
University of Southern California
Department of Preventive Medicine
August 27th, 2019
Loosely, from R’s perspective, we can think of HPC in terms of two, maybe three things:
Big data: How to work with data that doesn't fit in your computer's memory
Parallel computing: How to take advantage of multi-core systems
Compiled code: Write your own low-level code (if R doesn't have it yet…)
(Check out the CRAN Task View on HPC)
In raw terms
Supercomputer: A single big machine with thousands of cores/GPUs.
High Performance Computing (HPC): Multiple machines within a single network.
High Throughput Computing (HTC): Multiple machines across multiple networks.
You may not have access to a supercomputer, but HPC/HTC clusters are certainly more accessible these days; e.g., AWS provides a service to create HPC clusters at low cost (allegedly, since nobody understands how its pricing works).
Taxonomy of CPUs (downloaded from https://slurm.schedmd.com/mc_support.html)
Now, how many cores does your computer have? The parallel package can tell you that:
```r
parallel::detectCores()
```

## [1] 4
Here we are using a single core. The function is applied one element at a time, leaving the other three cores idle.
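A minimal sketch of the serial case, using a toy task (squaring a few numbers; the task itself is hypothetical, chosen only for illustration):

```r
# Serial computation: lapply() applies the function to one element
# at a time, on a single core, while the remaining cores sit idle.
ans <- lapply(1:4, function(i) i^2)
unlist(ans)
## [1]  1  4  9 16
```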
In this more intelligent way of computation, we are taking full advantage of our computer by using all 4 cores at the same time. This translates into reduced computation time which, in the case of complicated/long calculations, can be an important speed gain.
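The parallel version of the same toy task can be sketched with the parallel package: makeCluster() spawns the worker processes, parLapply() splits the input among them, and stopCluster() shuts them down (the 4-worker count assumes the 4-core machine shown above):

```r
library(parallel)

# Parallel computation: spread the elements across 4 worker
# processes, so all cores work at the same time.
cl  <- makeCluster(4)
ans <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)          # always release the workers when done
unlist(ans)
## [1]  1  4  9 16
```

For a task this small the overhead of spawning workers dominates; the gain appears only once each function call is expensive enough to amortize that cost.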