This is a crosspost from Jonathan Dursi, R&D computing at scale. See the original post here.

Beyond Single Core R: Parallel Data Analysis

I was recently asked to give a short presentation for the Greater Toronto R Users Group on parallel computing in R. My slides can be seen below or on GitHub, where the complete materials can be found.

I covered some of the same material as in a half-day workshop I gave a couple of years earlier (though, obviously, without the hands-on component), with some bonus material tacked on at the end touching on a couple of advanced topics.

I was quite surprised at how little had changed since late 2014, other than further development of SparkR (which I didn’t cover), and the interesting but seemingly little-used future package. I was also struck by how hard it is to find similar materials online, covering a range of parallel computing topics in R - it’s rare enough that even this simple effort made it to the HPC task view on CRAN (under “related links”). R continues to grow in popularity for data analysis; is this all desktop computing? Is Spark siphoning off the clustered-dataframe usage?

(This was also my first time with RPres in RStudio; wow, not a fan. RPres was not ready for general release. And I’m a big fan of RMarkdown.)