hpc.social


High Performance Computing Practitioners and friends /#hpc
This is a crosspost from Jonathan Dursi's blog, R&D computing at scale. See the original post here.

Scalable Data Analysis in R

R is a great environment for interactive analysis on your desktop, but when your data needs outgrow your personal computer, it’s not clear what to do next.

I’ve put together material for a day-long tutorial on scalable data analysis in R. It covers:

The presentation for the material, written in R Markdown (so it includes the source code), is in the presentation directory. You can read the rendered presentation there as Markdown, or as a PDF.

The R code from the slides can be found in the R directory.

Some data can be found in the data directory, but as you might expect in a workshop on scalable data analysis, the files are quite large! Mostly you will find scripts for downloading the data. Running make in the main directory will pull almost everything down, but producing some of the derived data products used in the tutorial still needs more work to automate.

Suggestions, as always, greatly welcomed.