r, data-analysis

Trouble deciding where to invest my time for big data analysis in R


I know R, I know SQL, I use Windows, I have a budget of $0, I have a terabyte of data, I have twelve processors, I have 96GB of RAM, I am motivated to learn new software if the speed gains will pay off in the long term.

I need to run descriptive statistics and regressions.

I have too many options. Where should I devote all of my energy? Thanks.


Solution

  • Well, that is a big topic.

    We wrote a survey paper on the state of the art of parallel processing with R, which you could start with. While it is now three years old, large parts of the discussion still hold.

    Otherwise, I would suggest starting with a small- to medium-sized problem that actually matters to you and trying to get that running faster. Over at the r-sig-HPC list (gmane link), many folks are happy to help with specifics.
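
As a rough illustration of "start small and try to get it going faster", here is a minimal sketch that times the same per-group regression serially and then on a socket cluster from the base `parallel` package, which works on Windows. The data, the column names (`g`, `x`, `y`), the helper `fit_chunk`, and the worker count are all made up for the example; none of them come from the question. On a problem this small the parallel run may well be no faster, since starting workers and shipping data has a cost, which is exactly what a small experiment like this helps you find out.

```r
library(parallel)  # ships with base R

set.seed(1)

# Hypothetical stand-in data: 200,000 rows in 8 groups (assumption, not the asker's data)
n <- 200000
dat <- data.frame(
  g = sample(1:8, n, replace = TRUE),
  x = rnorm(n)
)
dat$y <- 2 + 3 * dat$x + rnorm(n)

# One data frame per group
chunks <- split(dat, dat$g)

# Hypothetical helper: fit one regression per chunk and return its coefficients
fit_chunk <- function(d) coef(lm(y ~ x, data = d))

# Serial baseline
t_serial <- system.time(res_serial <- lapply(chunks, fit_chunk))

# Parallel version on a socket (PSOCK) cluster; 2 workers is an arbitrary
# choice for the sketch, raise it toward your core count when experimenting
cl <- makeCluster(2)
t_parallel <- system.time(res_parallel <- parLapply(cl, chunks, fit_chunk))
stopCluster(cl)

# Compare elapsed wall-clock time in seconds
print(c(serial = t_serial[["elapsed"]], parallel = t_parallel[["elapsed"]]))
```

Both runs should return the same coefficients, so any difference is purely in elapsed time. If the parallel version wins on a problem you care about, that is a reasonable signal for where to invest further.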