r opencpu

OpenCPU data caching


My speciality is Java. I use R only for very specific analyses.

PROBLEM

  1. My understanding is that each API call to OpenCPU opens a new R session.

  2. My function will classify the input data using the predict method of a linear discriminant analysis (lda from the MASS package).

  3. The initial linear discriminant analysis on 100000+ cases and 150+ factor levels takes time (over 30 seconds). This function returns a list.

  4. The subsequent prediction function is quick and returns a simple vector.

APPROACH

  1. I call one OpenCPU function to fit the initial lda. This only needs to run once.

  2. I want my second function to ONLY run the predict function. This is possible if the lda is held as a global variable.

  3. My understanding is that global variables are not possible in OpenCPU, so I will have to cache the lda on the file system.

  4. In sum, I need to run the lda just once and hold the analysis (a list) either in memory or on the file system. I then retrieve the lda analysis when predict is called.

QUESTION

Which approach is best, and how do I implement it?

  1. I could use an OpenCPU function that creates and returns the lda. Then, when I call a prediction, I could retrieve the lda object (a list) from the file system. But how do I retrieve the list from the file system? How does OpenCPU even know where it is?

  2. I could use the R.cache package. I haven't used this package before, but the docs suggest it is a solution. Will this work?

Any advice would be deeply appreciated.

Best,
Jake


Solution

  • The R.cache package solves the problem very easily.
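
As a rough illustration of how this might look, here is a minimal sketch, not the poster's actual code. It assumes the two functions live in the R package that OpenCPU serves. The names train_model, classify, model_key and cache_root are hypothetical, and the built-in iris data stands in for the real training set. The cache directory is also an assumption: tempdir() is used only so the example runs anywhere, and since it is private to one R session, a server would need a fixed directory that the OpenCPU process is allowed to write to.

```r
library(MASS)     # lda()
library(R.cache)  # setCacheRootPath(), saveCache(), loadCache()

# Assumption: tempdir() keeps the example self-contained. It does not
# survive the R session, so on a server use a fixed, writable directory.
cache_root <- file.path(tempdir(), "lda-cache")
dir.create(cache_root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(cache_root)

# Hypothetical key that identifies the fitted model in the cache.
model_key <- list("lda-model", version = 1)

# Slow step: fit the lda once and store the fitted object on disk.
train_model <- function(data) {
  fit <- lda(Species ~ ., data = data)
  saveCache(fit, key = model_key)
  invisible(TRUE)
}

# Fast step: load the cached fit and classify new rows.
classify <- function(newdata) {
  fit <- loadCache(key = model_key)  # NULL if nothing is cached under the key
  if (is.null(fit)) stop("No cached model found; run train_model() first.")
  as.character(predict(fit, newdata = newdata)$class)
}

train_model(iris)
predicted <- classify(iris[c(1, 51, 101), 1:4])
```

Because the key and the cache root are the same in both functions, a later R session can find the fitted object without refitting, provided the cache root points at a directory that persists between calls.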