Tags: r, apache-spark, data.table, cluster-computing, sparkr

Is it possible to use data.table on SparkR with Sparkdataframes?


Situation

I used to work in RStudio with data.table instead of plyr or sqldf because it's really fast. Now I'm working with SparkR on an Azure cluster, and I'd like to know if I can use data.table on my Spark DataFrames, and whether it's faster than SQL.


Solution

  • It is not possible. SparkDataFrames are Java objects with a thin R interface. While worker-side R can be used in some limited cases (dapply, gapply), there is no benefit to using data.table there: each worker only ever sees a local R data.frame holding a single partition, not the distributed SparkDataFrame. A sketch of what worker-side R looks like follows below.
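
To make the worker-side R point concrete, here is a minimal sketch of dapply in SparkR. It assumes a Spark session is available and that the data.table package is installed on every worker node; the column names and the doubled-eruptions transformation are purely illustrative. Note that the function only receives one partition as a plain R data.frame, so data.table operates locally and does not speed up anything Spark itself does.

    library(SparkR)
    sparkR.session()

    # Create a SparkDataFrame from a built-in local data.frame
    sdf <- createDataFrame(faithful)

    # dapply runs the R function on each partition; the partition arrives
    # as an ordinary R data.frame, so data.table syntax can be used inside,
    # but only on that local chunk.
    result <- dapply(
      sdf,
      function(part) {
        dt <- data.table::as.data.table(part)   # convert the local partition
        dt[, eruptions_x2 := eruptions * 2]     # ordinary data.table update
        as.data.frame(dt)                       # must return an R data.frame
      },
      structType(
        structField("eruptions", "double"),
        structField("waiting", "double"),
        structField("eruptions_x2", "double")
      )
    )

    head(collect(result))

For whole-dataset transformations you are generally better served by SparkDataFrame operations or Spark SQL, which run on the JVM and are optimized by Catalyst, rather than shipping data to R workers.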