How do I perform the following task on a Spark DataFrame? In dplyr, I would do this:
library(dplyr)
df1 <- data.frame(x = 1:10, y = 101:110)
df2 <- data.frame(r = 5:10, s = 205:210)
df3 <- df1 %>% filter(x %in% df2$r)
How do I perform the filter(x %in% df2$r) operation on a SparkR DataFrame?
I just had a similar question, and this seemed to work for filtering against a list of values:
df3 <- filter(df1, ("x in ('string1','string2','string3')"))
In your case, you might want to consider a join:
df3 <- drop(join(df1, SparkR::distinct(SparkR::select(df2, 'r')), df1$x == df2$r), 'r')
(probably a bit too expensive, though).
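Spelled out step by step, under the same assumptions as the sketch above (sdf1/sdf2 created with createDataFrame), that join route looks like this:

keys <- distinct(select(sdf2, "r"))                  # unique join keys only
sdf3 <- join(sdf1, keys, sdf1$x == keys$r, "inner")  # keep matching rows of sdf1
sdf3 <- drop(sdf3, "r")                              # drop the helper key column
head(sdf3)

On recent Spark versions a "leftsemi" join type does the same filtering without carrying the r column along, so the drop() step can be skipped.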
cheers, anna