In SparkR I have data as a DataFrame. I can select the rows that match a single value in data like this:
newdata <- filter(data, data$column == 1)
How can I select more than just one value? Say I want all rows whose column value is in the vector list <- c(1,6,10,11,14), or in list when it is itself a DataFrame holding the values 1, 6, 10, 11, 14:
newdata <- filter(data, data$column == list)
If I do it like this I get an error.
If you are ultimately trying to filter a Spark DataFrame by a list of unique values, you can do this with a merge operation. If you are talking about going from a long to a wide data format, you need to ensure there are the same number of observations for each 'level' of the factor variable you are considering.
If you want to subset a Spark DataFrame by columns, you could also use a select statement, or build up a select statement by pasting data$blah into a text object and then running eval(parse(text=bigTextObject)) on it, as @Wannes suggested. In short: a function that generates a big select statement is what you want if you are filtering by column name, and a merge is what you want if you are trying to extract values from a single column.
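To make the single-column case concrete, here is a minimal sketch of filtering rows by a vector of values. It assumes Spark 2.x or later (where sparkR.session() starts the session) and uses a made-up toy DataFrame in place of data; the names wanted, keep, viaMerge and viaIn are hypothetical. It shows the merge route described above and, as an alternative that is not part of the original suggestion, SparkR's %in% operator on a column.

```r
library(SparkR)
sparkR.session()  # assumes Spark >= 2.0; older releases used sparkR.init()

# Hypothetical toy data standing in for the `data` DataFrame in the question
data <- createDataFrame(data.frame(
  column = c(1, 2, 6, 10, 12, 14),
  value  = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
))

# The values to keep (named `wanted` here to avoid shadowing base::list)
wanted <- c(1, 6, 10, 11, 14)

# Option 1: merge (an inner join) against a one-column DataFrame of the values.
# Note: the key column may come back twice, with suffixes, after the merge.
keep <- createDataFrame(data.frame(column = wanted))
viaMerge <- merge(data, keep, by = "column")

# Option 2: %in% on the column, which keeps the original schema unchanged
viaIn <- filter(data, data$column %in% wanted)

# Bring the filtered rows back to a local data.frame
result <- collect(viaIn)
```

The merge route scales to cases where the values to keep already live in another Spark DataFrame, while %in% is probably the shorter choice when they fit in a small local vector.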
From what I understand, it seems as if you want to take a big Spark DataFrame with lots of columns and only take the ones you are interested in, as indicated by list
in your question.
Here is a little snippet that generates the Spark select statement as text and then evaluates it:
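# Columns to keep (this assumes the columns are literally named "1", "2", ...)
list <- c(1, 2, 5, 8, 90, 200)
# Backticks are required because data$1 is not valid R syntax
listWithDataPrePended <- paste0("data$`", list, "`")
gettingCloser <- paste0(listWithDataPrePended, collapse = ", ")
finalSelectStatement <- paste0("select(data, ", gettingCloser, ")")
# Evaluate the generated call, then pull the result into a local data.frame
finalData <- eval(parse(text = finalSelectStatement))
finalData <- SparkR::collect(finalData)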
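As a possible alternative to building the call as text, select also accepts a list of column names, so eval(parse(...)) can be avoided. The sketch below assumes Spark 2.x or later and treats the numbers as column positions rather than column names; that interpretation, the toy DataFrame, and the names positions and wantedCols are all assumptions made for illustration.

```r
library(SparkR)
sparkR.session()  # assumes Spark >= 2.0

# Hypothetical toy DataFrame standing in for `data`
data <- createDataFrame(data.frame(a = 1:3, b = 4:6, c = 7:9))

# Assumption: the numbers identify column positions, not column names
positions <- c(1, 3)
wantedCols <- columns(data)[positions]

# Pass the names as a list so each one is resolved to a column
finalData <- collect(select(data, as.list(wantedCols)))
```

If the numbers are actual column names instead, as.list(as.character(positions)) could be passed to select in the same way.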
Maybe this is what you're looking for...maybe not. Nonetheless, I hope it's helpful.
Good luck, nate