I have a SparkR dataframe where all columns are integers. I want to replace one column with strings.
So, if the column contains 0, 1, 1, 0, I want to make that "no", "yes", "yes", "no".
I tried
df$C0 <- ifelse(df$C0 == 0, "no", "yes")
but that just gives me
Error in as.logical(from) :
cannot coerce type 'S4' to vector of type 'logical'
How would I go about making this update?
P.S. I based the above attempt on the fact that this works:
df$C0 <- df$C0 + 1
Probably the simplest solution here is to use SQL:
# Because it is hard to live without pipes
library(magrittr)
# Create sqlContext
sqlContext <- sparkRSQL.init(sc)
# Register table
registerTempTable(df, 'df')
# Query
sql(sqlContext, "SELECT *, IF(C0 = 0, 'no', 'yes') AS C0 FROM df") %>% showDF()
Unfortunately, this creates a duplicate column name, so it is probably best to rename the existing column first:
df <- df %>% withColumnRenamed(existingCol = 'C0', newCol = 'C0_old')
registerTempTable(df, 'df')
sql(sqlContext, "SELECT *, IF(C0_old = 0, 'no', 'yes') AS C0 FROM df")
or simply replace * with a list of the columns you need.
It is also possible to use when / otherwise:
df %>% select(when(df$C0 == 0, 'no') %>% otherwise('yes'))
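Since the question notes that df$C0 <- df$C0 + 1 works, the same assignment form can probably carry a when / otherwise expression, which would replace the column in place instead of selecting a new one. The following is a minimal sketch, not a tested recipe. It assumes the SparkR 1.x API used above (sparkR.init, sparkRSQL.init), a local master, and a small made-up data frame standing in for the real one. It also assumes that assigning to an existing column name replaces that column in your SparkR version; if it does not, rename the old column first as shown earlier.

```r
library(SparkR)

# Assumption: SparkR 1.x entry points, matching sparkRSQL.init above.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Hypothetical sample data standing in for the asker's DataFrame.
df <- createDataFrame(sqlContext, data.frame(C0 = c(0L, 1L, 1L, 0L)))

# when() builds a Column from a condition and a value;
# otherwise() supplies the value for rows the condition did not match.
# Assigning the result to df$C0 swaps the integer column for a string column.
df$C0 <- otherwise(when(df$C0 == 0, "no"), "yes")

showDF(df)
```

This avoids registering a temporary table and keeps the other columns untouched.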