Tags: r, apache-spark, sparkr

How do I convert an integer column in a SparkR data frame to a string?


I have a SparkR dataframe where all columns are integers. I want to replace one column with strings.

So, if the column contains 0, 1, 1, 0, I want to make that "no", "yes", "yes", "no".

I tried

df$C0 <- ifelse(df$C0 == 0, "no", "yes")

but that just gives me

 Error in as.logical(from) : 
   cannot coerce type 'S4' to vector of type 'logical'

How would I go about making this update?

P.S. I based the above attempt on the fact that this works:

df$C0 <- df$C0 + 1

Solution

  • Probably the simplest solution here is to use SQL:

    # Because it is hard to live without pipes
    library(magrittr)
    
    # Create sqlContext
    sqlContext <- sparkRSQL.init(sc)
    
    # Register table
    registerTempTable(df, 'df')
    
    # Query
    sql(sqlContext, "SELECT *, IF(C0 = 0, 'no', 'yes') AS C0 FROM df") %>% showDF()
    

    Unfortunately, this creates a duplicate column name, so it is probably best to rename the existing column first:

    df <- df %>% withColumnRenamed(existingCol = 'C0', newCol = 'C0_old')
    registerTempTable(df, 'df')
    sql(sqlContext, "SELECT *, IF(C0_old = 0, 'no', 'yes') AS C0 FROM df")
    

    Alternatively, replace * with an explicit list of the columns you need.

    It is also possible to use when / otherwise:

    df %>% select(when(df$C0 == 0, 'no') %>% otherwise('yes'))
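
    To keep the other columns as well, the same when / otherwise expression can be attached to the data frame as a new column. The following is only a minimal sketch: it assumes SparkR 1.5 or later (where when and otherwise are available), that df is the data frame from the question, and it uses a hypothetical new column name, C0_label, to avoid clashing with the existing C0.

    ```r
    library(SparkR)

    # Assumption: `df` is the SparkR DataFrame from the question,
    # with an integer column named C0.

    # Build the string column as a Column expression.
    # Nothing is evaluated locally; Spark evaluates it per row.
    label <- otherwise(when(df$C0 == 0, "no"), "yes")

    # Attach it under a new, hypothetical name (C0_label) so that
    # the original integer column is left untouched.
    df_labeled <- withColumn(df, "C0_label", label)

    # Inspect the result.
    showDF(df_labeled)
    ```

    This works where the base R ifelse from the question fails because when and otherwise operate on SparkR Column objects and return a Column, instead of trying to coerce the S4 object to a logical vector.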