Tags: r, dataframe, sparkr, mapply

mapply equivalent for SparkR in R


I have a Spark DataFrame "a" that looks like this:

 C1 | C2 | C3 | C4  
 I1 | 12 | 31 | 4  
 I2 | 14 | 32 | 13  
 I3 | 13 | 33 | 15  
 I4 | 16 | 29 | 25  
 I5 | 18 | 30 | 73  
 I6 | 17 | 36 | 19  

Column C2 is always smaller than column C3.

I want to compare C4 with C2 and C3 using the following logic: if C4 is between C2 and C3, return 1; otherwise, return 2.

The result should be added to the DataFrame as a new column.

I can do this with mapply when Spark is not involved, but how can I do it in SparkR?
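For reference, the non-Spark mapply approach mentioned above can be sketched in base R as follows. The helper name in_range_flag is hypothetical, and the bounds are treated as strict (C2 < C4 < C3), which is an assumption since "between" could also be read as inclusive.

```r
# Local copy of the example data (plain R data.frame, no Spark involved)
df_a <- data.frame(C1 = c("I1", "I2", "I3", "I4", "I5", "I6"),
                   C2 = c(12, 14, 13, 16, 18, 17),
                   C3 = c(31, 32, 33, 29, 30, 36),
                   C4 = c(4, 13, 15, 25, 73, 19))

# Hypothetical helper: 1 if c4 lies strictly between c2 and c3, else 2
in_range_flag <- function(c2, c3, c4) {
  if (c4 > c2 && c4 < c3) 1 else 2
}

# mapply calls the helper once per row, pairing C2, C3 and C4 element-wise
df_a$C5 <- mapply(in_range_flag, df_a$C2, df_a$C3, df_a$C4)
print(df_a)
```

This only works on a local data.frame: mapply iterates over ordinary R vectors in the driver, so it cannot be applied directly to the columns of a SparkDataFrame.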


Solution

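  • You can do this with a single ifelse call. SparkR provides an ifelse method for Column objects, so the comparison is evaluated by Spark on whole columns instead of row by row in R, and assigning the result with $<- adds it as a new column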

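    library(SparkR)
    sparkR.session()  # start (or connect to) a Spark session

    df_a <- data.frame(C1 = c('I1', 'I2', 'I3', 'I4', 'I5', 'I6'),
                       C2 = c(12, 14, 13, 16, 18, 17),
                       C3 = c(31, 32, 33, 29, 30, 36),
                       C4 = c(4, 13, 15, 25, 73, 19))

    # Convert the local data.frame into a SparkDataFrame
    a <- as.DataFrame(df_a)
    # ifelse() on Columns: 1 if C2 < C4 < C3, otherwise 2
    a$C5 <- ifelse(a$C4 > a$C2 & a$C4 < a$C3, 1, 2)
    head(a)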
      C1 C2 C3 C4 C5
    1 I1 12 31  4  2
    2 I2 14 32 13  2
    3 I3 13 33 15  1
    4 I4 16 29 25  1
    5 I5 18 30 73  2
    6 I6 17 36 19  1
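The ifelse call above uses strict bounds. If "between" should include the endpoints (as SQL's BETWEEN does), swap > and < for >= and <=; the same operator swap applies to the SparkR expression. The base-R sketch below, run on a local copy of the data, compares both readings. With this sample data no C4 value equals C2 or C3, so both give the same result.

```r
# Local copy of the numeric columns (plain R, no Spark)
df_a <- data.frame(C2 = c(12, 14, 13, 16, 18, 17),
                   C3 = c(31, 32, 33, 29, 30, 36),
                   C4 = c(4, 13, 15, 25, 73, 19))

# Strict bounds, as in the SparkR answer: C2 < C4 < C3
strict <- ifelse(df_a$C4 > df_a$C2 & df_a$C4 < df_a$C3, 1, 2)
# Inclusive bounds, matching SQL BETWEEN semantics: C2 <= C4 <= C3
inclusive <- ifelse(df_a$C4 >= df_a$C2 & df_a$C4 <= df_a$C3, 1, 2)

print(strict)
print(inclusive)
```

The two results would only differ on rows where C4 equals C2 or C3 exactly.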