Search code examples
rdataframesqldfcoercion

Overcome the coercion of dataframe to a list when using sqldf


I need to classify the rows into two groups(Present or Not present) based on the values in the two columns Var1, Var2 in the dataframe pum. Later add this as a new column to another dataframe as data.cov.cat$PU. When Var1 greater than 0.053 or Var2 in the set of values c(1,2,3), the rows needs to be marked as Present else Not present.

The below piece of code is developed on the conditions.

data.cov.cat$PU <- sqldf("SELECT CASE WHEN Var1 >= 0.053 THEN 'Present'  
                                   WHEN Var1 < 0.053 OR Var2 IN (1,2,3) THEN 'Present'
                                   WHEN Var1 < 0.053 OR Var2 = 0 THEN 'Not present'
                                   ELSE 'NA' 
                              END as newCol
                              FROM pum")$newCol

The command executes properly and the output is as expected except the dataframe data.cov.cat is coerced to a list.

Could you please suggest how to avoid the coersion of the dataframe to a list.

Reproducible example:

data.cov.cat <- data.frame(V1 = runif(10,0,10), V12 = c("blue", "red", "yellow", "pink","blue", "red", "yellow", "pink", "red", "yellow"))

data.cov.cat <- sapply(data.cov.cat, as.factor)

pum <- data.frame(Var1 = runif(10,0,1), Var2 = c(1,3,2,0,2,1,2,0,1,3))

data.cov.cat$PU <- sqldf("SELECT CASE WHEN Var1 >= 0.053 THEN 'Present'  
                                   WHEN Var1 < 0.053 OR Var2 IN (1,2,3) THEN 'Present'
                                   WHEN Var1 < 0.053 OR Var2 = 0 THEN 'Not present'
                                   ELSE 'NA' 
                              END as newCol
                              FROM pum")$newCol

As the original data.frame has columns factorized, I am converting them here as well.


Solution

  • data.cov.cat is not a dataframe before the last sqldf statement executes. It has been changed to a matrix when sapply got executed. The solution to the reproducible example is:

    data.cov.cat <- data.frame(V1 = runif(10,0,10), V12 = c("blue", "red", "yellow", "pink","blue", "red", "yellow", "pink", "red", "yellow"))
    
    data.cov.cat <- as.data.frame(sapply(data.cov.cat, as.factor))
    
    pum <- data.frame(Var1 = runif(10,0,1), Var2 = c(1,3,2,0,2,1,2,0,1,3))
    
    data.cov.cat$PU <- sqldf("SELECT CASE WHEN Var1 >= 0.053 THEN 'Present'  
                                       WHEN Var1 < 0.053 OR Var2 IN (1,2,3) THEN 'Present'
                                       WHEN Var1 < 0.053 OR Var2 = 0 THEN 'Not present'
                                       ELSE 'NA' 
                                  END as newCol
                                  FROM pum")$newCol
    

    Now, the coercion is not faced.