How to force specific levels when dataframe column does not contain that level? (Using R)


I have columns in a dataset that can contain 0 or 1, but some of the columns contain only 0.

I want to use these numbers as factors, but I still want every column to have the levels 0 and 1. I tried the code below, but I keep getting an error and I can't understand why.

#dataframe df has 100 rows

column_list = c("col1", "col2",  "col3")  

for (col in column_list) {
      #convert number 0 and number 1 to factors
      # (but sometimes the data only has zeros)
      df[,col] <- as.factor(df[,col])

      # I want to force levels to be 0 and 1
      # this is for when the data is completely missing number 1

      levels(df[, col] <- c(0,1))          #gives an error

      # Error in `[<-.data.frame`(`*tmp*`, , col, value = c(0, 1)) : 
      # replacement has 2 rows, data has 100


      print(levels(df[, col]))
      #this produces "0" "1" or just "0" depending on the column

}

Solution

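  • I think you have just put a ) in the wrong place. `levels(df[, col] <- c(0,1))` tries to assign c(0,1) to the column itself, inside the call to levels(), which is what the `[<-.data.frame` error is complaining about. You want `levels(df[, col]) <- c(0,1)`, which sets the levels of the column instead.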

    This works:

    column_list = c("col1", "col2",  "col3")  
    df <- data.frame(matrix(0, nrow = 100, ncol = 3))
    names(df) <- column_list
    
    for (col in column_list) {
      #convert number 0 and number 1 to factors
      # (but sometimes the data only has zeros)
      df[,col] <- as.factor(df[,col])
    
      # I want to force levels to be 0 and 1
      # this is for when the data is completely missing number 1
    
      levels(df[, col]) <- c(0,1)          #no error anymore
    
      print(levels(df[, col]))
      #this now prints "0" "1" for every column
    
    }
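
One caveat: `levels<-` relabels the existing levels by position, so a column that happened to contain only 1s would have its single level "1" renamed to "0". A minimal sketch of an alternative, assuming the columns only ever hold 0 and 1, is to pass the levels to factor() directly. The small df below is made-up example data, not the original dataset.

```r
# Made-up example data: col2 holds only zeros, col3 holds only ones
df <- data.frame(
  col1 = c(0, 1, 0, 1),
  col2 = c(0, 0, 0, 0),
  col3 = c(1, 1, 1, 1)
)
column_list <- c("col1", "col2", "col3")

for (col in column_list) {
  # factor() with explicit levels matches values by label, not by position,
  # so a column of all 1s keeps its 1s and still gains the unused level "0"
  df[[col]] <- factor(df[[col]], levels = c(0, 1))
}

# Inspect the levels of every converted column
print(lapply(df[column_list], levels))
```

The same conversion can be written without a loop as `df[column_list] <- lapply(df[column_list], factor, levels = c(0, 1))`.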