I have columns in a dataset that can contain either 0 or 1, but some of the columns contain only 0.
I want to use these numbers as factors, but I still want every column to have the levels 0 and 1. I tried the code below, but I keep getting an error and I can't understand why.
#dataframe df has 100 rows
column_list = c("col1", "col2", "col3")
for (col in column_list) {
#convert number 0 and number 1 to factors
# (but sometimes the data only has zeros)
df[,col] <- as.factor(df[,col])
# I want to force levels to be 0 and 1
# this is for when the data is completely missing number 1
levels(df[, col] <- c(0,1)) # gives this error:
# Error in `[<-.data.frame`(`*tmp*`, , col, value = c(0, 1)) :
# replacement has 2 rows, data has 100
print(levels(df[, col]))
#this produces "0" "1" or just "0" depending on the column
}
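I think you have just put a ) in the wrong place. In levels(df[, col] <- c(0,1)) the assignment sits inside the call to levels(), so R tries to assign the two-element vector c(0,1) to the 100-row column, which is exactly what the error message says. Close the parenthesis right after df[, col] so that the assignment goes to the levels instead.
This works: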
column_list = c("col1", "col2", "col3")
df <- data.frame(matrix(0, nrow = 100, ncol = 3))
names(df) <- column_list
for (col in column_list) {
#convert number 0 and number 1 to factors
# (but sometimes the data only has zeros)
df[,col] <- as.factor(df[,col])
# I want to force levels to be 0 and 1
# this is for when the data is completely missing number 1
levels(df[, col]) <- c(0,1) # no error anymore
print(levels(df[, col]))
# this now prints "0" "1" for every column
}
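One caveat with the loop above: levels(x) <- c(0,1) renames the existing levels by position, so a column that happened to contain only 1s would have its single level "1" renamed to "0". If that case can occur in your data, a possibly safer sketch is to declare the levels when the factor is created. The example data below is made up for illustration and is not from the question:

```r
# Hypothetical example data: col1 is all zeros, col2 is mixed, col3 is all ones
df <- data.frame(
  col1 = c(0, 0, 0, 0),
  col2 = c(0, 1, 0, 1),
  col3 = c(1, 1, 1, 1)
)
column_list <- c("col1", "col2", "col3")

for (col in column_list) {
  # Declare both levels up front; values are matched by label, not by position
  df[[col]] <- factor(df[[col]], levels = c(0, 1))
}

# Every column now has both levels, even if only one value occurs in the data
print(sapply(df[column_list], levels))

# The all-ones column still holds "1", not "0"
print(as.character(df$col3))
```

Note that any value other than 0 or 1 would become NA with this approach, which may or may not be what you want.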