I have a dataframe that is an ID variable and a bunch of date variables. There's a lot of missing data, and I want to convert any date value to a 1, and leave the missing values as-is. I don't care what each date actually is; we're using them more like "is there a date or not."
Example df and failed attempts:
df <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  var_abc = as.Date(c("2020-05-06", NA, "2022-06-03", NA, NA), format = "%Y-%m-%d"),
  var_def = as.Date(c(NA, "2023-07-03", "2023-07-08", NA, "2022-04-06"), format = "%Y-%m-%d"),
  var_ghi = as.Date(c(NA, NA, NA, "2024-05-05", NA), format = "%Y-%m-%d"),
  stringsAsFactors = FALSE
)
var_names <- names(df[ , 2:4])
for (y in var_names) {
  df$y <- as.numeric(df$y)
  df$y[!is.na(df$y)] <- 1
}
df[, 2:7] <- as.numeric(orders_epic[ , 2:7])
For the loop, I get this error: Error in $<-.data.frame(*tmp*, "y", value = numeric(0)) : replacement has 0 rows, data has 5. Google has told me that the length is an issue, but length(var_names) returns 3. I also tried for (y in names(df[ , 2:4])) { etc., but got the same error.
For the subset, I get this error: Error: 'list' object cannot be coerced to type 'double'. Google has told me I need to change the list to a vector, but that seems like a bad idea given that it's my dataframe.
This link tells me looping over names is a bad idea, but my variable names don't follow a numeric sequence like the answers.
I thought about the apply() functions, but I think they are restricted to a set list of options, like mean. And I think grep() has to search for a pattern, but my variable names don't follow one.
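On the apply() point: as far as I can tell, lapply() and sapply() take any function, including one written inline, so they are not limited to built-ins like mean. Here is a minimal sketch on a cut-down toy data frame (not the real data). The name date_cols is only illustrative, and picking columns by class instead of by name is an assumption about what would suit this case.

```r
# Toy data mirroring the question's structure: one ID column plus date columns
df <- data.frame(
  id = c("a", "b", "c"),
  var_abc = as.Date(c("2020-05-06", NA, "2022-06-03")),
  var_def = as.Date(c(NA, "2023-07-03", "2023-07-08"))
)

# Pick the date columns by class, so no name pattern is needed
date_cols <- names(df)[sapply(df, inherits, what = "Date")]

# lapply() applies an arbitrary function to each selected column:
# 1 where a date is present, NA where it is missing
df[date_cols] <- lapply(df[date_cols], function(x) ifelse(is.na(x), NA_real_, 1))
```

Because the result of lapply() is assigned back to df[date_cols], the id column is left untouched.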
You can use the double bracket notation in your loop:
for (y in var_names) {
  df[[y]] <- as.numeric(df[[y]])
  df[[y]][!is.na(df[[y]])] <- 1
}
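Right now R is looking for a column literally named 'y', because df$y never uses the value of the loop variable y. No such column exists, so df$y is NULL, as.numeric(NULL) returns numeric(0), and assigning that zero-length vector to a five-row data frame produces the "replacement has 0 rows" error. df[[y]] instead looks up the column whose name is stored in y.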
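The difference between the two accessors can be checked directly. This is a small sketch on a two-row toy data frame (not the original data); each of the three expressions should evaluate to TRUE.

```r
# Toy data frame with a single date column
df <- data.frame(id = c("a", "b"), var_abc = as.Date(c("2020-05-06", NA)))
y <- "var_abc"

# `$` takes the name literally: there is no column called "y", so this is NULL
is.null(df$y)

# as.numeric(NULL) has length 0, which is where "replacement has 0 rows" comes from
length(as.numeric(df$y)) == 0

# `[[` evaluates y first, so it returns the var_abc column
identical(df[[y]], df$var_abc)
```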