
R: looping over variable names


I have a data frame with an ID variable and a bunch of date variables. There's a lot of missing data, and I want to convert any date value to a 1 and leave the missing values as-is. I don't care what each date actually is; we're using them more like "is there a date or not."

Example df and failed attempts:

df <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  var_abc = as.Date(c("2020-05-06", NA, "2022-06-03", NA, NA), format = "%Y-%m-%d"),
  var_def = as.Date(c(NA, "2023-07-03", "2023-07-08", NA, "2022-04-06"), format = "%Y-%m-%d"),
  var_ghi = as.Date(c(NA, NA, NA, "2024-05-05", NA), format = "%Y-%m-%d"),
  stringsAsFactors = FALSE
)

var_names <- names(df[ , 2:4])
for (y in var_names) {
  df$y <- as.numeric(df$y)
  df$y[!is.na(df$y)] <- 1
}

df[, 2:4] <- as.numeric(df[ , 2:4])

For the loop, I get this error: Error in `$<-.data.frame`(`*tmp*`, "y", value = numeric(0)) : replacement has 0 rows, data has 5. Google has told me that the length is an issue, but length(var_names) returns 3. I also tried for (y in names(df[ , 2:4])) { etc., but got the same error.

For the subset, I get this error: Error: 'list' object cannot be coerced to type 'double'. Google has told me I need to change the list to a vector, but that seems like a bad idea given that it's my dataframe.

This link tells me looping over names is a bad idea, but my variable names don't follow a numeric sequence like the answers.

I thought about the apply() family of functions, but I think they are restricted to a set list of options, like mean. And I think grep() has to search for a pattern, but my variable names don't follow one.


Solution

  • You can use the double bracket notation in your loop:

    for (y in var_names) {
      df[[y]] <- as.numeric(df[[y]])
      df[[y]][!is.na(df[[y]])] <- 1
    }
    

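    The `$` operator does not evaluate its argument, so df$y looks for a column literally named "y" rather than the column whose name is stored in y. That column doesn't exist, so df$y is NULL, as.numeric(NULL) is numeric(0), and assigning a zero-length vector to a 5-row data frame produces the "replacement has 0 rows" error from the loop. df[[y]] evaluates y first and uses its value as the column name.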
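
As a possible alternative to the loop, here is a minimal sketch that uses lapply() on the same example data. It also touches on the second error in the question: a data frame is a list of columns, so as.numeric() cannot coerce several columns at once, while lapply() applies a function to each column in turn. The anonymous function is only an illustration; lapply() accepts any function, not just built-in summaries like mean.

```r
# Example data from the question.
df <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  var_abc = as.Date(c("2020-05-06", NA, "2022-06-03", NA, NA), format = "%Y-%m-%d"),
  var_def = as.Date(c(NA, "2023-07-03", "2023-07-08", NA, "2022-04-06"), format = "%Y-%m-%d"),
  var_ghi = as.Date(c(NA, NA, NA, "2024-05-05", NA), format = "%Y-%m-%d"),
  stringsAsFactors = FALSE
)

# Column names to convert; they do not need to follow any pattern.
var_names <- names(df)[2:4]

# For each selected column, keep NA where the date is missing and
# return 1 where a date is present, then write the columns back.
df[var_names] <- lapply(df[var_names], function(x) ifelse(is.na(x), NA_real_, 1))
```

After this runs, each of the three columns should be numeric, with 1 wherever a date was present and NA elsewhere; the id column is left untouched.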