
Creating new variables for every group of observations in R


I have 11 variables in my data frame. The first is a unique identifier of the observation (a plane). The second is a number from 1 to 21 giving the flight number of that plane. The rest of the variables are time, velocity, distance, etc.

What I want to do is create new variables for every flight number, e.g. time_1, time_2, ..., velocity_1, velocity_2, etc., and consequently reduce the number of observations (the repeated ones).

I don't really have an idea of how to start. I was thinking about a mutate call like:

mutate(df, time_1 = ifelse(n_flight == 1, time, NA))

But that would be a lot of typing, and it might introduce new problems.


Solution

  • Basically, you want to reshape the data from long to wide for each variable. One way is to lapply over the measurement columns and call tidyr::spread on each. Suppose the data looks like the following:

    library(dplyr)
    library(tidyr)
    df <- data.frame(
      ID = c(rep("A", 3), rep("B", 3)), 
      n_flight = rep(seq(3), 2),
      time = seq(19, 24), 
      velocity = rev(seq(65, 60))
    )
    

    Then the following will generate your outcome of interest. Each variable contributes its own ID column (time_ID, velocity_ID), so you will need to drop the duplicates afterwards.

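    # Loop over every measurement column (everything except the two keys)
    lapply(
      setdiff(names(df), c("ID", "n_flight")), function(x) {
        df %>% 
          # Keep the keys plus the one column being reshaped
          select(ID, n_flight, !!x) %>%
          # One output column per flight number
          tidyr::spread(., key = "n_flight", value = x) %>%
          # Prefix the names: time_ID, time_1, time_2, ...
          setNames(paste(x, names(.), sep = "_"))
      }
    ) %>%
      # Put the per-variable wide tables side by side
      bind_cols()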
    

    Let me know if this wasn't what you were going for.
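
As a possible alternative, tidyr 1.0.0 and later provide pivot_wider, which can widen several value columns in one call and keeps a single ID column. The sketch below reuses the toy data from this answer; the name `wide` is only an illustrative choice, and the column list in values_from would need to be extended to cover the other variables in the real data frame.

```r
library(tidyr)

# Same toy data as in the answer above
df <- data.frame(
  ID = c(rep("A", 3), rep("B", 3)),
  n_flight = rep(seq(3), 2),
  time = seq(19, 24),
  velocity = rev(seq(65, 60))
)

# `wide` is a hypothetical name used only for this example
wide <- pivot_wider(
  df,
  id_cols = ID,                      # one row per plane
  names_from = n_flight,             # flight number becomes the suffix
  values_from = c(time, velocity)    # columns to spread out
)

# Columns: ID, time_1, time_2, time_3, velocity_1, velocity_2, velocity_3
print(wide)
```

With the default names_sep = "_", the new columns are named after the value column followed by the flight number, so no renaming step is needed.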