Search code examples
rrenamedata-manipulation

R function to select and append variables from different data frames


Imagine I have several data frames from the same cross-national survey project. They ask the same questions over time but they entered the variables with different column names in each data frame. A variable (X) has column name v2 in one wave, V34 in another, etc.

df_w1 <- data.frame(v2 = c(3,4,6))
df_w2 <- data.frame(V34 = c(2,7,9))

I would like to select these variables, and append them in a new data frame. I can individually rename columns in each data frame, select the variables, and then append. But I would like to ask if there is a way to write a function, say, with a list of column names for a specific variable, which could go over the data frames, and then select, rename, and append values for that variable from different data frames.

For instance, the function would iterate through the list(df_w1, df_w2), check the variable names, identify v2 and V34, rename them (X), and append them in a new data frame. And I need to run this function for different variables that have different column names across data frames.

Edit: Here is how the final data could look like:

wave X
w1   3
w1   4  
w1   6
w2   2
w2   7
w2   9

Solution

  • If you keep a list same_cols of the columns that are equivalent across data frames, you can iterate over that list, and over the data frames, using map and rename_with to get the names aligned.

    library(tidyverse)
    
    n <- 5
    df_w1 <- data.frame(v1 = rnorm(n),
                        v2 = rnorm(n),
                        v3 = rnorm(n))
    df_w2 <- data.frame(V23 = rnorm(n),
                        V34 = rnorm(n),
                        V99 = rnorm(n))
    
    
    same_cols <- list(x1 = c("v2", "V34"),
                      x2 = c("v3", "V99"))
    
    same_cols |> 
      imap(\(col_set, new_colname) {
        list(df_w1, df_w2) |> 
          set_names(c('w1', 'w2')) |> 
          imap(\(df, wave) df |> 
                 select(any_of(col_set)) |> 
                 rename_with(\(col) new_colname) |> 
                 mutate(wave_idx = wave)) |> 
          list_rbind()
      }) 
    

    Output

    $x1
               x1 wave_idx
    1   0.4387218       w1
    2  -0.6791123       w1
    3  -1.7029648       w1
    4   1.2651684       w1
    5   0.3603572       w1
    6  -0.7365782       w2
    7   0.6224098       w2
    8  -0.2907160       w2
    9  -0.2142115       w2
    10 -0.1125596       w2
    
    $x2
               x2 wave_idx
    1  -0.5836394       w1
    2  -1.9940788       w1
    3   1.9022098       w1
    4   3.3903708       w1
    5   0.2074804       w1
    6  -1.8636670       w2
    7   0.8376299       w2
    8  -1.4434929       w2
    9  -0.2085702       w2
    10 -0.4385635       w2