Search code examples
rloopsdplyrtidyversedata-management

How to make repetitive operations (loop) over a set of columns in a dataset in R


I have to make a series of operations over a subset of columns. I have a set of columns which measures the same thing to different parties A, B, and C:

id var1_A var1_B var1_C var2_A var2_B var2_C var3_A var3_B var3_C

So, in the example, var1_A var1_B var1_C refer to the same measurement for different parties. And var1_A, var2_A, var3_A refer to different variables for the same party A.

I would like to accomplish 2 things:

I need to create multiple data frames and merge the id with another dataframe, each one refers to one specific party. I wrote the code for each data frame individually, as the example below. The issue is that in the example it is simple. What complicates my life is that I have multiple datasets like df, and each of them contain information for multiple parties, and I end up with 50 lines of repetitive code. Is that a way to simplify?

df_A <- df %>% select(id var1_A var2_A var3_A)
df_A <- merge(df_A, df_merge, by="id")
df_B <- df %>% select(id var1_B var2_B var3_B)
df_B <- merge(df_B, df_merge, by="id")
df_C <- df %>% select(id var1_C var2_C var3_C)
df_C <- merge(df_C, df_merge, by="id")

The second thing I would like to accomplish is to change the variable name for df. I would like to change the variable name for all the columns that measure the same thing, but maintaining the party which it refers to. For example, say var1 refers to height, var2 refers to weight, and var3 refers to gender:

id var1_A var1_B var1_C var2_A var2_B var2_C var3_A var3_B var3_C

I would like to get something like:

id height_A height_B height_C weight_A weight_B weight_C gender_A gender_B gender_C

Is there a way to accomplish this with few lines of code? Or do I have to rename each of them individually (using rename command, for example)?


Solution

  • A tidy way:

    require(tidyverse)
    
    #CREATE DATA
    df <- data.frame(id = 1:10,
                     var1_A = runif(10),
                     var1_B = runif(10),
                     var1_C = runif(10),
                     var2_A = runif(10),
                     var2_B = runif(10),
                     var2_C = runif(10),
                     var3_A = runif(10),
                     var3_B = runif(10),
                     var3_C = runif(10))
    
    df_merge<-data.frame(id = 1:10,
                         value=11:20)
    #grabs current names
    nam<-colnames(df)
    
    #Create map of new names
    new_names = c('var1'='height','var2'='weight','var3'='gender')
    
    #replace the strings with new strings in map 
    nam <- str_replace_all(nam, new_names)
    
    #reassign column names to dataframe
    colnames(df)<-nam
    
    # loop through all letters in list assign to variable 
    #pasted with "df" and the letter, selects columns ending with 
    # letter, merges with df_ids and returns the new subset of data
    #to the assigned variable name
    
    
    for (letter in c('A', "B", "C")){
    
      assign(paste("df", letter, sep = '_'),
             df%>%select(id, ends_with(letter))%>%
               merge(df_merge, by='id'))
    }