In a for loop, how do I insert the variable i inside the "starts_with" quotation?


I have a large data frame with species in rows and samples in columns. There are 30 samples with 12 replicates each. The columns are named as follows: sample.S1.01, sample.S1.02, ..., sample.S30.11, sample.S30.12.

I would like to create 30 new tables, each containing the 12 replicates of one sample.

I have this code, which works perfectly for one sample at a time:

dt<- tab_sp_sum %>%
    select(starts_with("sample.S1."))
assign(paste("tab_sp_1"), dt)

But when I put this in a for loop, it no longer works. I think the problem is that the variable i has to go inside the quoted string passed to starts_with, and I don't know how to write that.

for (i in 1:30){
  dt<- tab_sp_sum %>%
    select(starts_with("sample.S",i,".", sep=""))
  assign(paste("tab_sp",i,sep="_"), dt)
}

The last line works well: 30 tables are created with the right names, but they are all empty.

Any suggestions?

Thank you


Solution

  • Instead of using assign to store each table in a separate object, try using a list. Create the prefixes you want to select with paste0, then use map to build a list of data frames.

    library(dplyr)
    library(purrr)
    
    df_names <- paste0("sample.S", 1:30, ".")
    
    df1 <- map(df_names, ~tab_sp_sum %>% select(starts_with(.x)))
    

    You can then use df1[[1]], df1[[2]] to access individual dataframes.
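
    If access by name is preferred over access by position, the list elements can be named after the tables the loop was meant to create. The following is only a sketch: the small tab_sp_sum built here is a hypothetical stand-in (3 samples, 2 replicates) for the real data, and ids would be 1:30 in practice.

    ```r
    library(dplyr)
    library(purrr)

    # Hypothetical toy stand-in for tab_sp_sum: 3 samples x 2 replicates
    tab_sp_sum <- data.frame(
      sample.S1.01 = 1:3,   sample.S1.02 = 4:6,
      sample.S2.01 = 7:9,   sample.S2.02 = 10:12,
      sample.S3.01 = 13:15, sample.S3.02 = 16:18
    )

    ids <- 1:3  # assumption: 1:30 for the real data
    df_names <- paste0("sample.S", ids, ".")

    # One data frame per prefix, as in the answer above
    df1 <- map(df_names, ~ tab_sp_sum %>% select(starts_with(.x)))

    # Name each element after the table it stands for
    names(df1) <- paste0("tab_sp_", ids)

    names(df1[["tab_sp_2"]])
    # "sample.S2.01" "sample.S2.02"
    ```

    Keeping the tables in one named list means they can still be processed together later with map or lapply.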


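    In base R, we can use lapply with startsWith() to select the columns whose names start with each element of df_names. startsWith() compares the prefix literally, so the dots in the column names are not treated as regex wildcards.

    df1 <- lapply(df_names, function(x) 
                 tab_sp_sum[startsWith(names(tab_sp_sum), x)])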
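
    One thing worth checking when matching prefixes on names like these: in a regular expression an unescaped dot matches any character, so a pattern built as paste0("^", "sample.S1.") also matches sample.S10.01. The sketch below compares that pattern with base R's startsWith(), which matches literally. The column names are hypothetical examples in the question's naming scheme.

    ```r
    # Hypothetical column names following the question's pattern
    cols <- c("sample.S1.01", "sample.S1.02", "sample.S10.01", "sample.S2.01")
    prefix <- "sample.S1."

    # Regex match: the trailing "." matches any character, so S10 slips through
    regex_hits <- grep(paste0("^", prefix), cols, value = TRUE)
    regex_hits
    # "sample.S1.01"  "sample.S1.02"  "sample.S10.01"

    # Literal prefix comparison keeps only the S1 replicates
    literal_hits <- cols[startsWith(cols, prefix)]
    literal_hits
    # "sample.S1.01" "sample.S1.02"
    ```

    dplyr's starts_with() also matches its argument literally, so the map version is not affected by this.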
    

    The same approach applied to the built-in iris dataset:

    df_names <- c("Sepal", "Petal")
    df1 <- map(df_names, ~iris %>% select(starts_with(.x)))
    
    head(df1[[1]])
    #  Sepal.Length Sepal.Width
    #1          5.1         3.5
    #2          4.9         3.0
    #3          4.7         3.2
    #4          4.6         3.1
    #5          5.0         3.6
    #6          5.4         3.9
    
    head(df1[[2]])
    #  Petal.Length Petal.Width
    #1          1.4         0.2
    #2          1.4         0.2
    #3          1.3         0.2
    #4          1.5         0.2
    #5          1.4         0.2
    #6          1.7         0.4
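
    For completeness, the original for loop can also be made to work. starts_with() expects the prefix as a single string, so the pieces have to be pasted together before they are passed in. A minimal sketch, again on a hypothetical toy tab_sp_sum rather than the real data:

    ```r
    library(dplyr)

    # Hypothetical toy stand-in for tab_sp_sum: 2 samples x 2 replicates
    tab_sp_sum <- data.frame(
      sample.S1.01 = 1:3, sample.S1.02 = 4:6,
      sample.S2.01 = 7:9, sample.S2.02 = 10:12
    )

    for (i in 1:2) {  # assumption: 1:30 for the real data
      # Build the prefix first, then pass it as one string
      prefix <- paste0("sample.S", i, ".")
      dt <- tab_sp_sum %>%
        select(starts_with(prefix))
      assign(paste("tab_sp", i, sep = "_"), dt)
    }

    names(tab_sp_2)
    # "sample.S2.01" "sample.S2.02"
    ```

    This keeps the assign-based workflow from the question, although the list approach above is usually easier to work with afterwards.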