
Create a new variable based on the name of the dataset in R


I have multiple datasets that contain the same variable names but different values. I am trying to create a new variable called group that identifies each dataset, because I will later combine all the datasets into one and need to tell them apart by group. Here is a sample with 2 datasets.

################################
###       Sample data        ### 
################################

set.seed(8547)
a=sample(1:20,15,replace=FALSE)
a=sort(a)
f=runif(15,0,1)
f=sort(f)
trt1=data.frame(a,f)

set.seed(1452)
a=sample(1:35,22,replace=FALSE)
a=sort(a)
f=runif(22,0,1)
f=sort(f)
trt2=data.frame(a,f)

names_of_dataframes <- ls.str(mode = "list")

#  I used a `for` loop because I have approximately `10` datasets and I do not know whether the `apply` family would work for this kind of task

for (i in length(names_of_dataframes)) {
  if(names_of_dataframes[i]=="trt1"){
    trt1$group=rep("trt1",nrow(trt1))
  }else if (names_of_dataframes[i]=="trt2"){
    trt2$group=rep("trt2",nrow(trt2))
  }
      
}

I do not know what I am doing wrong, but the group variable is only created for dataset trt2 and not for trt1. Any thoughts on what is wrong?

Thank you in advance for your help


Solution

  • We can load all of the datasets into a list with mget and ls, then add the group column to each element from the list names

    lst1 <- mget(ls(pattern = '^trt\\d+$'))
    lst1 <- Map(cbind, lst1, group = names(lst1))
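
Since the stated goal is to end up with one combined dataset, the list can be stacked directly, without writing anything back to the original objects. The following is a minimal sketch, assuming every data frame has the same columns; the name `combined` is only illustrative, and the sample data is rebuilt from the question so the block runs on its own.

```r
# Rebuild the two sample data frames from the question
set.seed(8547)
a <- sort(sample(1:20, 15, replace = FALSE))
f <- sort(runif(15, 0, 1))
trt1 <- data.frame(a, f)

set.seed(1452)
a <- sort(sample(1:35, 22, replace = FALSE))
f <- sort(runif(22, 0, 1))
trt2 <- data.frame(a, f)

# Collect the data frames in a named list and tag each with its own name
lst1 <- mget(ls(pattern = "^trt\\d+$"))
lst1 <- Map(cbind, lst1, group = names(lst1))

# Stack all elements into a single data frame ('combined' is a hypothetical name)
combined <- do.call(rbind, lst1)

# One row count per group: 15 rows for trt1 and 22 rows for trt2
table(combined$group)
```

Keeping the data frames in a list also means the same two lines work unchanged for 10 datasets, as long as their names match the pattern.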
    

    If the original objects need to be updated, use list2env (not recommended, though)

    list2env(lst1, .GlobalEnv)
    

    Check the objects

    head(trt1)
    #  a          f group
    #1 1 0.03676253  trt1
    #2 2 0.07212860  trt1
    #3 3 0.10711856  trt1
    #4 4 0.14691670  trt1
    #5 5 0.33626002  trt1
    #6 6 0.41223646  trt1
    
    head(trt2)
    #  a          f group
    #1 2 0.01003053  trt2
    #2 3 0.05251810  trt2
    #3 4 0.08916620  trt2
    #4 5 0.17498162  trt2
    #5 6 0.24118046  trt2
    #6 8 0.24816209  trt2
    

    Another option is assign

    nm1 <- ls(pattern = '^trt\\d+$')
    # For each name, fetch the data frame, add the group column, and write it back
    for(nm in nm1) {
      assign(nm, `[[<-`(get(nm), "group", value = nm))
    }
    

    Or use map/mutate from purrr and dplyr, which returns a list of the updated data frames

    library(dplyr)
    library(purrr)
    map(nm1, ~ get(.x) %>%
             mutate(group = .x))
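
As for why the loop in the question only touched trt2: `for (i in length(x))` iterates over a single value, the length itself, so only the last name is ever visited. `seq_along(x)` yields every index. The sketch below shows the difference and a corrected loop. The small data frames and the names `nms`, `visited_wrong` and `visited_right` are toy stand-ins, not the question's data.

```r
# Toy stand-ins for the question's data frames
trt1 <- data.frame(a = 1:3)
trt2 <- data.frame(a = 4:5)
nms <- c("trt1", "trt2")

# length(nms) is the single number 2, so this loop body runs once, with i = 2
visited_wrong <- integer(0)
for (i in length(nms)) visited_wrong <- c(visited_wrong, i)

# seq_along(nms) is 1, 2, so every name is visited
visited_right <- integer(0)
for (i in seq_along(nms)) visited_right <- c(visited_right, i)

# Corrected loop: no if/else per dataset is needed, the name itself is the group
for (i in seq_along(nms)) {
  nm <- nms[i]
  df <- get(nm)      # fetch the data frame by name
  df$group <- nm     # the single value is recycled to every row
  assign(nm, df)     # write the modified copy back under the same name
}
```

`seq_along(x)` is also safer than `1:length(x)`, which would iterate over 1 and 0 if `x` were empty.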