
How to copy attributes from one data frame to another or to re-assign attributes to a freshly transposed data frame - R


After transposing data, I'd like to re-assign the attributes that were dropped. The same approach could also apply to copying attributes from one data frame to another, or to restoring attributes after a mutate or a similar operation that drops them.

 library(reshape2)

 df <- data.frame(id = c(1,2,3,4,5), 
                  time = c(11, 22,33,44,55),
                  c  = c(1,2,3,5,5),
                  d = c(4,2,5,4,NA))

attr(df$id,"label")<- "label"
attr(df$time,"label")<- "label2"
attr(df$c,"label")<- "something here"
attr(df$d,"label")<- "count of something"
str(df)

 'data.frame':   5 obs. of  4 variables:
 $ id  : num  1 2 3 4 5
  ..- attr(*, "label")= chr "label"
 $ time: num  11 22 33 44 55
  ..- attr(*, "label")= chr "label2"
 $ c   : num  1 2 3 5 5
  ..- attr(*, "label")= chr "something here"
 $ d   : num  4 2 5 4 NA
  ..- attr(*, "label")= chr "count of something"

Cast to wide:

dfwide<- recast(df,id~variable +time, 
            id.var = c("id","time"))

This gives the usual warning about lost attributes:

   Warning message:
     attributes are not identical across measure variables; they will be dropped 

 str(dfwide)
'data.frame':   5 obs. of  11 variables:
 $ id  : num  1 2 3 4 5
 $ c_11: num  1 NA NA NA NA
 $ c_22: num  NA 2 NA NA NA
 $ c_33: num  NA NA 3 NA NA
 $ c_44: num  NA NA NA 5 NA
 $ c_55: num  NA NA NA NA 5
 $ d_11: num  4 NA NA NA NA
 $ d_22: num  NA 2 NA NA NA
 $ d_33: num  NA NA 5 NA NA
 $ d_44: num  NA NA NA 4 NA
 $ d_55: num  NA NA NA NA NA

Using mostattributes one can copy attributes from a column of one data frame to a column of another, but with many column names I can't figure out how to map this efficiently; the only way I have found is to assign them one by one.

 mostattributes(dfwide$c_11)<-attributes(df$c)
 mostattributes(dfwide$c_22)<-attributes(df$c)
 > str(dfwide)
 'data.frame':  5 obs. of  11 variables:
  $ id  : num  1 2 3 4 5
  $ c_11: num  1 NA NA NA NA
  ..- attr(*, "label")= chr "something here"
  $ c_22: num  NA 2 NA NA NA
  ..- attr(*, "label")= chr "something here"
  $ c_33: num  NA NA 3 NA NA

I tried to automate it but failed (all the c columns should get the same label, and all the d columns should get the same label):

#extract arguments
library(tibble)  # enframe()
library(dplyr)   # %>%, slice(), pull()
dlist<-enframe(names(df))%>%
   slice(-1,-2)%>%
   pull(., value)
 dlist

 dlistw<-enframe(names(dfwide))%>%
  slice(-1)%>%
  pull(., value)
 dlistw

#function
mostatt<- function(var1, var2) {
  mostattributes(dfwide[[var1]])<<-attributes(df[[var2]])
}

mapply(mostatt,dlistw,dlist)
str(dfwide)

'data.frame':   5 obs. of  11 variables:
 $ id  : num  1 2 3 4 5
 $ c_11: num  1 NA NA NA NA
  ..- attr(*, "label")= chr "something here"
 $ c_22: num  NA 2 NA NA NA
  ..- attr(*, "label")= chr "count of something"
 $ c_33: num  NA NA 3 NA NA
  ..- attr(*, "label")= chr "something here"
 $ c_44: num  NA NA NA 5 NA
  ..- attr(*, "label")= chr "count of something"
 $ c_55: num  NA NA NA NA 5
  ..- attr(*, "label")= chr "something here"
 $ d_11: num  4 NA NA NA NA
  ..- attr(*, "label")= chr "count of something"
 $ d_22: num  NA 2 NA NA NA
  ..- attr(*, "label")= chr "something here"
 $ d_33: num  NA NA 5 NA NA
  ..- attr(*, "label")= chr "count of something"
 $ d_44: num  NA NA NA 4 NA
  ..- attr(*, "label")= chr "something here"
 $ d_55: num  NA NA NA NA NA
  ..- attr(*, "label")= chr "count of something"

I think tidyselect's starts_with might be worth a try, but I'm not sure how to incorporate it. Any suggestions would be appreciated. Thank you!
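
The alternating labels in the output above are consistent with how mapply pairs its arguments: it walks both vectors element by element and recycles the shorter one. The sketch below reproduces only the pairing, not the attribute copy. The vectors are rebuilt by hand to mirror dlist and dlistw, and the anonymous function is purely illustrative.

```r
# Hand-built copies of the two name vectors from the question
dlist  <- c("c", "d")
dlistw <- c(paste0("c_", c(11, 22, 33, 44, 55)),
            paste0("d_", c(11, 22, 33, 44, 55)))

# Return the source name that mapply() hands to each target name.
# dlist has length 2 and dlistw has length 10, so dlist is recycled five times.
pairs <- mapply(function(target, source) source, dlistw, dlist)
pairs
```

Here c_22 is paired with d and d_22 with c, which matches the mixed-up labels in the str output. A mapping keyed on the column-name prefix avoids this.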


Solution

  • This is an option:

    for(i in (setdiff(colnames(df), "id"))){
      for(x in colnames(dfwide)[(grepl(i, colnames(dfwide)))])
          mostattributes(dfwide[[x]]) <- attributes(df[[i]])
    }
    mostattributes(dfwide$id) <- attributes(df$id) 
    

    Because the pattern d also matches id, the loop overwrites the attributes of id with those of d, so I need to re-assign id at the end. If you rename d to e, it is even simpler:

    df <- data.frame(id = c(1,2,3,4,5), 
                     time = c(11, 22,33,44,55),
                     c  = c(1,2,3,5,5),
                     e = c(4,2,5,4,NA))
    
    
    attr(df$id,"label")<- "label"
    attr(df$time,"label")<- "label2"
    attr(df$c,"label")<- "something here"
    attr(df$e,"label")<- "count of something"
    str(df)
    
    dfwide<- recast(df,id~variable +time, 
                    id.var = c("id","time"))
    
    for(i in (colnames(df))){
      for(x in colnames(dfwide)[(grepl(i, colnames(dfwide)))])
        mostattributes(dfwide[[x]]) <- attributes(df[[i]])
    }
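
As a possible variation on the loops above, the matching can be done on the part of each wide column name before the first underscore instead of with grepl, so that d no longer matches id and the columns do not need renaming. This is a minimal sketch, not part of the original answer: copy_attrs_by_prefix is a hypothetical helper, it assumes the source column names contain no underscore themselves, and the wide data frame is built by hand as a stand-in for the recast result so that the example runs with base R only.

```r
# Hypothetical helper: for every column of `dst`, take the part of its name
# before the first separator and, if `src` has a column with that name,
# copy that column's attributes over with mostattributes<-.
# Note: `sep` is used inside a regular expression.
copy_attrs_by_prefix <- function(dst, src, sep = "_") {
  # "c_11" -> "c", "d_22" -> "d", "id" -> "id" (no separator, unchanged)
  prefix <- sub(paste0(sep, ".*$"), "", names(dst))
  for (j in seq_along(dst)) {
    if (prefix[j] %in% names(src)) {
      mostattributes(dst[[j]]) <- attributes(src[[prefix[j]]])
    }
  }
  dst
}

# Small labelled data frame in the style of the question
df <- data.frame(id = c(1, 2), c = c(1, 2), d = c(4, NA))
attr(df$id, "label") <- "label"
attr(df$c, "label")  <- "something here"
attr(df$d, "label")  <- "count of something"

# Stand-in for the recast() output (assumption: same naming scheme, name_time)
dfwide <- data.frame(id   = c(1, 2),
                     c_11 = c(1, NA),
                     c_22 = c(NA, 2),
                     d_11 = c(4, NA),
                     d_22 = c(NA_real_, NA_real_))

dfwide <- copy_attrs_by_prefix(dfwide, df)
str(dfwide)
```

Because the comparison is an exact match on the prefix, id keeps its own label and no clean-up step is needed afterwards.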