
R list: add names attribute to each variable element


I have the following data.frame:

> mydf=data.frame(ID=LETTERS, var1=rep(c('a','b'),each=13), var2=c(rep('x',10),rep('y',12),rep('z',4)))
> mydf
   ID var1 var2
1   A    a    x
2   B    a    x
3   C    a    x
4   D    a    x
5   E    a    x
...

I want to make a list with the unique values of each variable.

Each element in the list should carry a names attribute.

The names should be identical to the original values, and the values themselves should then be changed to variable name + original value.

Let me show you what I mean.

I first turn the data.frame into a list of the unique values of each variable:

> mylist=lapply(mydf, unique)
> mylist
$ID
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"

$var1
[1] "a" "b"

$var2
[1] "x" "y" "z"

Now I want to add a names attribute to each element, so that the names equal the original values and the new values are the variable name plus the original value.

I focus on var1:

> var1_names=mylist$var1
> var1_values=paste0('var1:',mylist$var1)
> mylist$var1=var1_values
> names(mylist$var1)=var1_names
> mylist
...
$var1
       a        b
"var1:a" "var1:b"
...

See how var1 has changed from:

$var1
[1] "a" "b"

to

$var1
       a        b
"var1:a" "var1:b"

Note the names attribute and how the new values have changed to include the variable name.

Now I would like to do the same thing for each variable in the list.

Is it possible to do it in a simple way with an apply approach and preferably base functions? Thanks!

EDIT: The final complete output would look like this (note the names attribute in each variable):

> mylist
$ID
     A      B      C      D      E      F      G      H      I      J
"ID:A" "ID:B" "ID:C" "ID:D" "ID:E" "ID:F" "ID:G" "ID:H" "ID:I" "ID:J"
     K      L      M      N      O      P      Q      R      S      T
"ID:K" "ID:L" "ID:M" "ID:N" "ID:O" "ID:P" "ID:Q" "ID:R" "ID:S" "ID:T"
     U      V      W      X      Y      Z
"ID:U" "ID:V" "ID:W" "ID:X" "ID:Y" "ID:Z"

$var1
       a        b
"var1:a" "var1:b"

$var2
       x        y        z
"var2:x" "var2:y" "var2:z"

Solution

  • Is this what you are after?

    lapply(names(mydf), \(x) paste(x, unique(mydf[[x]]), sep = ":"))
    
    [[1]]
     [1] "ID:A" "ID:B" "ID:C" "ID:D" "ID:E" "ID:F" "ID:G" "ID:H" "ID:I" "ID:J" "ID:K" "ID:L" "ID:M" "ID:N" "ID:O" "ID:P" "ID:Q" "ID:R" "ID:S" "ID:T" "ID:U" "ID:V" "ID:W" "ID:X" "ID:Y" "ID:Z"
    
    [[2]]
    [1] "var1:a" "var1:b"
    
    [[3]]
    [1] "var2:x" "var2:y" "var2:z"
    

    To add a names attribute to each element, you can use setNames():

    lapply(
      names(mydf), 
      \(x) {
        elm = unique(mydf[[x]])
        setNames(paste(x, elm, sep = ":"), elm)
      }
    )
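
Note that lapply() over names(mydf) returns an unnamed list ([[1]], [[2]], [[3]]), whereas the output requested in the question is indexed by $ID, $var1 and $var2. One possible way to keep the variable names on the outer list, sketched here with base functions only, is sapply() with simplify = FALSE, which uses a character input as the names of the result. The as.character() call is an added assumption to cover R versions before 4.0, where data.frame() turns strings into factors by default.

```r
mydf <- data.frame(
  ID   = LETTERS,
  var1 = rep(c("a", "b"), each = 13),
  var2 = c(rep("x", 10), rep("y", 12), rep("z", 4))
)

# sapply(..., simplify = FALSE) behaves like lapply(), but because the
# input is a character vector, its values become the names of the result.
mylist <- sapply(
  names(mydf),
  function(x) {
    # as.character() is a precaution in case the column is a factor
    elm <- as.character(unique(mydf[[x]]))
    # values are "variable:value"; names are the original values
    setNames(paste(x, elm, sep = ":"), elm)
  },
  simplify = FALSE
)

mylist$var1
```

The sketch uses function(x) rather than the \(x) shorthand so that it also runs on R versions older than 4.1. Wrapping the lapply() call above in setNames(..., names(mydf)) should give the same result.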