Create nested attribute for a new column in a data frame

I'm working with labelled data, as is common in social science research. In particular, the values in a column are stored not as a factor but as a numeric vector that carries labels as an attribute. These labels in turn have an attribute of their own (the names of the labels).

Now, I know how to change the names of the labels for existing columns. But I fail to do so for new columns, or, to be more precise: I know how to create them with dedicated packages, but I'm wondering if there's a native/base R option using e.g. attr, attributes or structure.

example data:

df <- structure(list(Q16_3 = structure(c(NA, NA, 1, 1, 1, NA, NA, 1, 0, 1),
                                       label = "Q16_3 question label",
                                       format.spss = "F8.2",
                                       labels = c(`Not Selected` = 0, Selected = 1),
                                       class = c("haven_labelled", "vctrs_vctr", "double")),
                     Q16_4 = structure(c(NA, NA, 1, 1, 1, NA, NA, 0, 0, 1),
                                       label = "Q16_4 question label",
                                       format.spss = "F8.2",
                                       labels = c(`Not Selected` = 0, Selected = 1),
                                       class = c("haven_labelled", "vctrs_vctr", "double"))),
                row.names = c(NA, -10L),
                class = c("tbl_df", "tbl", "data.frame"))

E.g. df %>% count(Q16_4) gives:

# A tibble: 3 x 2
              Q16_4     n
*         <dbl+lbl> <int>
1  0 [Not Selected]     2
2  1 [Selected]         4
3 NA                    4

Now I'm creating a column and trying to create a "labels" attribute, but it fails to show up:

df <- df %>%
  mutate(test = rep(1:2, 5))

df$test <- structure(df$test, labels = c("NO" = 1, "YES" = 2))

df %>%
  count(test)

only gives:

# A tibble: 2 x 2
   test     n
* <int> <int>
1     1     5
2     2     5

I guess it has something to do with the structure of the attributes themselves, because they look different:

str(df)

tibble [10 x 3] (S3: tbl_df/tbl/data.frame)
 $ Q16_3: dbl+lbl [1:10] NA, NA,  1,  1,  1, NA, NA,  1,  0,  1
   ..@ label      : chr "Q16_3 question label"
   ..@ format.spss: chr "F8.2"
   ..@ labels     : Named num [1:2] 0 1
   .. ..- attr(*, "names")= chr [1:2] "Not Selected" "Selected"
 $ Q16_4: dbl+lbl [1:10] NA, NA,  1,  1,  1, NA, NA,  0,  0,  1
   ..@ label      : chr "Q16_4 question label"
   ..@ format.spss: chr "F8.2"
   ..@ labels     : Named num [1:2] 0 1
   .. ..- attr(*, "names")= chr [1:2] "Not Selected" "Selected"
 $ test : int [1:10] 1 2 1 2 1 2 1 2 1 2
  ..- attr(*, "labels")= Named num [1:2] 1 2
  .. ..- attr(*, "names")= chr [1:2] "NO" "YES"

Long story short: how would I need to change my code to allow for creating such "nested" attributes?

Solution

You can extract the labels with attr and use match to look up the label name for each value.

var <- attr(df$test, 'labels')
df$test_label <- names(var)[match(df$test, var)]
df

#               Q16_3             Q16_4  test test_label
#           <dbl+lbl>         <dbl+lbl> <int> <chr>     
# 1 NA                NA                    1 NO        
# 2 NA                NA                    2 YES       
# 3  1 [Selected]      1 [Selected]         1 NO        
# 4  1 [Selected]      1 [Selected]         2 YES       
# 5  1 [Selected]      1 [Selected]         1 NO        
# 6 NA                NA                    2 YES       
# 7 NA                NA                    1 NO        
# 8  1 [Selected]      0 [Not Selected]     2 YES       
# 9  0 [Not Selected]  0 [Not Selected]     1 NO        
#10  1 [Selected]      1 [Selected]         2 YES

To replace the original test column instead, assign the result to df$test rather than df$test_label.
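
The lookup can also be tried on a standalone vector, without the data frame. This is a minimal sketch in base R; the names test, labs, test_label and test_factor are only illustrative, and the factor() variant is an alternative that the code above does not use.

```r
# Stand-in for df$test from the question: integer codes with a "labels" attribute.
test <- structure(rep(1:2, 5), labels = c("NO" = 1, "YES" = 2))

# Named vector that maps label names to codes.
labs <- attr(test, "labels")

# Find the position of each code in labs, then take the name at that position.
test_label <- names(labs)[match(test, labs)]

# Alternative: turn the codes into a factor whose levels are the label names.
test_factor <- factor(test, levels = labs, labels = names(labs))
```

The character vector is convenient for printing, while the factor keeps the order of the codes, which can matter for tables and plots.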

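The columns in your original data frame are haven-labelled vectors (class haven_labelled). A plain "labels" attribute without that class is not printed as a label, which is why your test column shows none. A labelled column can be constructed like this: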

library(dplyr)

df %>%
  mutate(test = haven::labelled(rep(1:2, 5), labels = c("NO" = 1, "YES" = 2)))

#               Q16_3             Q16_4      test
#           <dbl+lbl>         <dbl+lbl> <int+lbl>
# 1 NA                NA                  1 [NO] 
# 2 NA                NA                  2 [YES]
# 3  1 [Selected]      1 [Selected]       1 [NO] 
# 4  1 [Selected]      1 [Selected]       2 [YES]
# 5  1 [Selected]      1 [Selected]       1 [NO] 
# 6 NA                NA                  2 [YES]
# 7 NA                NA                  1 [NO] 
# 8  1 [Selected]      0 [Not Selected]   2 [YES]
# 9  0 [Not Selected]  0 [Not Selected]   1 [NO] 
#10  1 [Selected]      1 [Selected]       2 [YES]

Its labels will also show up in count:

df %>%
  mutate(test = haven::labelled(rep(1:2, 5),labels = c("NO" = 1, "YES" = 2))) %>%
  count(test)

#      test     n
#* <int+lbl> <int>
#1   1 [NO]      5
#2   2 [YES]     5
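
For the base R route asked about in the question, the same attributes can probably be set by hand with structure. The following is only a sketch under assumptions: the class vector is copied from the dput output in the question, with "integer" in place of "double" because the data are integers, and the names x, value_labels and test are illustrative. Unlike haven::labelled, it performs no validation, and the <int+lbl> printing still relies on the haven and vctrs methods being available.

```r
# Values for the new column.
x <- rep(1:2, 5)

# The "nested" attribute is simply the names of the labels vector.
# Integer labels are used so that their type matches the integer data.
value_labels <- structure(1:2, names = c("NO", "YES"))

# Assumption: class vector mirrors the dput() output in the question,
# with "integer" substituted for "double".
test <- structure(
  x,
  labels = value_labels,
  class = c("haven_labelled", "vctrs_vctr", "integer")
)

# Inspect the result without relying on any print method.
names(attr(test, "labels"))
inherits(test, "haven_labelled")
```

Assigning the result with df$test <- test would add it as a column. If a dedicated package is acceptable, haven::labelled remains the safer choice, since it checks that the labels fit the data.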