Tags: r, attributes, label, structure, attr

Create nested attribute for a new column in a data frame


I'm working with data that has value labels, as is common in social science research. In particular, the values in a column are not stored as a factor but as a numeric vector that carries its labels as an attribute. These labels in turn have an attribute of their own (the names of the labels).

Now, I know how to change the names of the labels for existing columns, but I fail to do so for new columns. To be more precise: I know how to create them with dedicated packages, but I'm wondering whether there's a native/base R option, e.g. with attr, attributes or structure.

example data:

df <- structure(list(Q16_3 = structure(c(NA, NA, 1, 1, 1, NA, NA, 1, 0, 1),
                                       label = "Q16_3 question label",
                                       format.spss = "F8.2",
                                       labels = c(`Not Selected` = 0, Selected = 1),
                                       class = c("haven_labelled", "vctrs_vctr", "double")),
                     Q16_4 = structure(c(NA, NA, 1, 1, 1, NA, NA, 0, 0, 1),
                                       label = "Q16_4 question label",
                                       format.spss = "F8.2",
                                       labels = c(`Not Selected` = 0, Selected = 1),
                                       class = c("haven_labelled", "vctrs_vctr", "double"))),
                row.names = c(NA, -10L),
                class = c("tbl_df", "tbl", "data.frame"))

E.g. df %>% count(Q16_4) gives:

# A tibble: 3 x 2
              Q16_4     n
*         <dbl+lbl> <int>
1  0 [Not Selected]     2
2  1 [Selected]         4
3 NA                    4

Now I'm creating a column and trying to create a "labels" attribute, but it fails to show up:

df <- df %>%
  mutate(test = rep(1:2, 5))

df$test <- structure(df$test, labels = c("NO" = 1, "YES" = 2))

df %>%
  count(test)

only gives:

# A tibble: 2 x 2
   test     n
* <int> <int>
1     1     5
2     2     5

I guess it has something to do with the structure of the attributes themselves, because they look different:

str(df)

tibble [10 x 3] (S3: tbl_df/tbl/data.frame)
 $ Q16_3: dbl+lbl [1:10] NA, NA,  1,  1,  1, NA, NA,  1,  0,  1
   ..@ label      : chr "Q16_3 question label"
   ..@ format.spss: chr "F8.2"
   ..@ labels     : Named num [1:2] 0 1
   .. ..- attr(*, "names")= chr [1:2] "Not Selected" "Selected"
 $ Q16_4: dbl+lbl [1:10] NA, NA,  1,  1,  1, NA, NA,  0,  0,  1
   ..@ label      : chr "Q16_4 question label"
   ..@ format.spss: chr "F8.2"
   ..@ labels     : Named num [1:2] 0 1
   .. ..- attr(*, "names")= chr [1:2] "Not Selected" "Selected"
 $ test : int [1:10] 1 2 1 2 1 2 1 2 1 2
  ..- attr(*, "labels")= Named num [1:2] 1 2
  .. ..- attr(*, "names")= chr [1:2] "NO" "YES"

Long story short: how would I need to change my code to allow for creating such "nested" attributes?


Solution

  • You can extract the labels using attr and use match to replace them.

    var <- attr(df$test, 'labels')
    df$test_label <- names(var)[match(df$test, var)]
    df
    
    #               Q16_3             Q16_4  test test_label
    #           <dbl+lbl>         <dbl+lbl> <int> <chr>     
    # 1 NA                NA                    1 NO        
    # 2 NA                NA                    2 YES       
    # 3  1 [Selected]      1 [Selected]         1 NO        
    # 4  1 [Selected]      1 [Selected]         2 YES       
    # 5  1 [Selected]      1 [Selected]         1 NO        
    # 6 NA                NA                    2 YES       
    # 7 NA                NA                    1 NO        
    # 8  1 [Selected]      0 [Not Selected]     2 YES       
    # 9  0 [Not Selected]  0 [Not Selected]     1 NO        
    #10  1 [Selected]      1 [Selected]         2 YES       
    

    If you want to replace the original test column instead, assign the result to df$test rather than df$test_label.
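
    The lookup itself does not need the data frame. A minimal base R sketch (the vector x and its values are made up for illustration) shows what match does with values that have no label:

    ```r
    # Hypothetical example vector with a "labels" attribute, as in the question
    x <- structure(c(1L, 2L, 1L, NA, 3L), labels = c(NO = 1, YES = 2))

    lab <- attr(x, "labels")

    # match() gives the position of each value within lab (NA if absent);
    # indexing names(lab) by that position turns codes into label text
    decoded <- names(lab)[match(x, lab)]
    decoded
    # [1] "NO"  "YES" "NO"  NA    NA
    ```

    Values without a matching label (here NA and 3) come back as NA, so it is worth checking for those before overwriting a column.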


    The columns in your original data frame are haven labelled vectors, which can be constructed this way:

    library(dplyr)
    
    df %>%
      mutate(test = haven::labelled(rep(1:2, 5), labels = c("NO" = 1, "YES" = 2)))
    
    #          Q16_3             Q16_4      test
    #           <dbl+lbl>         <dbl+lbl> <int+lbl>
    # 1 NA                NA                  1 [NO] 
    # 2 NA                NA                  2 [YES]
    # 3  1 [Selected]      1 [Selected]       1 [NO] 
    # 4  1 [Selected]      1 [Selected]       2 [YES]
    # 5  1 [Selected]      1 [Selected]       1 [NO] 
    # 6 NA                NA                  2 [YES]
    # 7 NA                NA                  1 [NO] 
    # 8  1 [Selected]      0 [Not Selected]   2 [YES]
    # 9  0 [Not Selected]  0 [Not Selected]   1 [NO] 
    #10  1 [Selected]      1 [Selected]       2 [YES]
    

    Its labels will also show up in count:

    df %>%
      mutate(test = haven::labelled(rep(1:2, 5), labels = c("NO" = 1, "YES" = 2))) %>%
      count(test)
    
    #      test     n
    #* <int+lbl> <int>
    #1   1 [NO]      5
    #2   2 [YES]     5
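
    As for the base R part of the question: in the str output, the labels attribute of test already has the same shape as the one on Q16_3 (a named numeric vector). What test lacks is the class attribute that the example data sets by hand. A rough sketch with structure() alone follows. It assumes that the class vector below is what haven::labelled() produces for an integer vector, by analogy with the "double" variant in the example data; treat that as an assumption rather than a documented interface.

    ```r
    # Assumption: this class vector mirrors what haven attaches to a labelled
    # integer vector (the example data uses the same pattern with "double")
    test <- structure(
      rep(1:2, 5),
      labels = c(NO = 1L, YES = 2L),  # same type as the data (integer)
      class  = c("haven_labelled", "vctrs_vctr", "integer")
    )

    # The "nested" attribute is simply the names of the labels vector
    names(attr(test, "labels"))
    # [1] "NO"  "YES"

    inherits(test, "haven_labelled")
    # [1] TRUE
    ```

    With haven installed, df$test <- test followed by df %>% count(test) should then show the labels in the same way as for Q16_4. Where the package is available I would still go through haven::labelled(), since it checks the labels for you instead of trusting a hand-built object.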