I'm working with labelled data (as is common in social science research). Here the values in a column are stored not as a factor but as a numeric vector that carries its labels as an attribute, and that labels attribute in turn has a names attribute (the label names).
Now, I know how to change the names of the labels for existing columns, but I fail to do so for new columns. To be more precise: I know how to create them with dedicated packages, but I'm wondering if there's a native/base R option with e.g. attr(), attributes() or structure().
example data:
df <- structure(list(Q16_3 = structure(c(NA, NA, 1, 1, 1, NA, NA, 1, 0, 1),
label = "Q16_3 question label",
format.spss = "F8.2",
labels = c(`Not Selected` = 0, Selected = 1),
class = c("haven_labelled", "vctrs_vctr", "double")),
Q16_4 = structure(c(NA, NA, 1, 1, 1, NA, NA, 0, 0, 1),
label = "Q16_4 question label",
format.spss = "F8.2",
labels = c(`Not Selected` = 0, Selected = 1),
class = c("haven_labelled", "vctrs_vctr", "double"))),
row.names = c(NA, -10L),
class = c("tbl_df", "tbl", "data.frame"))
E.g. df %>% count(Q16_4) gives:
# A tibble: 3 x 2
Q16_4 n
* <dbl+lbl> <int>
1 0 [Not Selected] 2
2 1 [Selected] 4
3 NA 4
Now I'm creating a column and trying to create a "labels" attribute, but it fails to show up:
df <- df %>%
mutate(test = rep(1:2, 5))
df$test <- structure(df$test, labels = c("NO" = 1, "YES" = 2))
df %>%
count(test)
only gives:
# A tibble: 2 x 2
test n
* <int> <int>
1 1 5
2 2 5
I guess it has something to do with the structure of the attributes themselves, because they look different:
str(df)
tibble [10 x 3] (S3: tbl_df/tbl/data.frame)
$ Q16_3: dbl+lbl [1:10] NA, NA, 1, 1, 1, NA, NA, 1, 0, 1
..@ label : chr "Q16_3 question label"
..@ format.spss: chr "F8.2"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "Not Selected" "Selected"
$ Q16_4: dbl+lbl [1:10] NA, NA, 1, 1, 1, NA, NA, 0, 0, 1
..@ label : chr "Q16_4 question label"
..@ format.spss: chr "F8.2"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "Not Selected" "Selected"
$ test : int [1:10] 1 2 1 2 1 2 1 2 1 2
..- attr(*, "labels")= Named num [1:2] 1 2
.. ..- attr(*, "names")= chr [1:2] "NO" "YES"
Long story short: how would I need to change my code to allow for creating such "nested" attributes?
You can extract the labels with attr() and use match() to look up the label name for each value.
var <- attr(df$test, 'labels')
df$test_label <- names(var)[match(df$test, var)]
df
# Q16_3 Q16_4 test test_label
# <dbl+lbl> <dbl+lbl> <int> <chr>
# 1 NA NA 1 NO
# 2 NA NA 2 YES
# 3 1 [Selected] 1 [Selected] 1 NO
# 4 1 [Selected] 1 [Selected] 2 YES
# 5 1 [Selected] 1 [Selected] 1 NO
# 6 NA NA 2 YES
# 7 NA NA 1 NO
# 8 1 [Selected] 0 [Not Selected] 2 YES
# 9 0 [Not Selected] 0 [Not Selected] 1 NO
#10 1 [Selected] 1 [Selected] 2 YES
If you want to replace the original test column instead, assign the result to df$test rather than df$test_label above.
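The columns in your original data frame are haven_labelled vectors, which is why their labels print, whereas a plain integer vector with a labels attribute lacks that class. A labelled column can be constructed like this: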
library(dplyr)
df %>%
mutate(test = haven::labelled(rep(1:2, 5), labels = c("NO" = 1, "YES" = 2)))
# Q16_3 Q16_4 test
# <dbl+lbl> <dbl+lbl> <int+lbl>
# 1 NA NA 1 [NO]
# 2 NA NA 2 [YES]
# 3 1 [Selected] 1 [Selected] 1 [NO]
# 4 1 [Selected] 1 [Selected] 2 [YES]
# 5 1 [Selected] 1 [Selected] 1 [NO]
# 6 NA NA 2 [YES]
# 7 NA NA 1 [NO]
# 8 1 [Selected] 0 [Not Selected] 2 [YES]
# 9 0 [Not Selected] 0 [Not Selected] 1 [NO]
#10 1 [Selected] 1 [Selected] 2 [YES]
Its labels will also show up in count():
df %>%
mutate(test = haven::labelled(rep(1:2, 5),labels = c("NO" = 1, "YES" = 2))) %>%
count(test)
# test n
#* <int+lbl> <int>
#1 1 [NO] 5
#2 2 [YES] 5
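For the base R route the question asks about, here is a minimal sketch. It assumes the only thing missing from the structure() attempt is the class attribute that the dput() output shows on Q16_3 and Q16_4. The integer class triple and the integer-typed label values are my assumption, modelled on what haven::labelled() produces for integer input. The labels only print as [NO] / [YES] when haven is installed and loaded, since it supplies the print methods.

```r
# Base R only: rebuild the attribute layout that dput() shows for the
# existing columns, but for an integer vector.
test <- structure(
  rep(1:2, 5),
  labels = c(NO = 1L, YES = 2L),  # label values typed like the data (integer)
  class  = c("haven_labelled", "vctrs_vctr", "integer")
)

# The labels attribute is a named vector; its names are the label texts.
attr(test, "labels")
names(attr(test, "labels"))

# The class is what distinguishes this from a plain labelled-by-attribute vector.
inherits(test, "haven_labelled")
```

Assigning it with df$test <- test should then give a column that count() displays like the haven::labelled() version, though I would still prefer the constructor because it validates the labels for you.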