r expss

How to drop unused value labels in crosstabulation table output using the cro function from the expss package?


I'm using haven-labelled data frames (the variables already carry value labels when the datasets are imported). I need to run many crosstabulations of two variables. I'm using the cro function from the expss package because it displays value labels by default and computes weighted crosstabs.

However, the output tables display unused value labels. How can I drop them without doing so manually for each variable? (By the way, the fre function from the expss package has the argument drop_unused_labels = TRUE by default, but the cro function doesn't.)

Here is a reproducible example:

library(expss)

# Data frame
df <- data.frame(sex = c(1, 2, 99, 2, 1, 2, 2, 2, 1, 2),
                 agegroup = c(1, 2, 99, 2, 3, 3, 2, 2, 2, 1),
                 weight = c(100, 20, 400, 300, 50, 50, 80, 250, 100, 100))

# Variable labels
var_lab(df$sex) <- "Sex"
var_lab(df$agegroup) <- "Age group"

# Value labels 
val_lab(df$sex) <- make_labels("1 Male 
                               2 Female
                               97 Didn't know
                               98 Didn't respond
                               99 Abandoned survey")

val_lab(df$agegroup) <- make_labels("1 1-29
                                        2 30-49
                                        3 50 and more
                                       97 Didn't know
                                       98 Didn't respond
                                       99 Abandoned survey")

cro(df$sex, df$agegroup, weight = df$weight)

 |     |                  | Age group |       |             |             |                |                  |
 |     |                  |      1-29 | 30-49 | 50 and more | Didn't know | Didn't respond | Abandoned survey |
 | --- | ---------------- | --------- | ----- | ----------- | ----------- | -------------- | ---------------- |
 | Sex |             Male |       100 |   100 |          50 |             |                |                  |
 |     |           Female |       100 |   650 |          50 |             |                |                  |
 |     |      Didn't know |           |       |             |             |                |                  |
 |     |   Didn't respond |           |       |             |             |                |                  |
 |     | Abandoned survey |           |       |             |             |                |              400 |
 |     |     #Total cases |         2 |     5 |           2 |             |                |                1 |

I want to get rid of the columns and rows called ‘Didn't know’ and ‘Didn't respond’.


Solution

  • You can use the drop_unused_labels function to remove value labels whose codes do not occur in the data. Labels whose codes do occur, such as 99 ("Abandoned survey"), are kept.

    library(expss)
    df1 <- drop_unused_labels(df)
    cro(df1$sex, df1$agegroup, weight = df1$weight)
                                                                               
     |     |                  | Age group |       |             |                  |
     |     |                  |      1-29 | 30-49 | 50 and more | Abandoned survey |
     | --- | ---------------- | --------- | ----- | ----------- | ---------------- |
     | Sex |             Male |       100 |   100 |          50 |                  |
     |     |           Female |       100 |   650 |          50 |                  |
     |     | Abandoned survey |           |       |             |              400 |
     |     |     #Total cases |         2 |     5 |           2 |                1 |
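Since the question mentions running many crosstabulations, the same idea can be wrapped in a small helper. This is only a sketch: cro_used is a hypothetical name, not part of expss, and it assumes that drop_unused_labels can be applied to a single labelled vector as well as to a whole data frame. It also sets the value labels with a named vector instead of make_labels, purely to keep the example short.

```r
library(expss)

# Same example data as in the question
df <- data.frame(sex = c(1, 2, 99, 2, 1, 2, 2, 2, 1, 2),
                 agegroup = c(1, 2, 99, 2, 3, 3, 2, 2, 2, 1),
                 weight = c(100, 20, 400, 300, 50, 50, 80, 250, 100, 100))

var_lab(df$sex) <- "Sex"
var_lab(df$agegroup) <- "Age group"

# Value labels as named vectors (label = code)
val_lab(df$sex) <- c("Male" = 1, "Female" = 2,
                     "Didn't know" = 97, "Didn't respond" = 98,
                     "Abandoned survey" = 99)
val_lab(df$agegroup) <- c("1-29" = 1, "30-49" = 2, "50 and more" = 3,
                          "Didn't know" = 97, "Didn't respond" = 98,
                          "Abandoned survey" = 99)

# Hypothetical helper (not an expss function): drop the unused value labels
# of both variables, then cross-tabulate them.
cro_used <- function(row_var, col_var, weight = NULL) {
  cro(drop_unused_labels(row_var), drop_unused_labels(col_var), weight = weight)
}

tab <- cro_used(df$sex, df$agegroup, weight = df$weight)
tab
```

Unlike calling drop_unused_labels(df) once, this leaves the labels stored in df untouched and only drops them for the table being built, which may be preferable if the full label set is needed elsewhere.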