I have a data frame called df containing 490 cases and 3 variables (V1, V2, and V3), i.e. 490 x 3.
Each observation is either 1, 2, or NA (there are some missing values).
With the commands expss::val_lab() and expss::num_lab(), value 1 has been labelled "Low" and value 2 has been labelled "High".
Here is some sample code for reproducibility:
# Set the seed for reproducibility
set.seed(123)  # seed value assumed: the post did not show one, so counts will differ from the tables below
# Generate random data
n <- 490
V1 <- sample(c(1, 2, NA), n, replace = TRUE)
V2 <- sample(c(1, 2, NA), n, replace = TRUE)
V3 <- sample(c(1, 2, NA), n, replace = TRUE)
# Create the data frame
df <- data.frame(V1, V2, V3)
# Print the first few rows of the data frame
head(df)
# Label the values: 1 = Low, 2 = High
expss::val_lab(df$V1) = expss::num_lab("1 Low
2 High")
expss::val_lab(df$V2) = expss::num_lab("1 Low
2 High")
expss::val_lab(df$V3) = expss::num_lab("1 Low
2 High")
The contingency table looks like this (generated with the command ftable(df)):
          V3 Low High
V1   V2
Low  Low      15   17
     High     19   14
High Low      24   16
     High     14   23
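One detail worth checking here: ftable() (like table()) silently skips every row that has an NA in any of the three variables, which is why the cells above add up to 142 rather than 490. A quick base-R check, sketched with an arbitrary seed (123, an assumption) and without the expss labels, so the exact numbers will differ from the table above:

```r
# Sketch: seed 123 is an arbitrary assumption, so counts differ from the post
set.seed(123)
n <- 490
df <- data.frame(
  V1 = sample(c(1, 2, NA), n, replace = TRUE),
  V2 = sample(c(1, 2, NA), n, replace = TRUE),
  V3 = sample(c(1, 2, NA), n, replace = TRUE)
)

ft <- ftable(df)
print(ft)

# table()/ftable() exclude NA by default, so only complete rows are counted
n_in_table <- sum(ft)
n_complete <- sum(complete.cases(df))
print(c(in_table = n_in_table, complete_rows = n_complete))  # the two numbers agree

# Treating NA as its own level counts every row instead
n_all <- sum(table(df, useNA = "ifany"))
print(n_all)  # 490
```

Whether the target table should be based on complete rows only, or on every row containing a 2, is worth deciding before counting, because the two definitions give different totals.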
I would like to extract only the data equal to 2 (i.e., High), broken down by every combination of variables, so that the table looks something like this:
Combination  n
V1          24
V2          19
V3          17
V1, V2      14
V1, V2, V3  23
V2, V3      14
V3, V1      16
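Each number in the target table is one cell of the contingency table: a cell is labelled by the set of variables that are 2 (High) in it, and the all-Low cell is dropped (for example, V1 High / V2 Low / V3 Low gives "V1" = 24). A minimal base-R sketch of that mapping, assuming unlabelled 1/2 codes and an arbitrary seed (both assumptions; with the expss labels applied the levels may show up as "Low"/"High" instead of "1"/"2", and the comparison would need adjusting):

```r
# Sketch: arbitrary seed (assumption) and no expss labels, so codes stay 1/2
set.seed(123)
n <- 490
df <- data.frame(
  V1 = sample(c(1, 2, NA), n, replace = TRUE),
  V2 = sample(c(1, 2, NA), n, replace = TRUE),
  V3 = sample(c(1, 2, NA), n, replace = TRUE)
)

# One row per V1 x V2 x V3 cell with its count (Freq); rows with NA are
# not counted, exactly as in ftable(df)
cells <- as.data.frame(table(df), stringsAsFactors = FALSE)

# Label each cell with the variables that equal 2 ("High"), e.g. "V1, V3"
vars <- c("V1", "V2", "V3")
cells$Combination <- unname(apply(cells[vars] == "2", 1,
                                  function(r) paste(vars[r], collapse = ", ")))

# Drop the all-"Low" cell (empty label) and keep only label and count
result <- cells[cells$Combination != "", c("Combination", "Freq")]
names(result)[2] <- "n"
rownames(result) <- NULL
print(result)
```

Because the variable names are pasted in column order, this sketch produces "V1, V3" where the target table lists "V3, V1".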
My questions:
Bonus question: Would it be possible to generate the table and a barplot (or any other relevant graphical output) with base R (i.e., without any additional packages)?
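For the bonus question, both the table and a bar chart can be produced with base R alone. The sketch below labels each complete row by its High variables with apply(), counts the labels with table() and draws them with barplot(). The seed is arbitrary and the expss labels are omitted (assumptions). Restricting to complete.cases() mirrors what ftable(df) counts; dropping that step would instead count every row with at least one 2.

```r
# Sketch: arbitrary seed (assumption); expss labels omitted, codes are 1/2
set.seed(123)
n <- 490
df <- data.frame(
  V1 = sample(c(1, 2, NA), n, replace = TRUE),
  V2 = sample(c(1, 2, NA), n, replace = TRUE),
  V3 = sample(c(1, 2, NA), n, replace = TRUE)
)

# Keep complete rows only, matching what table()/ftable() count
cc <- df[complete.cases(df), ]

# Label each row with the names of its "High" (== 2) columns, e.g. "V1, V3"
combination <- apply(cc == 2, 1, function(r) paste(names(cc)[r], collapse = ", "))
combination <- combination[combination != ""]  # drop all-"Low" rows

# Frequency table of the combinations, largest first
counts <- sort(table(combination), decreasing = TRUE)
print(counts)

# Base-R bar chart; las = 2 turns the axis labels vertical so they fit
op <- par(mar = c(7, 4, 2, 1))
barplot(counts, las = 2, ylab = "n",
        main = "Rows with value 2 (High), by combination")
par(op)  # restore the previous margins
```

Sorting the table before plotting is optional; it simply puts the most frequent combinations first.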
FWIW, output of sessioninfo::session_info():
 setting  value
 version  R version 4.2.3 (2023-03-15)
 os       macOS Ventura 13.3.1
 system   x86_64, darwin17.0
 language (EN)
 rstudio  2023.03.0+386 Cherry Blossom (desktop)
Here is a way to generate the combinations automatically (the .by argument requires dplyr 1.1.0 or later):
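library(dplyr)
out <- df %>%
  mutate(id = row_number()) %>%                           # row identifier
  tidyr::pivot_longer(V1:V3) %>%                          # one row per (id, variable)
  filter(value == 2) %>%                                  # keep only "High" cells; NA values are dropped
  summarise(combination = toString(name), .by = id) %>%   # e.g. "V1, V3"
  summarise(n = n(), .by = combination)                   # count each combination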
# A tibble: 7 × 2
  combination     n
  <chr>       <int>
1 V2             72
2 V2, V3         28
3 V1             82
4 V3             76
5 V1, V3         29
6 V1, V2         33
7 V1, V2, V3     23
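Note that this pipeline counts a row as, say, "V1" whenever V1 is 2 and the other two variables are 1 or NA, whereas ftable(df) in the question only counts rows without any NA. If the counts should follow the ftable() definition, one option is to drop incomplete rows first with tidyr::drop_na(). This is a sketch assuming dplyr (1.1.0 or later) and tidyr are installed, with an arbitrary seed and no expss labels:

```r
library(dplyr)

set.seed(123)  # arbitrary seed (assumption)
n <- 490
df <- data.frame(
  V1 = sample(c(1, 2, NA), n, replace = TRUE),
  V2 = sample(c(1, 2, NA), n, replace = TRUE),
  V3 = sample(c(1, 2, NA), n, replace = TRUE)
)

out_complete <- df %>%
  tidyr::drop_na() %>%                                    # complete rows only, as ftable(df) counts
  mutate(id = row_number()) %>%                           # row identifier for regrouping
  tidyr::pivot_longer(V1:V3) %>%                          # one row per (id, variable)
  filter(value == 2) %>%                                  # keep only "High" cells
  summarise(combination = toString(name), .by = id) %>%   # e.g. "V1, V3"
  summarise(n = n(), .by = combination)                   # count each combination
print(out_complete)
```

Which definition is right depends on whether a missing value should exclude the whole row; the two versions differ only in the drop_na() step.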
A simple barplot:
ggplot(out,aes(combination,n)) +