
Create contingency table and appropriate plot with conditional data from data frame in R


I have a data frame called df with 490 rows and 3 variables (V1, V2, and V3).

Each value is either 1, 2, or NA (missing). Using expss::val_lab() and expss::num_lab(), value 1 has been labelled "Low" and value 2 has been labelled "High".

Here is some sample code for reproducibility:

# Set the seed for reproducibility
set.seed(123)

# Generate random data
n <- 490
V1 <- sample(c(1, 2, NA), n, replace = TRUE)
V2 <- sample(c(1, 2, NA), n, replace = TRUE)
V3 <- sample(c(1, 2, NA), n, replace = TRUE)

# Create the data frame
df <- data.frame(V1, V2, V3)

# Print the first few rows of the data frame
head(df)


# Label the values: 1 = Low, 2 = High
expss::val_lab(df$V1) = expss::num_lab("1 Low
                                        2 High")
expss::val_lab(df$V2) = expss::num_lab("1 Low
                                        2 High")
expss::val_lab(df$V3) = expss::num_lab("1 Low
                                        2 High")

The contingency table should look like this (generated with ftable(df)):

          V3 Low High
V1   V2              
Low  Low      15   17
     High     19   14
High Low      24   16
     High     14   23

I would like to count only the rows that contain a 2 (High), broken down by which variables are High. That is, the table should look something like this:

Combination   n
----------------
V1           24
V2           19
V3           17
V1, V2       14
V1, V2, V3   23
V2, V3       14
V3, V1       16

My questions:

  1. What would be the code to obtain the equivalent of the last table above?
  2. Can V1, V2, V3 and their combinations be generated automatically from the data (that is, not hardcoded, so that the code is easy to adapt)?
  3. What would be the code to have a corresponding barplot with the appropriate labelling?

Bonus question: Would it be possible to generate table and barplot (or any relevant graphical output) with Base R (i.e., without the use of any additional package)?


FWIW: sessioninfo::session_info() extract:

 setting  value
-------------------------------------------------
 version  R version 4.2.3 (2023-03-15)
 os       macOS Ventura 13.3.1
 system   x86_64, darwin17.0
 language (EN)
 rstudio  2023.03.0+386 Cherry Blossom (desktop)


Solution

  • Here is a way to automatically generate combinations:

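    library(dplyr)  # the `.by` argument requires dplyr >= 1.1.0

    out <- df %>%
      mutate(id = row_number()) %>%          # tag each original row
      tidyr::pivot_longer(V1:V3) %>%         # one row per (id, variable)
      filter(value == 2) %>%                 # keep only the "High" cells
      summarise(combination = toString(name), .by = id) %>%
      summarise(n = n(), .by = combination)  # count rows per combination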
    
    out
    # A tibble: 7 × 2
      combination     n
      <chr>       <int>
    1 V2             72
    2 V2, V3         28
    3 V1             82
    4 V3             76
    5 V1, V3         29
    6 V1, V2         33
    7 V1, V2, V3     23
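
    The counts above are larger than those in the question's expected table. A likely reason is that ftable(df) drops any row containing an NA, whereas the pipeline above keeps every row with at least one 2. A minimal sketch that restricts the count to complete rows first, assuming that is the intended behaviour (the name out_complete is made up for illustration, and the data is rebuilt without the expss labels to keep the block self-contained):

    ```r
    library(dplyr)
    library(tidyr)

    # Rebuild the question's example data (unlabelled, plain numeric)
    set.seed(123)
    n <- 490
    df <- data.frame(
      V1 = sample(c(1, 2, NA), n, replace = TRUE),
      V2 = sample(c(1, 2, NA), n, replace = TRUE),
      V3 = sample(c(1, 2, NA), n, replace = TRUE)
    )

    out_complete <- df %>%
      drop_na() %>%                          # keep only rows without any NA
      mutate(id = row_number()) %>%          # tag each remaining row
      pivot_longer(-id) %>%                  # one row per (id, variable)
      filter(value == 2) %>%                 # keep only the "High" cells
      summarise(combination = toString(name), .by = id) %>%
      count(combination, name = "n")         # rows per combination

    out_complete
    ```

    Whether rows with missing values should be counted is a modelling choice; dropping them simply mirrors the default NA handling of table() and ftable().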
    

    A simple barplot:

    library(ggplot2)
      
    ggplot(out,aes(combination,n)) + 
      geom_col()
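
For the bonus question, the same table and plot can probably be produced without any packages. The following is only a sketch under a few assumptions: the data is the unlabelled numeric version of df from the question, rows with no 2 at all are dropped, and the names is_high, combo, tab and out_base are made up for illustration.

```r
# Rebuild the question's example data (unlabelled, plain numeric)
set.seed(123)
n <- 490
df <- data.frame(
  V1 = sample(c(1, 2, NA), n, replace = TRUE),
  V2 = sample(c(1, 2, NA), n, replace = TRUE),
  V3 = sample(c(1, 2, NA), n, replace = TRUE)
)

# Logical matrix: TRUE where a cell equals 2 ("High"); NA cells become FALSE
is_high <- !is.na(df) & df == 2

# For each row, paste together the names of the columns that are High
combo <- apply(is_high, 1, function(r) toString(colnames(is_high)[r]))
combo <- combo[combo != ""]   # drop rows with no High value at all

# Count rows per combination; column names are taken from the data
tab <- table(combination = combo)
out_base <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)
out_base

# Barplot with rotated labels and a wider bottom margin so they fit
op <- par(mar = c(8, 4, 2, 1))
barplot(tab, las = 2, ylab = "n", main = "Rows with High (2), by combination")
par(op)
```

Because the combinations are built from colnames(), adding a V4 column needs no code change. To count only rows without missing values, as ftable(df) does, one option is to run df <- df[complete.cases(df), ] before computing is_high.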