r dplyr combinations analysis mutate

Trying to make a combination analysis but during the mutation the 1's are changed to 0's


I am trying to make a combination analysis that shows the results in a plot. I have a data frame with 9 columns and each column consists of different percentages or NA's if a value was not present in the sample.

The example code I have used for this can be found here: https://epirhandbook.com/en/combinations-analysis.html

The issue is that one line changes the 1's to 0's and vice versa. The line is:

data <- data %>%
  mutate(across(all_of(columns), ~ as.integer(. %in% c("yes", NA))))

The full code that I have used is:

library(tidyverse)
library(UpSetR)
library(ggupset)

data <- META_new[c("lengthpergram","countpergram","acrylrel",
                   "cottonrel","polyestrel","polyamiderel",
                   "elastaanrel","lyocellrel","viscoserel",
                   "nylonrel","wolrel")]

columns <- c("acrylrel", "cottonrel", "polyestrel", "polyamiderel",
             "elastaanrel", "lyocellrel", "viscoserel", "nylonrel", "wolrel")

for (col in columns) {
  data[[col]][data[[col]] > 0] <- "yes"
  data[[col]][data[[col]] == 0] <- NA
}

data <- data %>%
  mutate(acrylrel = ifelse(acrylrel == "yes", 1, 0),
         cottonrel = ifelse(cottonrel == "yes", 1, 0),
         polyestrel = ifelse(polyestrel == "yes", 1, 0),
         polyamiderel = ifelse(polyamiderel == "yes", 1, 0),
         elastaanrel = ifelse(elastaanrel == "yes", 1, 0),
         lyocellrel = ifelse(lyocellrel == "yes", 1, 0),
         viscoserel = ifelse(viscoserel == "yes", 1, 0),
         nylonrel = ifelse(nylonrel == "yes", 1, 0),
         wolrel = ifelse(wolrel== "yes", 1, 0),)

data <- data %>%
  mutate(across(all_of(columns), ~ as.integer(. %in% c("yes", NA))))

data %>%
  UpSetR::upset(
    sets = columns,
    order.by = "freq",
    sets.bar.color = c("red", "orange", "yellow", "green", "cyan", "blue", "purple", "pink", "salmon"),
    empty.intersections = "on",
    number.angles = 0,
    point.size = 2,
    line.size = 1, 
    mainbar.y.label = "Fabric combinations by frequency",
    sets.x.label = "Types of fabric present in samples")

The code produces a plot, but it assigns the wrong column names to the values. For example, polyestrel is supposed to be the most frequent, but the plot shows lyocellrel in that position, even though lyocellrel is the least frequent.

Unfortunately I cannot add the df, as it is too big, but I hope someone has suggestions on how to fix this (if this line is even the problem).

I changed some of the original code from the website. The original was:

 mutate(across(c(fever, chills, cough, aches, vomit), .fns = ~+(.x == "yes")))

Because when I tried it I got this error:

Error in start_col:end_col : argument of length 0

First 5 rows

data <- data.frame(
  acrylrel = c(0.00000, 0.00000, 0.00000, 36.61972, 0.00000),
  cottonrel = c(9.089974, 65.000000, 0.000000, 19.014085, 8.500000),
  polyestrel = c(83.72237, 35.00000, 42.81081, 44.36620, 15.00000),
  polyamiderel = c(5.583548, 0.000000, 53.594595, 0.000000, 40.000000),
  elastaanrel = c(1.604113, 0.000000, 3.594595, 0.000000, 1.500000),
  lyocellrel = c(0, 0, 0, 0, 0),
  viscoserel = c(0, 0, 0, 0, 0),
  nylonrel = c(0, 0, 0, 0, 0),
  wolrel = c(0, 0, 0, 0, 0)
)

Solution

  • This appears to be what you want:

    data %>%
      mutate(across(everything(), ~ as.integer(. > 0))) %>%
      UpSetR::upset(
        sets = columns,
        order.by = "freq",
        sets.bar.color = c("red", "orange", "yellow", "green", "cyan", "blue", "purple", "pink", "salmon"),
        empty.intersections = "on",
        number.angles = 0,
        point.size = 2,
        line.size = 1, 
        mainbar.y.label = "Fabric combinations by frequency",
        sets.x.label = "Types of fabric present in samples")
    

    Output: plot
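
    One way to sanity-check the recoding before plotting is to count how many samples contain each fibre. The sketch below does this in base R on the five sample rows from the question. The names `sample_data`, `presence` and `set_sizes` are made up for this illustration, and the `!is.na()` guard is an assumption added in case the real data contains NA values.

    ```r
    # Five sample rows from the question (hypothetical object name: sample_data)
    sample_data <- data.frame(
      acrylrel = c(0, 0, 0, 36.61972, 0),
      cottonrel = c(9.089974, 65, 0, 19.014085, 8.5),
      polyestrel = c(83.72237, 35, 42.81081, 44.3662, 15),
      polyamiderel = c(5.583548, 0, 53.594595, 0, 40),
      elastaanrel = c(1.604113, 0, 3.594595, 0, 1.5),
      lyocellrel = c(0, 0, 0, 0, 0),
      viscoserel = c(0, 0, 0, 0, 0),
      nylonrel = c(0, 0, 0, 0, 0),
      wolrel = c(0, 0, 0, 0, 0)
    )

    # 1 if the fibre is present (non-missing percentage above 0), otherwise 0
    presence <- as.data.frame(
      lapply(sample_data, function(x) as.integer(!is.na(x) & x > 0))
    )

    # Number of samples that contain each fibre
    set_sizes <- colSums(presence)
    print(set_sizes)
    ```

    On these five rows polyestrel is present in every sample and lyocellrel in none, so the set-size bars in the plot should follow the same order.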

    Going through your code part by part:

    # this turns every value into "yes" if positive, or NA if 0
    for (col in columns) {
      data[[col]][data[[col]] > 0] <- "yes"
      data[[col]][data[[col]] == 0] <- NA
    }
    
    # this gives the same data as above, except that all of the "yes" values
    # have been turned into 1s. Note that (frustratingly!) NA == "yes" is NA,
    # not FALSE as you might expect, so ifelse() returns NA for those cells
    # and the columns now hold 1 or NA, never 0. The way to check for NA
    # values is with the function is.na()
    data %>%
      mutate(acrylrel = ifelse(acrylrel == "yes", 1, 0),
             cottonrel = ifelse(cottonrel == "yes", 1, 0),
             polyestrel = ifelse(polyestrel == "yes", 1, 0),
             polyamiderel = ifelse(polyamiderel == "yes", 1, 0),
             elastaanrel = ifelse(elastaanrel == "yes", 1, 0),
             lyocellrel = ifelse(lyocellrel == "yes", 1, 0),
             viscoserel = ifelse(viscoserel == "yes", 1, 0),
             nylonrel = ifelse(nylonrel == "yes", 1, 0),
             wolrel = ifelse(wolrel== "yes", 1, 0),)
    
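    # with this line, because you've already turned the "yes" values into 1s,
    # `. %in% c("yes", NA)` evaluates to FALSE for the 1s and TRUE for the NA
    # values. Unlike `==`, `%in%` treats NA as a match for NA, so the 1s
    # become 0 and the NAs become 1, which is the flip you are seeing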
    data <- data %>%
      mutate(across(all_of(columns), ~ as.integer(. %in% c("yes", NA))))
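
    The difference between `==` and `%in%` when NA is involved can be seen on a small vector. This is a minimal base R sketch; `x` is a hypothetical stand-in for one column as it looks after the `ifelse()` step, and `!is.na(x)` is shown only as one possible way to get the intended indicator.

    ```r
    # Values as they look after the ifelse() step: 1 for present, NA for absent
    x <- c(1, NA, 1, NA)

    # `==` propagates NA, so this comparison is never TRUE here
    eq_result <- x == "yes"

    # `%in%` is built on match(), which treats NA as matching NA
    in_result <- x %in% c("yes", NA)

    # Converting to integer gives the inverted indicator
    inverted <- as.integer(in_result)

    # Testing for non-missing values recovers the intended indicator
    intended <- as.integer(!is.na(x))

    print(eq_result)
    print(inverted)
    print(intended)
    ```

    Here `inverted` marks the absent values as 1 and the present values as 0, while `intended` does the opposite.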